[GitHub] zeppelin issue #1404: ZEPPELIN-1411. UDF with pyspark not working - object h...

2016-09-18 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/zeppelin/pull/1404
  
@Leemoonsoo Finally got the unit tests to pass (the one remaining failure is 
unrelated).

Actually, the test failure is caused by several bugs in `SparkInterpreter`.  
Here are the steps to reproduce the first issue.
1. Use Spark 2.0 and set `SparkInterpreter` to scoped mode.
2. Open note1 and run sample code to start a `SparkInterpreter`, then open 
note2 and run sample code to start another `SparkInterpreter`. You will then 
hit the following issue.
 

![image](https://cloud.githubusercontent.com/assets/164491/18614389/4887684c-7dc0-11e6-9898-18fa8274be6d.png)

The root cause of this issue is that the `outputDir` should be unique; 
otherwise the second `SparkInterpreter` instance cannot find the class in the 
`outputDir` of the previous `SparkInterpreter`.
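
As a rough illustration of the "unique `outputDir`" idea (a minimal sketch, not the code in this PR), each interpreter instance could ask for a fresh temporary directory. The class `ReplOutputDirs`, the method `newOutputDir`, and the `spark-repl-` prefix are hypothetical names chosen for this example; only `Files.createTempDirectory` is a real JDK call.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Zeppelin: hands out one directory
// per interpreter instance for REPL-generated classes.
final class ReplOutputDirs {
    private ReplOutputDirs() {}

    // Files.createTempDirectory appends a random suffix to the prefix,
    // so every call returns a directory that did not exist before.
    static Path newOutputDir() {
        try {
            return Files.createTempDirectory("spark-repl-");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With something like this, two scoped interpreters would never share a class output directory, which is the property the fix above relies on.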

The second bug is that we should also set `sparkSession` to null. Otherwise 
it won't be created in the next `SparkInterpreter`. 
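
The pattern behind this bug can be sketched as follows, assuming the session is cached in a shared field and created lazily behind a null check. `SharedContextHolder` is a hypothetical stand-in, and plain `Object` replaces the real Spark types so the example runs without Spark on the classpath.

```java
// Hypothetical stand-in for an interpreter that caches shared objects
// in static fields; Object replaces the real Spark types.
final class SharedContextHolder {
    private static Object sparkContext;
    private static Object sparkSession;

    private SharedContextHolder() {}

    // Lazily creates the shared objects the first time they are needed.
    static synchronized Object getSparkSession() {
        if (sparkSession == null) {
            sparkContext = new Object(); // stands in for creating a SparkContext
            sparkSession = new Object(); // stands in for creating a SparkSession
        }
        return sparkSession;
    }

    // Clears every cached reference. If sparkSession were left set, the
    // null check above would skip creation for the next interpreter.
    static synchronized void close() {
        sparkContext = null;
        sparkSession = null;
    }
}
```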

The third bug is that we should disable `HiveContext` in 
`AbstractTestRestApi`; otherwise we will hit the issue of multiple Derby 
instances running.   



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zeppelin issue #1404: ZEPPELIN-1411. UDF with pyspark not working - object h...

2016-09-14 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/zeppelin/pull/1404
  
@Leemoonsoo The test passes on my local box under both Spark 1.6.2 and 
Spark 2.0. Not sure why the Travis CI fails. Is it possible for us to access 
the 




[GitHub] zeppelin issue #1404: ZEPPELIN-1411. UDF with pyspark not working - object h...

2016-09-13 Thread Leemoonsoo
Github user Leemoonsoo commented on the issue:

https://github.com/apache/zeppelin/pull/1404
  
Thanks @zjffdu. I think the failure in the second CI test profile looks relevant.

```
Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 120.231 sec <<< FAILURE! - in org.apache.zeppelin.rest.ZeppelinSparkClusterTest
pySparkTest(org.apache.zeppelin.rest.ZeppelinSparkClusterTest)  Time elapsed: 3.842 sec  <<< FAILURE!
java.lang.AssertionError: expected: but was:
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.junit.Assert.assertEquals(Assert.java:144)
    at org.apache.zeppelin.rest.ZeppelinSparkClusterTest.pySparkTest(ZeppelinSparkClusterTest.java:150)
```

Could you check? 




[GitHub] zeppelin issue #1404: ZEPPELIN-1411. UDF with pyspark not working - object h...

2016-09-13 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/zeppelin/pull/1404
  
@Leemoonsoo, I updated the unit test. I also made a small change to 
`AbstractTestRestApi`: the standalone way doesn't work for me, so I allow the 
user to export `SPARK_MASTER` to run it in other modes. 
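
A minimal sketch of such an override, assuming the variable is named `SPARK_MASTER` and that the test falls back to the standalone URL when it is unset. `SparkMasterResolver` and `resolveMaster` are hypothetical names, not the code in this PR.

```java
// Hypothetical helper, not part of Zeppelin: picks the Spark master for
// a test run from an optional override, else a default.
final class SparkMasterResolver {
    private SparkMasterResolver() {}

    // envValue is what System.getenv(...) returned (may be null);
    // defaultMaster is used when no usable override is present.
    static String resolveMaster(String envValue, String defaultMaster) {
        if (envValue == null || envValue.trim().isEmpty()) {
            return defaultMaster;
        }
        return envValue.trim();
    }
}
```

A caller would pass in something like `System.getenv("SPARK_MASTER")` and the standalone URL as the default. Taking the value as a parameter, rather than reading the environment inside the method, keeps the helper easy to test.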




[GitHub] zeppelin issue #1404: ZEPPELIN-1411. UDF with pyspark not working - object h...

2016-09-12 Thread Leemoonsoo
Github user Leemoonsoo commented on the issue:

https://github.com/apache/zeppelin/pull/1404
  
@zjffdu Right, it looks like `AbstractTestRestApi` needs to be improved for 
the case when CI is not defined.
For now, I think you can try downloading and running a Spark standalone 
cluster this way:

```
./testing/downloadSpark.sh 1.6.2 2.6
./testing/startSparkCluster.sh 1.6.2 2.6
```

Then try running the test cases, so that `getSparkHome()` can find sparkHome.




[GitHub] zeppelin issue #1404: ZEPPELIN-1411. UDF with pyspark not working - object h...

2016-09-12 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/zeppelin/pull/1404
  
@Leemoonsoo, I followed the above commands, but it doesn't seem to work. I 
checked `AbstractTestRestApi`. It seems the pyspark-related tests only run 
either in Travis CI or against a Spark standalone cluster with SPARK_HOME set 
up (`pyspark` needs to be set to true). Do I understand correctly?
```
// ci environment runs spark cluster for testing
// so configure zeppelin use spark cluster
if ("true".equals(System.getenv("CI"))) {
  // assume first one is spark
  InterpreterSetting sparkIntpSetting = null;
  for (InterpreterSetting intpSetting :
      ZeppelinServer.notebook.getInterpreterFactory().get()) {
    if (intpSetting.getName().equals("spark")) {
      sparkIntpSetting = intpSetting;
    }
  }

  // set spark master and other properties
  sparkIntpSetting.getProperties().setProperty("master",
      "spark://" + getHostname() + ":7071");
  sparkIntpSetting.getProperties().setProperty("spark.cores.max", "2");

  // set spark home for pyspark
  sparkIntpSetting.getProperties().setProperty("spark.home", getSparkHome());
  pySpark = true;
  sparkR = true;

  ZeppelinServer.notebook.getInterpreterFactory().restart(sparkIntpSetting.getId());
} else {
  // assume first one is spark
  InterpreterSetting sparkIntpSetting = null;
  for (InterpreterSetting intpSetting :
      ZeppelinServer.notebook.getInterpreterFactory().get()) {
    if (intpSetting.getName().equals("spark")) {
      sparkIntpSetting = intpSetting;
    }
  }

  String sparkHome = getSparkHome();
  if (sparkHome != null) {
    sparkIntpSetting.getProperties().setProperty("master",
        "spark://" + getHostname() + ":7071");
    sparkIntpSetting.getProperties().setProperty("spark.cores.max", "2");
    // set spark home for pyspark
    sparkIntpSetting.getProperties().setProperty("spark.home", sparkHome);
    pySpark = true;
    sparkR = true;
  }
```
 




[GitHub] zeppelin issue #1404: ZEPPELIN-1411. UDF with pyspark not working - object h...

2016-09-09 Thread Leemoonsoo
Github user Leemoonsoo commented on the issue:

https://github.com/apache/zeppelin/pull/1404
  
@zjffdu 

Once you have built Zeppelin with
```
mvn package [your profiles] -DskipTests
```

you can run this test like this:
```
mvn package -pl 'zeppelin-interpreter,zeppelin-zengine,zeppelin-server' \
  -Dtest=ZeppelinSparkClusterTest -DfailIfNoTests=false -Drat.skip=true
```

Let me know if it does not work.




[GitHub] zeppelin issue #1404: ZEPPELIN-1411. UDF with pyspark not working - object h...

2016-09-08 Thread Leemoonsoo
Github user Leemoonsoo commented on the issue:

https://github.com/apache/zeppelin/pull/1404
  
I tested this branch with the given example, but it doesn't work for me.
On my machine, it hangs on `sqlContext.createDataFrame()` and ends up with 
errors like

```
kqueue: Too many open files in system
```

I'm not sure whether this is a problem with this patch or not.
Could someone else test this patch, too?




[GitHub] zeppelin issue #1404: ZEPPELIN-1411. UDF with pyspark not working - object h...

2016-09-08 Thread Leemoonsoo
Github user Leemoonsoo commented on the issue:

https://github.com/apache/zeppelin/pull/1404
  
Thanks @zjffdu for the contribution. Actually, we do have some tests for 
pyspark already.

Please see 
https://github.com/apache/zeppelin/blob/master/zeppelin-server/src/test/java/org/apache/zeppelin/rest/ZeppelinSparkClusterTest.java#L125

If it's not too difficult, adding a unit test for this case would be 
really beneficial.





[GitHub] zeppelin issue #1404: ZEPPELIN-1411. UDF with pyspark not working - object h...

2016-09-08 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/zeppelin/pull/1404
  
Could you kick off CI again? Let's merge this afterwards.




[GitHub] zeppelin issue #1404: ZEPPELIN-1411. UDF with pyspark not working - object h...

2016-09-07 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/zeppelin/pull/1404
  
CI failed because of selenium, I think





[GitHub] zeppelin issue #1404: ZEPPELIN-1411. UDF with pyspark not working - object h...

2016-09-05 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/zeppelin/pull/1404
  
Right. That is probably for another PR, but I think we could use Travis' 
addons support to install Python via apt-get 
https://docs.travis-ci.com/user/installing-dependencies/
as we do for R.





[GitHub] zeppelin issue #1404: ZEPPELIN-1411. UDF with pyspark not working - object h...

2016-09-05 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/zeppelin/pull/1404
  
I guess it is because PythonInterpreter depends on the Python environment, so 
there's no test for it yet. 




[GitHub] zeppelin issue #1404: ZEPPELIN-1411. UDF with pyspark not working - object h...

2016-09-05 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/zeppelin/pull/1404
  
LGTM
It would be great to add some tests for the pyspark interpreter.




[GitHub] zeppelin issue #1404: ZEPPELIN-1411. UDF with pyspark not working - object h...

2016-09-05 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/zeppelin/pull/1404
  
cc @Leemoonsoo Please help review, thanks.

