[GitHub] zeppelin issue #1404: ZEPPELIN-1411. UDF with pyspark not working - object h...
Github user zjffdu commented on the issue: https://github.com/apache/zeppelin/pull/1404

@Leemoonsoo I finally got the unit test to pass (the one remaining failure is unrelated). The test failure was actually caused by several bugs in `SparkInterpreter`. Here are the steps to reproduce the first issue:

1. Use Spark 2.0 and set `SparkInterpreter` to scoped mode.
2. Open note1 and run sample code to start a `SparkInterpreter`, then open note2 and run sample code to start another `SparkInterpreter`. You will then hit the following issue.

![image](https://cloud.githubusercontent.com/assets/164491/18614389/4887684c-7dc0-11e6-9898-18fa8274be6d.png)

The root cause of this issue is that `outputDir` must be unique; otherwise the second `SparkInterpreter` instance cannot find the class in the `outputDir` of the previous `SparkInterpreter`.

The second bug is that we should also set `sparkSession` to null; otherwise it won't be created in the next `SparkInterpreter`.

The third bug is that we should disable `HiveContext` in `AbstractTestRestApi`; otherwise we hit the issue of multiple Derby instances running.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
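The first two bugs described in the comment above can be illustrated with a minimal sketch. This is not Zeppelin's code: `ScopedInterpreterSketch`, the `sharedSession` field, and the `spark-sketch-` directory prefix are hypothetical stand-ins. It assumes that each interpreter instance needs its own output directory, and that the session reference is cached in state shared across instances, so a stale value must be cleared on close.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for an interpreter instance; not Zeppelin's SparkInterpreter.
// Not thread-safe; for illustration only.
class ScopedInterpreterSketch {
  // Assumption: the session is cached in state shared by all instances.
  private static Object sharedSession;

  // Each instance owns a distinct directory for its compiled classes.
  private final Path outputDir;

  ScopedInterpreterSketch() {
    Path dir;
    try {
      // createTempDirectory generates a distinct directory name on every call.
      dir = Files.createTempDirectory("spark-sketch-");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    dir.toFile().deleteOnExit();
    this.outputDir = dir;
  }

  Path getOutputDir() {
    return outputDir;
  }

  // Creates the session only when no cached reference exists.
  Object getOrCreateSession() {
    if (sharedSession == null) {
      sharedSession = new Object();
    }
    return sharedSession;
  }

  // Clearing the cached reference lets the next caller create a fresh session
  // instead of reusing one that has already been shut down.
  void close() {
    sharedSession = null;
  }
}
```

Without the reset in `close()`, a later `getOrCreateSession()` call would keep returning the old object, which mirrors the second bug as reported.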
Github user zjffdu commented on the issue: https://github.com/apache/zeppelin/pull/1404

@Leemoonsoo The test passes on my local box under both Spark 1.6.2 and Spark 2.0. I'm not sure why the Travis CI build fails. Is it possible for us to access the
Github user Leemoonsoo commented on the issue: https://github.com/apache/zeppelin/pull/1404

Thanks @zjffdu. I think the failure in the second CI test profile looks relevant.

```
Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 120.231 sec <<< FAILURE! - in org.apache.zeppelin.rest.ZeppelinSparkClusterTest
pySparkTest(org.apache.zeppelin.rest.ZeppelinSparkClusterTest) Time elapsed: 3.842 sec <<< FAILURE!
java.lang.AssertionError: expected: but was:
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.junit.Assert.assertEquals(Assert.java:144)
    at org.apache.zeppelin.rest.ZeppelinSparkClusterTest.pySparkTest(ZeppelinSparkClusterTest.java:150)
```

Could you check?
Github user zjffdu commented on the issue: https://github.com/apache/zeppelin/pull/1404

@Leemoonsoo, I updated the unit test. I also made a small change to `AbstractTestRestApi`: the standalone mode doesn't work for me, so I allow the user to export `SPARK_MASTER` to run the test in other modes.
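The change described in this comment can be sketched roughly as follows. This is an illustration, not the code from the pull request: `SparkMasterResolver` and `resolveMaster` are hypothetical names, the environment variable is assumed to be named `SPARK_MASTER`, and the fallback URL reuses the `spark://<hostname>:7071` form that `AbstractTestRestApi` sets elsewhere in this thread.

```java
import java.util.Map;

// Hypothetical helper; not part of Zeppelin.
class SparkMasterResolver {
  // Returns the SPARK_MASTER value when it is set and non-empty,
  // otherwise falls back to the standalone cluster URL.
  static String resolveMaster(Map<String, String> env, String hostname) {
    String fromEnv = env.get("SPARK_MASTER");
    if (fromEnv != null && !fromEnv.isEmpty()) {
      return fromEnv;
    }
    return "spark://" + hostname + ":7071";
  }
}
```

A caller could pass `System.getenv()` as the map, so that running the tests with, say, `SPARK_MASTER=local[2]` exported would skip the standalone cluster entirely.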
Github user Leemoonsoo commented on the issue: https://github.com/apache/zeppelin/pull/1404

@zjffdu Right, it looks like `AbstractTestRestApi` needs to be improved for the case where CI is not defined. For now, I think you can try downloading and running a Spark standalone cluster this way:

```
./testing/downloadSpark.sh 1.6.2 2.6
./testing/startSparkCluster.sh 1.6.2 2.6
```

Then try running the test cases, so that `getSparkHome()` can find sparkHome.
Github user zjffdu commented on the issue: https://github.com/apache/zeppelin/pull/1404

@Leemoonsoo, I followed the commands above, but they don't seem to work. I checked `AbstractTestRestApi`, and it seems the pyspark-related tests only run either in Travis CI or against Spark standalone with SPARK_HOME set up (`pySpark` needs to be set to true). Do I understand correctly?

```
// ci environment runs spark cluster for testing
// so configure zeppelin use spark cluster
if ("true".equals(System.getenv("CI"))) {
  // assume first one is spark
  InterpreterSetting sparkIntpSetting = null;
  for(InterpreterSetting intpSetting : ZeppelinServer.notebook.getInterpreterFactory().get()) {
    if (intpSetting.getName().equals("spark")) {
      sparkIntpSetting = intpSetting;
    }
  }

  // set spark master and other properties
  sparkIntpSetting.getProperties().setProperty("master", "spark://" + getHostname() + ":7071");
  sparkIntpSetting.getProperties().setProperty("spark.cores.max", "2");

  // set spark home for pyspark
  sparkIntpSetting.getProperties().setProperty("spark.home", getSparkHome());
  pySpark = true;
  sparkR = true;
  ZeppelinServer.notebook.getInterpreterFactory().restart(sparkIntpSetting.getId());
} else {
  // assume first one is spark
  InterpreterSetting sparkIntpSetting = null;
  for(InterpreterSetting intpSetting : ZeppelinServer.notebook.getInterpreterFactory().get()) {
    if (intpSetting.getName().equals("spark")) {
      sparkIntpSetting = intpSetting;
    }
  }

  String sparkHome = getSparkHome();
  if (sparkHome != null) {
    sparkIntpSetting.getProperties().setProperty("master", "spark://" + getHostname() + ":7071");
    sparkIntpSetting.getProperties().setProperty("spark.cores.max", "2");
    // set spark home for pyspark
    sparkIntpSetting.getProperties().setProperty("spark.home", sparkHome);
    pySpark = true;
    sparkR = true;
  }
```
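The condition in the quoted excerpt can be restated as a small sketch. `PySparkTestGate` and both of its methods are hypothetical and only summarize the excerpt: the pyspark tests are enabled when `CI` is `"true"` or when a Spark home is found. The generic `findByName` helper shows one way the lookup loop, which appears in both branches, could be shared.

```java
import java.util.function.Function;

// Hypothetical summary of the quoted logic; not part of Zeppelin.
class PySparkTestGate {
  // True when the CI variable equals "true" or a Spark home was located,
  // matching the two places where the excerpt sets pySpark = true.
  static boolean pySparkEnabled(String ciEnv, String sparkHome) {
    return "true".equals(ciEnv) || sparkHome != null;
  }

  // Returns the first item whose name equals `wanted`, or null if none matches.
  // Note: the loop in the excerpt has no break, so it keeps the last match instead.
  static <T> T findByName(Iterable<T> items, Function<T, String> nameOf, String wanted) {
    for (T item : items) {
      if (wanted.equals(nameOf.apply(item))) {
        return item;
      }
    }
    return null;
  }
}
```

Under this reading, a local run only exercises the pyspark tests if `getSparkHome()` returns a non-null path, which is why starting the standalone cluster first matters.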
Github user Leemoonsoo commented on the issue: https://github.com/apache/zeppelin/pull/1404

@zjffdu First build Zeppelin:

```
mvn package [your profiles] -DskipTests
```

Then you can run this test like so:

```
mvn package -pl 'zeppelin-interpreter,zeppelin-zengine,zeppelin-server' -Dtest=ZeppelinSparkClusterTest -DfailIfNoTests=false -Drat.skip=true
```

Let me know if it does not work.
Github user Leemoonsoo commented on the issue: https://github.com/apache/zeppelin/pull/1404

I tested this branch with the given example, but it doesn't work for me. On my machine, it hangs on `sqlContext.createDataFrame()` and ends up with errors like

```
kqueue: Too many open files in system
```

I'm not sure whether this is a problem with this patch or not. Could someone else test this patch, too?
Github user Leemoonsoo commented on the issue: https://github.com/apache/zeppelin/pull/1404

Thanks @zjffdu for the contribution. Actually, we already have some tests for pyspark. Please see https://github.com/apache/zeppelin/blob/master/zeppelin-server/src/test/java/org/apache/zeppelin/rest/ZeppelinSparkClusterTest.java#L125

If it's not too difficult, adding a unit test for this case would be really beneficial.
Github user felixcheung commented on the issue: https://github.com/apache/zeppelin/pull/1404

Could you kick off CI again? Let's merge this afterwards.
Github user felixcheung commented on the issue: https://github.com/apache/zeppelin/pull/1404

CI failed because of selenium, I think
Github user felixcheung commented on the issue: https://github.com/apache/zeppelin/pull/1404

Right. That is probably for another PR, but I think we could use Travis' addons support to install Python via apt-get (https://docs.travis-ci.com/user/installing-dependencies/), as we do for R.
Github user zjffdu commented on the issue: https://github.com/apache/zeppelin/pull/1404

I guess it is because `PythonInterpreter` depends on a Python environment, so there's no test for it yet.
Github user felixcheung commented on the issue: https://github.com/apache/zeppelin/pull/1404

LGTM. It would be great to add some tests for the pyspark interpreter.
Github user zjffdu commented on the issue: https://github.com/apache/zeppelin/pull/1404

cc @Leemoonsoo Please help review, thanks.