Re: Jenkins build errors
The problem was with upstream changes. Fetching upstream and rebasing resolved it, and the build is now passing. I also added a design doc and made the JIRA description a bit clearer (https://issues.apache.org/jira/browse/SPARK-24020), so I hope it will get merged soon.

Thanks,
Petar

Sean Owen wrote:
> Also confused about this one, as many builds succeed. One possible difference
> is that this failure is in the Hive tests, so are you building and testing
> with -Phive locally, where it works? That still does not explain the download
> failure. It could be a mirror problem, throttling, etc. But then again, I
> haven't spotted another failing Hive test.
>
> On Wed, Jun 20, 2018 at 1:55 AM Petar Zecevic wrote:
>
> > It's still dying. Back to this error (it used to be spark-2.2.0 before):
> >
> > java.io.IOException: Cannot run program "./bin/spark-submit" (in directory
> > "/tmp/test-spark/spark-2.1.2"): error=2, No such file or directory
> >
> > So, a mirror is missing that Spark version... I don't understand why nobody
> > else has these errors and I get them every time without fail.
> >
> > Petar
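For the record, the sync with upstream was roughly the following (assuming a git remote named "upstream" pointing at apache/spark and a feature branch based on master), followed by a force-push of the PR branch:

    git fetch upstream
    git rebase upstream/master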
Re: Jenkins build errors
It's still dying. Back to this error (it used to be spark-2.2.0 before):

java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "/tmp/test-spark/spark-2.1.2"): error=2, No such file or directory

So, a mirror is missing that Spark version... I don't understand why nobody else has these errors and I get them every time without fail.

Petar

On 6/19/2018 at 2:35 PM, Sean Owen wrote:

Those still appear to be env problems. I don't know why it is so persistent. Does it all pass locally? Retrigger tests again and see what happens.

On Tue, Jun 19, 2018, 2:53 AM Petar Zecevic wrote:

Thanks, but unfortunately, it died again. Now at the pyspark tests:

Running PySpark tests
Running PySpark tests. Output is in /home/jenkins/workspace/SparkPullRequestBuilder@2/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'python3.4', 'pypy']
Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
Will skip PyArrow related features against Python executable 'python2.7' in 'pyspark-sql' module. PyArrow >= 0.8.0 is required; however, PyArrow was not found.
Will skip Pandas related features against Python executable 'python2.7' in 'pyspark-sql' module. Pandas >= 0.19.2 is required; however, Pandas 0.16.0 was found.
Will test PyArrow related features against Python executable 'python3.4' in 'pyspark-sql' module.
Will test Pandas related features against Python executable 'python3.4' in 'pyspark-sql' module.
Will skip PyArrow related features against Python executable 'pypy' in 'pyspark-sql' module. PyArrow >= 0.8.0 is required; however, PyArrow was not found.
Will skip Pandas related features against Python executable 'pypy' in 'pyspark-sql' module. Pandas >= 0.19.2 is required; however, Pandas was not found.
Starting test(python2.7): pyspark.mllib.tests
Starting test(pypy): pyspark.sql.tests
Starting test(pypy): pyspark.streaming.tests
Starting test(pypy): pyspark.tests
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[... stage progress output omitted ...]
Re: Jenkins build errors
local-ivy-cache: tried
  /home/jenkins/.ivy2/local/a/mylib/0.1/ivys/ivy.xml
  -- artifact a#mylib;0.1!mylib.jar:
  /home/jenkins/.ivy2/local/a/mylib/0.1/jars/mylib.jar
central: tried
  https://repo1.maven.org/maven2/a/mylib/0.1/mylib-0.1.pom
  -- artifact a#mylib;0.1!mylib.jar:
  https://repo1.maven.org/maven2/a/mylib/0.1/mylib-0.1.jar
spark-packages: tried
  http://dl.bintray.com/spark-packages/maven/a/mylib/0.1/mylib-0.1.pom
  -- artifact a#mylib;0.1!mylib.jar:
  http://dl.bintray.com/spark-packages/maven/a/mylib/0.1/mylib-0.1.jar
repo-1: tried
  file:/tmp/tmpgO7AIY/a/mylib/0.1/mylib-0.1.pom

  ::::::::::::::::::::::::::::::::::::::::::::::
  ::          UNRESOLVED DEPENDENCIES         ::
  ::::::::::::::::::::::::::::::::::::::::::::::
  :: a#mylib;0.1: not found
  ::::::::::::::::::::::::::::::::::::::::::::::

USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: a#mylib;0.1: not found]
  at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1268)
  at org.apache.spark.deploy.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:49)
  at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:348)
  at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:170)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Ffile:/tmp/tmpwtN2z_ added as a remote repository with the name: repo-1
Ivy Default Cache set to: /home/jenkins/.ivy2/cache
The jars for the packages stored in: /home/jenkins/.ivy2/jars
:: loading settings :: url = jar:file:/home/jenkins/workspace/SparkPullRequestBuilder@2/assembly/target/scala-2.11/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
a#mylib added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
  confs: [default]
  found a#mylib;0.1 in repo-1
:: resolution report :: resolve 1378ms :: artifacts dl 4ms
  :: modules in use:
  a#mylib;0.1 from repo-1 in [default]
  ---------------------------------------------------------------------
  |                  |            modules            ||   artifacts   |
  |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
  ---------------------------------------------------------------------
  |      default     |   1   |   1   |   1   |   0   ||   1   |   0   |
  ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
  confs: [default]
  0 artifacts copied, 1 already retrieved (0kB/8ms)
[... stage progress output omitted ...]

======================================================================
FAIL: test_package_dependency (pyspark.tests.SparkSubmitTests)
Submit and test a script with a dependency on a Spark Package
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/tests.py", line 2093, in test_package_dependency
    self.assertEqual(0, proc.returncode)
AssertionError: 0 != 1
----------------------------------------------------------------------
Ran 127 tests in 205.547s

FAILED (failures=1, skipped=2)
NOTE: Skipping SciPy tests as it does not seem to be installed
NOTE: Skipping NumPy tests as it does not seem to be installed
Random listing order was used
Had test failures in pyspark.tests with pypy; see logs.
[error] running /home/jenkins/workspace/SparkPullRequestBuilder@2/python/run-tests --parallelism=4 ; received return code 255
Attempting to post to Github...
 > Post successful.
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Test FAILed.
Refer to this link for build results (access rights to CI server needed): https://amplab
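The unresolved-dependency half of that log just means the temporary repository passed via --repositories did not contain the artifact the test asked for via --packages. As a purely hypothetical illustration of the Maven-style layout Ivy was looking for (the helper below is not part of the test suite; the paths and coordinates are the ones from the log, and a group id containing dots would map to nested directories):

    import java.nio.file.{Files, Paths}

    // Check that a local repository contains <group>/<artifact>/<version>/<artifact>-<version>.{pom,jar},
    // which is what "repo-1: tried file:/tmp/tmpgO7AIY/a/mylib/0.1/mylib-0.1.pom" was probing for.
    def hasArtifact(repoRoot: String, group: String, artifact: String, version: String): Boolean = {
      val base = Paths.get(repoRoot, group, artifact, version)
      Files.exists(base.resolve(s"$artifact-$version.pom")) &&
        Files.exists(base.resolve(s"$artifact-$version.jar"))
    }

    // hasArtifact("/tmp/tmpgO7AIY", "a", "mylib", "0.1")  // would have been false in the failed run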
Jenkins build errors
Hi,

The Jenkins build for my PR (https://github.com/apache/spark/pull/21109 ; https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92023/testReport/org.apache.spark.sql.hive/HiveExternalCatalogVersionsSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/) keeps failing. First it couldn't download Spark v2.2.0 (indeed, it wasn't available at the mirror it selected); now it's failing with the exception below. Can someone explain these errors to me? Is anybody else experiencing similar problems?

Thanks,
Petar

Error Message

java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "/tmp/test-spark/spark-2.2.1"): error=2, No such file or directory

Stacktrace

sbt.ForkMain$ForkError: java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "/tmp/test-spark/spark-2.2.1"): error=2, No such file or directory
  at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
  at org.apache.spark.sql.hive.SparkSubmitTestUtils$class.runSparkSubmit(SparkSubmitTestUtils.scala:73)
  at org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite.runSparkSubmit(HiveExternalCatalogVersionsSuite.scala:43)
  at org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite$$anonfun$beforeAll$1.apply(HiveExternalCatalogVersionsSuite.scala:176)
  at org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite$$anonfun$beforeAll$1.apply(HiveExternalCatalogVersionsSuite.scala:161)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite.beforeAll(HiveExternalCatalogVersionsSuite.scala:161)
  at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:212)
  at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
  at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:52)
  at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:314)
  at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:480)
  at sbt.ForkMain$Run$2.call(ForkMain.java:296)
  at sbt.ForkMain$Run$2.call(ForkMain.java:286)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Caused by: sbt.ForkMain$ForkError: java.io.IOException: error=2, No such file or directory
  at java.lang.UNIXProcess.forkAndExec(Native Method)
  at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
  at java.lang.ProcessImpl.start(ProcessImpl.java:134)
  at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
  ... 17 more
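The suite expects an older Spark release to have been downloaded and unpacked under /tmp/test-spark/<version> and then launches that release's ./bin/spark-submit from inside it, so the IOException simply says the expected directory was never populated (consistent with the earlier mirror/download problem). A minimal, hypothetical check along those lines - the path is taken from the error message above, not from the suite's actual code:

    import java.io.File

    // Directory and launcher the failing test expected to find.
    val sparkHome = new File("/tmp/test-spark/spark-2.2.1")
    val sparkSubmit = new File(sparkHome, "bin/spark-submit")

    if (!sparkSubmit.canExecute) {
      // If this prints, the old Spark distribution was never downloaded/unpacked,
      // which matches the "No such file or directory" from ProcessBuilder.
      println(s"Missing or non-executable: $sparkSubmit")
    }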
Re: Sort-merge join improvement
Hi,

We went through a round of reviews on this PR. The performance improvements can be substantial, and unit and performance tests are included. One remark was that the amount of changed code is large, but I don't see how to reduce it and still keep the performance improvements. Besides, all the new code is well contained in separate classes (except where it was necessary to change existing ones). So I believe this is ready to be merged.

Can some of the committers please take another look at this and accept the PR?

Thank you,
Petar Zecevic

On 5/15/2018 at 10:55 AM, Petar Zecevic wrote:

Based on some reviews, I put additional effort into fixing the case when wholestage codegen is turned off. Sort-merge join with additional range conditions is now 10x faster (can be more or less, depending on the exact use case) in both cases - with wholestage turned off or on - compared to the non-optimized SMJ. Merging this would help us tremendously, and I believe this can be useful in other applications, too. Can you please review (https://github.com/apache/spark/pull/21109) and merge the patch?

Thank you,
Petar Zecevic

On 4/23/2018 at 6:28 PM, Petar Zecevic wrote:

Hi, the PR tests completed successfully (https://github.com/apache/spark/pull/21109). Can you please review the patch and merge it upstream if you think it's OK?

Thanks,
Petar

On 4/18/2018 at 4:52 PM, Petar Zecevic wrote:

As instructed offline, I opened a JIRA for this: https://issues.apache.org/jira/browse/SPARK-24020

I will create a pull request soon.

On 4/17/2018 at 6:21 PM, Petar Zecevic wrote:

Hello everybody,

We (at the University of Zagreb and the University of Washington) have implemented an optimization of Spark's sort-merge join (SMJ) which has improved performance of our jobs considerably, and we would like to know if the Spark community thinks it would be useful to include this in the main distribution.

The problem we are solving is the case where you have two big tables partitioned by column X, but also sorted by column Y (within partitions), and you need to calculate an expensive function on the joined rows. During a sort-merge join, Spark will do cross-joins of all rows that have the same X values and calculate the function's value on all of them. If the two tables have a large number of rows per X, this can result in a huge number of calculations.

Our optimization allows you to reduce the number of matching rows per X using a range condition on the Y columns of the two tables. Something like:

... WHERE t1.X = t2.X AND t1.Y BETWEEN t2.Y - d AND t2.Y + d

The way SMJ is currently implemented, these extra conditions have no influence on the number of rows (per X) being checked, because these extra conditions are put in the same block with the function being calculated. Our optimization changes the sort-merge join so that, when these extra conditions are specified, a queue is used instead of the ExternalAppendOnlyUnsafeRowArray class. This queue is then used as a moving window across the values from the right relation as the left row changes. You could call this a combination of an equi-join and a theta join (we call it a "sort-merge inner range join"). Potential use cases for this are joins based on spatial or temporal distance calculations.

The optimization is triggered automatically when an equi-join expression is present AND lower and upper range conditions on a secondary column are specified. If the tables aren't sorted by both columns, appropriate sorts will be added.

We have several questions:

1. Do you see any other way to optimize queries like these (eliminate unnecessary calculations) without changing the sort-merge join algorithm?
2. We believe there is a more general pattern here and that this could help in other similar situations where secondary sorting is available. Would you agree?
3. Would you like us to open a JIRA ticket and create a pull request?

Thanks,
Petar Zecevic
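To make the "moving window" idea from the quoted proposal a bit more concrete, here is a rough, simplified sketch of the matching logic in plain Scala - not the actual Spark implementation; the row shape, names, and the assumption that both sides are already grouped by X and sorted by Y are illustrative:

    import scala.collection.mutable.ArrayBuffer

    case class Row(x: Int, y: Long, payload: String)

    // Join left rows against the right rows sharing the same X value, keeping only
    // pairs with |left.y - right.y| <= d. Because both sides are sorted by y,
    // the window over the right side only ever moves forward.
    def rangeJoinSameX(left: Seq[Row], right: IndexedSeq[Row], d: Long): Seq[(Row, Row)] = {
      val out = ArrayBuffer.empty[(Row, Row)]
      var windowStart = 0
      for (l <- left) {
        // Drop right rows that fell behind the window: right.y < l.y - d
        while (windowStart < right.length && right(windowStart).y < l.y - d) windowStart += 1
        // Emit pairs for right rows still inside the window: right.y <= l.y + d
        var i = windowStart
        while (i < right.length && right(i).y <= l.y + d) {
          out += ((l, right(i)))
          i += 1
        }
      }
      out
    }

Instead of re-scanning (and recomputing the expensive function over) every right row with the same X for every left row, only the rows inside the [l.y - d, l.y + d] window are touched.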
Re: Sort-merge join improvement
Based on some reviews, I put additional effort into fixing the case when wholestage codegen is turned off. Sort-merge join with additional range conditions is now 10x faster (can be more or less, depending on the exact use case) in both cases - with wholestage turned off or on - compared to the non-optimized SMJ.

Merging this would help us tremendously, and I believe this can be useful in other applications, too. Can you please review (https://github.com/apache/spark/pull/21109) and merge the patch?

Thank you,
Petar Zecevic

On 4/23/2018 at 6:28 PM, Petar Zecevic wrote:

Hi, the PR tests completed successfully (https://github.com/apache/spark/pull/21109). Can you please review the patch and merge it upstream if you think it's OK?

Thanks,
Petar

On 4/18/2018 at 4:52 PM, Petar Zecevic wrote:

As instructed offline, I opened a JIRA for this: https://issues.apache.org/jira/browse/SPARK-24020

I will create a pull request soon.

On 4/17/2018 at 6:21 PM, Petar Zecevic wrote:

Hello everybody,

We (at the University of Zagreb and the University of Washington) have implemented an optimization of Spark's sort-merge join (SMJ) which has improved performance of our jobs considerably, and we would like to know if the Spark community thinks it would be useful to include this in the main distribution.

The problem we are solving is the case where you have two big tables partitioned by column X, but also sorted by column Y (within partitions), and you need to calculate an expensive function on the joined rows. During a sort-merge join, Spark will do cross-joins of all rows that have the same X values and calculate the function's value on all of them. If the two tables have a large number of rows per X, this can result in a huge number of calculations.

Our optimization allows you to reduce the number of matching rows per X using a range condition on the Y columns of the two tables. Something like:

... WHERE t1.X = t2.X AND t1.Y BETWEEN t2.Y - d AND t2.Y + d

The way SMJ is currently implemented, these extra conditions have no influence on the number of rows (per X) being checked, because these extra conditions are put in the same block with the function being calculated. Our optimization changes the sort-merge join so that, when these extra conditions are specified, a queue is used instead of the ExternalAppendOnlyUnsafeRowArray class. This queue is then used as a moving window across the values from the right relation as the left row changes. You could call this a combination of an equi-join and a theta join (we call it a "sort-merge inner range join"). Potential use cases for this are joins based on spatial or temporal distance calculations.

The optimization is triggered automatically when an equi-join expression is present AND lower and upper range conditions on a secondary column are specified. If the tables aren't sorted by both columns, appropriate sorts will be added.

We have several questions:

1. Do you see any other way to optimize queries like these (eliminate unnecessary calculations) without changing the sort-merge join algorithm?
2. We believe there is a more general pattern here and that this could help in other similar situations where secondary sorting is available. Would you agree?
3. Would you like us to open a JIRA ticket and create a pull request?

Thanks,
Petar Zecevic
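For anyone wanting to reproduce the with/without-wholestage comparison mentioned at the top of the message above, the switch is a single SQL configuration flag. A minimal sketch (assuming an existing SparkSession named "spark", and t1/t2 registered as views with columns X and Y; the query only mirrors the range-condition join shape from the proposal):

    // The query being measured; 10 stands in for the range half-width d.
    val query =
      "SELECT * FROM t1 JOIN t2 ON t1.X = t2.X AND t1.Y BETWEEN t2.Y - 10 AND t2.Y + 10"

    spark.conf.set("spark.sql.codegen.wholeStage", "false")
    spark.sql(query).count()   // run with wholestage codegen disabled

    spark.conf.set("spark.sql.codegen.wholeStage", "true")
    spark.sql(query).count()   // run with wholestage codegen enabled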
Re: Sort-merge join improvement
Hi, the PR tests completed successfully (https://github.com/apache/spark/pull/21109). Can you please review the patch and merge it upstream if you think it's OK?

Thanks,
Petar

On 4/18/2018 at 4:52 PM, Petar Zecevic wrote:

As instructed offline, I opened a JIRA for this: https://issues.apache.org/jira/browse/SPARK-24020

I will create a pull request soon.

On 4/17/2018 at 6:21 PM, Petar Zecevic wrote:

Hello everybody,

We (at the University of Zagreb and the University of Washington) have implemented an optimization of Spark's sort-merge join (SMJ) which has improved performance of our jobs considerably, and we would like to know if the Spark community thinks it would be useful to include this in the main distribution.

The problem we are solving is the case where you have two big tables partitioned by column X, but also sorted by column Y (within partitions), and you need to calculate an expensive function on the joined rows. During a sort-merge join, Spark will do cross-joins of all rows that have the same X values and calculate the function's value on all of them. If the two tables have a large number of rows per X, this can result in a huge number of calculations.

Our optimization allows you to reduce the number of matching rows per X using a range condition on the Y columns of the two tables. Something like:

... WHERE t1.X = t2.X AND t1.Y BETWEEN t2.Y - d AND t2.Y + d

The way SMJ is currently implemented, these extra conditions have no influence on the number of rows (per X) being checked, because these extra conditions are put in the same block with the function being calculated. Our optimization changes the sort-merge join so that, when these extra conditions are specified, a queue is used instead of the ExternalAppendOnlyUnsafeRowArray class. This queue is then used as a moving window across the values from the right relation as the left row changes. You could call this a combination of an equi-join and a theta join (we call it a "sort-merge inner range join"). Potential use cases for this are joins based on spatial or temporal distance calculations.

The optimization is triggered automatically when an equi-join expression is present AND lower and upper range conditions on a secondary column are specified. If the tables aren't sorted by both columns, appropriate sorts will be added.

We have several questions:

1. Do you see any other way to optimize queries like these (eliminate unnecessary calculations) without changing the sort-merge join algorithm?
2. We believe there is a more general pattern here and that this could help in other similar situations where secondary sorting is available. Would you agree?
3. Would you like us to open a JIRA ticket and create a pull request?

Thanks,
Petar Zecevic
Re: Sort-merge join improvement
As instructed offline, I opened a JIRA for this: https://issues.apache.org/jira/browse/SPARK-24020

I will create a pull request soon.

On 4/17/2018 at 6:21 PM, Petar Zecevic wrote:

Hello everybody,

We (at the University of Zagreb and the University of Washington) have implemented an optimization of Spark's sort-merge join (SMJ) which has improved performance of our jobs considerably, and we would like to know if the Spark community thinks it would be useful to include this in the main distribution.

The problem we are solving is the case where you have two big tables partitioned by column X, but also sorted by column Y (within partitions), and you need to calculate an expensive function on the joined rows. During a sort-merge join, Spark will do cross-joins of all rows that have the same X values and calculate the function's value on all of them. If the two tables have a large number of rows per X, this can result in a huge number of calculations.

Our optimization allows you to reduce the number of matching rows per X using a range condition on the Y columns of the two tables. Something like:

... WHERE t1.X = t2.X AND t1.Y BETWEEN t2.Y - d AND t2.Y + d

The way SMJ is currently implemented, these extra conditions have no influence on the number of rows (per X) being checked, because these extra conditions are put in the same block with the function being calculated. Our optimization changes the sort-merge join so that, when these extra conditions are specified, a queue is used instead of the ExternalAppendOnlyUnsafeRowArray class. This queue is then used as a moving window across the values from the right relation as the left row changes. You could call this a combination of an equi-join and a theta join (we call it a "sort-merge inner range join"). Potential use cases for this are joins based on spatial or temporal distance calculations.

The optimization is triggered automatically when an equi-join expression is present AND lower and upper range conditions on a secondary column are specified. If the tables aren't sorted by both columns, appropriate sorts will be added.

We have several questions:

1. Do you see any other way to optimize queries like these (eliminate unnecessary calculations) without changing the sort-merge join algorithm?
2. We believe there is a more general pattern here and that this could help in other similar situations where secondary sorting is available. Would you agree?
3. Would you like us to open a JIRA ticket and create a pull request?

Thanks,
Petar Zecevic
Sort-merge join improvement
Hello everybody,

We (at the University of Zagreb and the University of Washington) have implemented an optimization of Spark's sort-merge join (SMJ) which has improved performance of our jobs considerably, and we would like to know if the Spark community thinks it would be useful to include this in the main distribution.

The problem we are solving is the case where you have two big tables partitioned by column X, but also sorted by column Y (within partitions), and you need to calculate an expensive function on the joined rows. During a sort-merge join, Spark will do cross-joins of all rows that have the same X values and calculate the function's value on all of them. If the two tables have a large number of rows per X, this can result in a huge number of calculations.

Our optimization allows you to reduce the number of matching rows per X using a range condition on the Y columns of the two tables. Something like:

... WHERE t1.X = t2.X AND t1.Y BETWEEN t2.Y - d AND t2.Y + d

The way SMJ is currently implemented, these extra conditions have no influence on the number of rows (per X) being checked, because these extra conditions are put in the same block with the function being calculated. Our optimization changes the sort-merge join so that, when these extra conditions are specified, a queue is used instead of the ExternalAppendOnlyUnsafeRowArray class. This queue is then used as a moving window across the values from the right relation as the left row changes. You could call this a combination of an equi-join and a theta join (we call it a "sort-merge inner range join"). Potential use cases for this are joins based on spatial or temporal distance calculations.

The optimization is triggered automatically when an equi-join expression is present AND lower and upper range conditions on a secondary column are specified. If the tables aren't sorted by both columns, appropriate sorts will be added.

We have several questions:

1. Do you see any other way to optimize queries like these (eliminate unnecessary calculations) without changing the sort-merge join algorithm?
2. We believe there is a more general pattern here and that this could help in other similar situations where secondary sorting is available. Would you agree?
3. Would you like us to open a JIRA ticket and create a pull request?

Thanks,
Petar Zecevic
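For readers who want to see the query shape being described, a minimal sketch of the kind of query this targets (assuming an existing SparkSession "spark" and tables t1/t2 registered as views; the column names, the UDF, and the distance value 10 are purely illustrative - per the proposal, the equi-join condition on X plus the lower and upper bounds on Y is what would activate the sort-merge inner range join):

    // expensive_score stands in for the costly function computed on joined rows.
    spark.udf.register("expensive_score",
      (a: Double, b: Double) => math.exp(-math.abs(a - b)))   // placeholder computation

    spark.sql("""
      SELECT t1.X, t1.Y AS y1, t2.Y AS y2,
             expensive_score(t1.payload, t2.payload) AS score
      FROM t1 JOIN t2
        ON t1.X = t2.X
       AND t1.Y BETWEEN t2.Y - 10 AND t2.Y + 10
    """).show()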
Re: Jar for Spark development
You can check out the Spark in Action book. In my (not so humble) opinion, it's very good for beginners.

Petar (author)

On 21.6.2016. 18:01, tesm...@gmail.com wrote:

Hi,

I'm a beginner in Spark development. It took some time to configure Eclipse + Scala. Is there any tutorial that can help beginners?

I'm still struggling to find the Spark JAR files for development. There is no lib folder in my Spark distribution (neither in the pre-built nor in the custom-built one).

Regards,
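Regarding the missing lib folder: for development you normally don't copy JARs out of the distribution at all, but pull Spark in as a dependency from Maven Central. A minimal sketch of an sbt build file - the Spark and Scala versions below are only examples, pick the ones matching your cluster:

    // build.sbt
    name := "spark-dev-example"
    scalaVersion := "2.11.8"

    libraryDependencies ++= Seq(
      // "provided" because the cluster supplies Spark at runtime
      "org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
      "org.apache.spark" %% "spark-sql"  % "2.0.0" % "provided"
    )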
Re: Spark development with IntelliJ
This helped me: http://stackoverflow.com/questions/26995023/errorscalac-bad-option-p-intellij-idea

On 8.1.2015. 11:00, Jakub Dubovsky wrote:

Hi devs,

I'd like to ask if anybody has experience with using IntelliJ 14 to step into Spark code. Whatever I try, I get a compilation error:

Error:scalac: bad option: -P:/home/jakub/.m2/repository/org/scalamacros/paradise_2.10.4/2.0.1/paradise_2.10.4-2.0.1.jar

The project is set up following Patrick's instructions [1] and packaged with mvn -DskipTests clean install. Compilation works fine. Then I just created a breakpoint in the test code and ran debug, which produced the error.

Thanks for any hints,
Jakub

[1] https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-BuildingSparkinIntelliJIDEA