So, to answer my own question: it is a bug, and there is an unmerged PR for it already.
https://issues.apache.org/jira/browse/SPARK-2624
https://github.com/apache/spark/pull/3238

Jakub

---------- Original message ----------
From: spark.dubovsky.ja...@seznam.cz
To: spark.dubovsky.ja...@seznam.cz
Date: 12. 12. 2014 15:26:35
Subject: Re: Including data nucleus tools

Hi,

I had time to try it again. I submitted my app with the same command plus these additional options:

--jars lib/datanucleus-api-jdo-3.2.6.jar,lib/datanucleus-core-3.2.10.jar,lib/datanucleus-rdbms-3.2.9.jar

Now the app successfully creates a hive context. So my question remains: are the "classpath entries" shown in the Spark UI the same classpath mentioned in the submit script's message?

"Spark assembly has been built with Hive, including Datanucleus jars on classpath"

If so, why does the script fail to actually include the datanucleus jars on the classpath? I found no bug about this in jira. Or is there a way that particular yarn/os settings on our cluster override this?

Thanks in advance

Jakub

---------- Original message ----------
From: spark.dubovsky.ja...@seznam.cz
To: Michael Armbrust <mich...@databricks.com>
Date: 7. 12. 2014 3:02:33
Subject: Re: Including data nucleus tools

Next try. I copied the whole dist directory created by the make-distribution script to the cluster, not just the assembly jar. Then I used

./bin/spark-submit --num-executors 200 --master yarn-cluster --class org.apache.spark.mllib.CreateGuidDomainDictionary ../spark/root-0.1.jar ${args}

...to run the app again. The startup scripts printed this message:

"Spark assembly has been built with Hive, including Datanucleus jars on classpath"

...so I thought I was finally there. But the job started and failed with the same ClassNotFound exception as before. Is the "classpath" in the script's message just the classpath of the driver? Or is it the same classpath that is affected by the --jars option? I tried to find out from the scripts, but I was not able to locate where the --jars option is processed.
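As a side note, the value passed to --jars has to be a single comma-separated argument with no spaces (the space after a comma in the quoted command above is most likely just email line wrapping, but it would break the invocation if typed literally). A minimal sketch of building that argument, assuming the jar paths and versions quoted in the thread:

```shell
# Sketch only: jar paths/versions are taken from the thread above and
# may differ in other Spark distributions.
DATANUCLEUS_JARS="lib/datanucleus-api-jdo-3.2.6.jar,lib/datanucleus-core-3.2.10.jar,lib/datanucleus-rdbms-3.2.9.jar"
echo "$DATANUCLEUS_JARS"

# The full submit command would then look roughly like (not executed here):
# ./bin/spark-submit --num-executors 200 --master yarn-cluster \
#   --jars "$DATANUCLEUS_JARS" \
#   --class org.apache.spark.mllib.CreateGuidDomainDictionary \
#   ../spark/root-0.1.jar ${args}
```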
thanks

---------- Original message ----------
From: Michael Armbrust <mich...@databricks.com>
To: spark.dubovsky.ja...@seznam.cz
Date: 6. 12. 2014 20:39:13
Subject: Re: Including data nucleus tools

On Sat, Dec 6, 2014 at 5:53 AM, <spark.dubovsky.ja...@seznam.cz> wrote:

> Bonus question: Should the class org.datanucleus.api.jdo.JDOPersistenceManagerFactory be part of the assembly? Because it is not in the jar now.

No, these jars cannot be put into the assembly because they have extra metadata files that live in the same location (so if you put them all in an assembly they overwrite each other). This metadata is used in discovery. Instead they must be manually put on the classpath in their original form (usually using --jars).
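Michael's point about colliding metadata can be shown with a toy sketch. Plain files stand in for jar entries here; the actual conflicting file in the DataNucleus jars is their plugin registry (e.g. a plugin.xml at the jar root), and the file names below are illustrative only:

```shell
# Toy demonstration: two "jars" each ship a metadata file at the same
# path; flattening them into one assembly tree keeps only the last copy.
mkdir -p jar_a jar_b merged
echo "metadata from datanucleus-core"  > jar_a/plugin.xml
echo "metadata from datanucleus-rdbms" > jar_b/plugin.xml

cp jar_a/plugin.xml merged/   # first jar's metadata lands in the "assembly"
cp jar_b/plugin.xml merged/   # second jar silently overwrites it

cat merged/plugin.xml         # prints: metadata from datanucleus-rdbms
```

This is why the jars must stay in their original form on the classpath: each one's discovery metadata has to survive intact.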