Re: 1.6.0: Standalone application: Getting ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
My fault, I should have read the documentation more carefully: http://spark.apache.org/docs/latest/sql-programming-guide.html says explicitly that these 3 jars have to be added to the classpath if you need them. We cannot include them in the fat jar, because they are OSGi bundles and require a plugin.xml and META-INF/MANIFEST.MF in the root of the jar. The problem is that there are 3 of them, and each one has its own plugin.xml. You could include all of them in the fat jar if you were able to merge the plugin.xml files, but currently there is no tool to do so: maven-assembly-plugin simply has no such merger, and maven-shade-plugin has an XmlAppendingTransformer, but for some reason it doesn't work here. So you just have to live with the fact that you have a fat jar with all dependencies except these 3. The good news is that in yarn-client mode you only need to add them to the classpath of your driver; you do not have to call addJar(). That is really good news, since it is hard to do addJar() properly in an Oozie job.

2016-01-12 17:01 GMT-08:00 Egor Pahomov:
> Hi, I'm moving my infrastructure from 1.5.2 to 1.6.0 and experiencing a
> serious issue. I successfully updated the Spark Thrift Server from 1.5.2 to
> 1.6.0, but I have a standalone application which worked fine with 1.5.2 and
> fails on 1.6.0 with:
>
> NestedThrowables:
> java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
>         at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
>         at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
>         at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
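For reference, a sketch of what "add them to the classpath of your driver" can look like. Spark 1.x binary distributions ship the three DataNucleus jars under lib/; the application jar name and main class below are placeholders, and the paths are illustrative:

```shell
# Build a classpath entry from the DataNucleus jars that Spark 1.x
# distributions ship under $SPARK_HOME/lib (paths are illustrative).
DN_JARS=$(echo "$SPARK_HOME"/lib/datanucleus-*.jar | tr ' ' ':')

# yarn-client mode: only the driver JVM needs them, so no addJar() calls.
# my-standalone-app.jar and com.example.MyApp are placeholders.
java -cp my-standalone-app.jar:"$DN_JARS" com.example.MyApp
```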
1.6.0: Standalone application: Getting ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
Hi, I'm moving my infrastructure from 1.5.2 to 1.6.0 and experiencing a serious issue. I successfully updated the Spark Thrift Server from 1.5.2 to 1.6.0, but I have a standalone application which worked fine with 1.5.2 and fails on 1.6.0 with:

NestedThrowables:
java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
        at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)

Inside this application I work with a Hive table which has data in JSON format.

When I add

<dependency>
  <groupId>org.datanucleus</groupId>
  <artifactId>datanucleus-core</artifactId>
  <version>4.0.0-release</version>
</dependency>
<dependency>
  <groupId>org.datanucleus</groupId>
  <artifactId>datanucleus-api-jdo</artifactId>
  <version>4.0.0-release</version>
</dependency>
<dependency>
  <groupId>org.datanucleus</groupId>
  <artifactId>datanucleus-rdbms</artifactId>
  <version>3.2.9</version>
</dependency>

I'm getting:

Caused by: org.datanucleus.exceptions.NucleusUserException: Persistence process has been specified to use a ClassLoaderResolver of name "datanucleus" yet this has not been found by the DataNucleus plugin mechanism. Please check your CLASSPATH and plugin specification.
        at org.datanucleus.AbstractNucleusContext.<init>(AbstractNucleusContext.java:102)
        at org.datanucleus.PersistenceNucleusContextImpl.<init>(PersistenceNucleusContextImpl.java:162)

I have CDH 5.5.
I build Spark with

./make-distribution.sh -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.5.0 -Phive -DskipTests

Then I publish the fat jar locally:

mvn org.apache.maven.plugins:maven-install-plugin:2.3.1:install-file -Dfile=./spark-assembly.jar -DgroupId=org.spark-project -DartifactId=my-spark-assembly -Dversion=1.6.0-SNAPSHOT -Dpackaging=jar

Then I include a dependency on this fat jar:

<dependency>
  <groupId>org.spark-project</groupId>
  <artifactId>my-spark-assembly</artifactId>
  <version>1.6.0-SNAPSHOT</version>
</dependency>

Then I build my application with the shade plugin:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <artifactSet>
      <includes>
        <include>*:*</include>
      </includes>
    </artifactSet>
    <filters>
      <filter>
        <artifact>*:*</artifact>
        <excludes>
          <exclude>META-INF/*.SF</exclude>
          <exclude>META-INF/*.DSA</exclude>
          <exclude>META-INF/*.RSA</exclude>
        </excludes>
      </filter>
    </filters>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
          <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
            <resource>META-INF/services/org.apache.hadoop.fs.FileSystem</resource>
          </transformer>
          <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
            <resource>reference.conf</resource>
          </transformer>
          <transformer implementation="org.apache.maven.plugins.shade.resource.DontIncludeResourceTransformer">
            <resource>log4j.properties</resource>
          </transformer>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ApacheLicenseResourceTransformer"/>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ApacheNoticeResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>

The shade plugin configuration is copy-pasted from the Spark assembly pom.

This workflow worked for 1.5.2 and broke for 1.6.0. If my approach to creating this standalone application is not good, please recommend another approach, but spark-submit does not work for me - it is hard for me to connect it to Oozie.

Any suggestion would be appreciated - I'm stuck.
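An aside on why append-style shade transformers cannot carry the DataNucleus descriptors into the fat jar: each of the three jars has a plugin.xml at its root, and appending well-formed XML documents produces a file with multiple prologs and multiple root elements, which is not valid XML. A minimal sketch (file names and contents are illustrative):

```shell
# Illustrative only: simulate appending two plugin.xml files, as an
# append-style transformer would.
printf '<?xml version="1.0"?>\n<plugin id="org.datanucleus"/>\n' > plugin-core.xml
printf '<?xml version="1.0"?>\n<plugin id="org.datanucleus.api.jdo"/>\n' > plugin-jdo.xml
cat plugin-core.xml plugin-jdo.xml > plugin-merged.xml

# The concatenation contains two XML prologs and two root elements:
grep -c '<plugin' plugin-merged.xml   # prints 2
```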
My Spark config:

lazy val sparkConf = new SparkConf()
  .setMaster("yarn-client")
  .setAppName(appName)
  .set("spark.yarn.queue", "jenkins")
  .set("spark.executor.memory", "10g")
  .set("spark.yarn.executor.memoryOverhead", "2000")
  .set("spark.executor.cores", "3")
  .set("spark.driver.memory", "4g")
  .set("spark.shuffle.io.numConnectionsPerPeer", "5")
  .set("spark.sql.autoBroadcastJoinThreshold", "200483647")
  .set("spark.network.timeout", "1000s")
  .set("spark.executor.extraJavaOptions", "-XX:MaxPermSize=2g")
  .set("spark.driver.maxResultSize", "2g")
  .set("spark.rpc.lookupTimeout", "1000s")
  .set("spark.sql.hive.convertMetastoreParquet", "false")
  .set("spark.kryoserializer.buffer.max", "200m")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.yarn.driver.memoryOverhead", "1000")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "20")
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
  .set("spark.sql.tungsten.enabled", "false")
  .set("spark.dynamicAllocation.cachedExecutorIdleTimeout", "100s")
  .setJars(List(this.getClass.getProtectionDomain().getCodeSource().getLocation().toURI().getPath()))

--
Sincerely yours,
Egor Pakhomov