Re: 1.6.0: Standalone application: Getting ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory

2016-01-14 Thread Egor Pahomov
My fault, I should have read the documentation more carefully -
http://spark.apache.org/docs/latest/sql-programming-guide.html says
precisely that I need to add these 3 jars to the classpath if I need them.
We cannot include them in the fat jar, because they are OSGi bundles and
require a plugin.xml and META-INF/MANIFEST.MF in the root of the jar. The
problem is that there are 3 of them, and each one has its own plugin.xml.
You could include them all in the fat jar if you were able to merge the
plugin.xml files, but currently there is no tool to do so: the
maven-assembly-plugin simply has no such merger, and the maven-shade-plugin
has an XmlAppendingTransformer, but for some reason it doesn't work. And
that is it - you just have to live with the fact that you have a fat jar
with all dependencies except these 3. The good news is that if you are in
yarn-client mode, you only need to add them to the classpath of your
driver; you do not have to do addJar(). That is really good news, since it
is hard to do addJar() properly in an Oozie job.
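
For reference, a minimal sketch of both halves of the workaround (the
exclusion is standard maven-shade-plugin configuration; the jar versions,
paths and main class in the launch command are hypothetical - use whatever
your Spark distribution ships in lib/). First, keep the DataNucleus
artifacts out of the shaded jar:

<artifactSet>
    <excludes>
        <exclude>org.datanucleus:*</exclude>
    </excludes>
</artifactSet>

Then, in yarn-client mode, launch the driver JVM with the 3 jars appended
to its classpath:

java -cp my-app-fat.jar:/opt/spark/lib/datanucleus-core-3.2.10.jar:/opt/spark/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark/lib/datanucleus-rdbms-3.2.9.jar com.mycompany.MyApp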


1.6.0: Standalone application: Getting ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory

2016-01-12 Thread Egor Pahomov
Hi, I'm moving my infrastructure from 1.5.2 to 1.6.0 and experiencing a
serious issue. I successfully updated the Spark Thrift Server from 1.5.2 to
1.6.0, but my standalone application, which worked fine with 1.5.2, is
failing on 1.6.0 with:

NestedThrowables:
java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
    at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
    at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
    at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)

Inside this application I work with a Hive table which has data in JSON
format.
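
For context, the access pattern is roughly the following (the table name is
hypothetical; sparkConf is the config quoted at the end of this mail).
Instantiating HiveContext is what connects to the metastore, which is where
DataNucleus gets pulled in:

import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(sparkConf)
// Connects to the Hive metastore (this is the step that needs DataNucleus)
val hiveContext = new HiveContext(sc)
val df = hiveContext.sql("SELECT * FROM my_json_events")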

When I add


<dependency>
    <groupId>org.datanucleus</groupId>
    <artifactId>datanucleus-core</artifactId>
    <version>4.0.0-release</version>
</dependency>

<dependency>
    <groupId>org.datanucleus</groupId>
    <artifactId>datanucleus-api-jdo</artifactId>
    <version>4.0.0-release</version>
</dependency>

<dependency>
    <groupId>org.datanucleus</groupId>
    <artifactId>datanucleus-rdbms</artifactId>
    <version>3.2.9</version>
</dependency>

I'm getting:

Caused by: org.datanucleus.exceptions.NucleusUserException: Persistence
process has been specified to use a ClassLoaderResolver of name
"datanucleus" yet this has not been found by the DataNucleus plugin
mechanism. Please check your CLASSPATH and plugin specification.
    at org.datanucleus.AbstractNucleusContext.<init>(AbstractNucleusContext.java:102)
    at org.datanucleus.PersistenceNucleusContextImpl.<init>(PersistenceNucleusContextImpl.java:162)

I have CDH 5.5. I build Spark with:

./make-distribution.sh -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.5.0 -Phive -DskipTests

Then I publish the fat jar locally:

mvn org.apache.maven.plugins:maven-install-plugin:2.3.1:install-file -Dfile=./spark-assembly.jar -DgroupId=org.spark-project -DartifactId=my-spark-assembly -Dversion=1.6.0-SNAPSHOT -Dpackaging=jar

Then I include a dependency on this fat jar:


<dependency>
    <groupId>org.spark-project</groupId>
    <artifactId>my-spark-assembly</artifactId>
    <version>1.6.0-SNAPSHOT</version>
</dependency>

Then I build my application with the maven-shade-plugin:


<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <configuration>
        <artifactSet>
            <includes>
                <include>*:*</include>
            </includes>
        </artifactSet>
        <filters>
            <filter>
                <artifact>*:*</artifact>
                <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                </excludes>
            </filter>
        </filters>
    </configuration>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>META-INF/services/org.apache.hadoop.fs.FileSystem</resource>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>reference.conf</resource>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.DontIncludeResourceTransformer">
                        <resource>log4j.properties</resource>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ApacheLicenseResourceTransformer"/>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ApacheNoticeResourceTransformer"/>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>

This shade plugin configuration is copy-pasted from the Spark assembly pom.

This workflow worked for 1.5.2 and broke with 1.6.0. If my approach to
building this standalone application is not a good one, please recommend
another, but spark-submit does not work for me - it is hard for me to
connect it to Oozie.
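
For context, the reason I want a self-contained fat jar is that it can run
as a plain Oozie java action, roughly like this (a sketch only - the action
name, transitions and main class are hypothetical):

<action name="spark-app">
    <java>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <main-class>com.mycompany.MyApp</main-class>
    </java>
    <ok to="end"/>
    <error to="fail"/>
</action>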

Any suggestion would be appreciated - I'm stuck.

My Spark config:

lazy val sparkConf = new SparkConf()
  .setMaster("yarn-client")
  .setAppName(appName)
  .set("spark.yarn.queue", "jenkins")
  .set("spark.executor.memory", "10g")
  .set("spark.yarn.executor.memoryOverhead", "2000")
  .set("spark.executor.cores", "3")
  .set("spark.driver.memory", "4g")
  .set("spark.shuffle.io.numConnectionsPerPeer", "5")
  .set("spark.sql.autoBroadcastJoinThreshold", "200483647")
  .set("spark.network.timeout", "1000s")
  .set("spark.executor.extraJavaOptions", "-XX:MaxPermSize=2g")
  .set("spark.driver.maxResultSize", "2g")
  .set("spark.rpc.lookupTimeout", "1000s")
  .set("spark.sql.hive.convertMetastoreParquet", "false")
  .set("spark.kryoserializer.buffer.max", "200m")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.yarn.driver.memoryOverhead", "1000")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "20")
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
  .set("spark.sql.tungsten.enabled", "false")
  .set("spark.dynamicAllocation.cachedExecutorIdleTimeout", "100s")
  // ship this application's own (fat) jar to the cluster
  .setJars(List(this.getClass.getProtectionDomain().getCodeSource().getLocation().toURI().getPath()))

-- 
Sincerely yours
Egor Pakhomov