Posted a JIRA: https://issues.apache.org/jira/browse/SPARK-1952
On Wed, May 28, 2014 at 1:14 PM, Ryan Compton <compton.r...@gmail.com> wrote:
> Remark: just including the jar built by sbt will produce the same
> error, i.e. this pig script will fail:
>
> REGISTER /usr/share/osi1/spark-1.0.0/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop0.20.2-cdh3u4.jar;
>
> edgeList0 = LOAD '/user/rfcompton/twitter-mention-networks/bidirectional-network-current/part-r-00001'
>     USING PigStorage() AS (id1:long, id2:long, weight:int);
> ttt = LIMIT edgeList0 10;
> DUMP ttt;
>
> On Wed, May 28, 2014 at 12:55 PM, Ryan Compton <compton.r...@gmail.com> wrote:
>> It appears to be Spark 1.0 related. I made a pom.xml with a single
>> dependency on Spark; registering the resulting jar created the error.
>>
>> Spark 1.0 was compiled via:
>>
>>     $ SPARK_HADOOP_VERSION=0.20.2-cdh3u4 sbt/sbt assembly
>>
>> The pom.xml, as well as some other information, is below. The only
>> thing that should not be standard is the inclusion of my in-house
>> repository (it's where I host the spark jar I compiled above).
>>
>> <project xmlns="http://maven.apache.org/POM/4.0.0"
>>          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>          xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
>>                              http://maven.apache.org/xsd/maven-4.0.0.xsd">
>>   <modelVersion>4.0.0</modelVersion>
>>
>>   <groupId>com.mycompany.app</groupId>
>>   <artifactId>my-app</artifactId>
>>   <version>1.0-SNAPSHOT</version>
>>   <packaging>jar</packaging>
>>
>>   <name>my-app</name>
>>   <url>http://maven.apache.org</url>
>>
>>   <properties>
>>     <maven.compiler.source>1.6</maven.compiler.source>
>>     <maven.compiler.target>1.6</maven.compiler.target>
>>     <encoding>UTF-8</encoding>
>>     <scala.version>2.10.4</scala.version>
>>   </properties>
>>
>>   <build>
>>     <pluginManagement>
>>       <plugins>
>>         <plugin>
>>           <groupId>net.alchim31.maven</groupId>
>>           <artifactId>scala-maven-plugin</artifactId>
>>           <version>3.1.5</version>
>>         </plugin>
>>         <plugin>
>>           <groupId>org.apache.maven.plugins</groupId>
>>           <artifactId>maven-compiler-plugin</artifactId>
>>           <version>2.0.2</version>
>>         </plugin>
>>       </plugins>
>>     </pluginManagement>
>>
>>     <plugins>
>>
>>       <plugin>
>>         <groupId>net.alchim31.maven</groupId>
>>         <artifactId>scala-maven-plugin</artifactId>
>>         <executions>
>>           <execution>
>>             <id>scala-compile-first</id>
>>             <phase>process-resources</phase>
>>             <goals>
>>               <goal>add-source</goal>
>>               <goal>compile</goal>
>>             </goals>
>>           </execution>
>>           <execution>
>>             <id>scala-test-compile</id>
>>             <phase>process-test-resources</phase>
>>             <goals>
>>               <goal>testCompile</goal>
>>             </goals>
>>           </execution>
>>         </executions>
>>       </plugin>
>>
>>       <!-- Plugin to create a single jar that includes all dependencies -->
>>       <plugin>
>>         <artifactId>maven-assembly-plugin</artifactId>
>>         <version>2.4</version>
>>         <configuration>
>>           <descriptorRefs>
>>             <descriptorRef>jar-with-dependencies</descriptorRef>
>>           </descriptorRefs>
>>         </configuration>
>>         <executions>
>>           <execution>
>>             <id>make-assembly</id>
>>             <phase>package</phase>
>>             <goals>
>>               <goal>single</goal>
>>             </goals>
>>           </execution>
>>         </executions>
>>       </plugin>
>>
>>     </plugins>
>>   </build>
>>
>>   <repositories>
>>
>>     <!-- needed for cdh build of Spark -->
>>     <repository>
>>       <id>releases</id>
>>       <url>10.10.1.29:8081/nexus/content/repositories/releases</url>
>>     </repository>
>>
>>     <repository>
>>       <id>cloudera</id>
>>       <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
>>     </repository>
>>
>>   </repositories>
>>
>>   <dependencies>
>>
>>     <dependency>
>>       <groupId>org.scala-lang</groupId>
>>       <artifactId>scala-library</artifactId>
>>       <version>${scala.version}</version>
>>     </dependency>
>>
>>     <!--on node29-->
>>     <dependency>
>>       <groupId>org.apache.spark</groupId>
>>       <artifactId>spark-assembly</artifactId>
>>       <version>1.0.0-cdh3u4</version>
>>       <classifier>cdh3u4</classifier>
>>     </dependency>
>>
>>     <!--spark docs says I need hadoop-client, cdh3u3 repo no longer exists-->
>>     <dependency>
>>       <groupId>org.apache.hadoop</groupId>
>>       <artifactId>hadoop-client</artifactId>
>>       <version>0.20.2-cdh3u4</version>
>>     </dependency>
>>
>>   </dependencies>
>> </project>
>>
>> Here's what I get in the dependency tree:
>>
>> [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ my-app ---
>> [INFO] com.mycompany.app:my-app:jar:1.0-SNAPSHOT
>> [INFO] +- org.scala-lang:scala-library:jar:2.10.4:compile
>> [INFO] +- org.apache.spark:spark-assembly:jar:cdh3u4:1.0.0-cdh3u4:compile
>> [INFO] \- org.apache.hadoop:hadoop-client:jar:0.20.2-cdh3u4:compile
>> [INFO]    \- org.apache.hadoop:hadoop-core:jar:0.20.2-cdh3u4:compile
>> [INFO]       +- com.cloudera.cdh:hadoop-ant:pom:0.20.2-cdh3u4:compile
>> [INFO]       +- xmlenc:xmlenc:jar:0.52:compile
>> [INFO]       +- org.apache.hadoop.thirdparty.guava:guava:jar:r09-jarjar:compile
>> [INFO]       +- commons-codec:commons-codec:jar:1.4:compile
>> [INFO]       +- commons-net:commons-net:jar:1.4.1:compile
>> [INFO]       |  \- (oro:oro:jar:2.0.8:compile - omitted for duplicate)
>> [INFO]       +- org.codehaus.jackson:jackson-core-asl:jar:1.5.2:compile
>> [INFO]       +- org.codehaus.jackson:jackson-mapper-asl:jar:1.5.2:compile
>> [INFO]       |  \- (org.codehaus.jackson:jackson-core-asl:jar:1.5.2:compile - omitted for duplicate)
>> [INFO]       +- commons-el:commons-el:jar:1.0:compile
>> [INFO]       |  \- commons-logging:commons-logging:jar:1.0.3:compile
>> [INFO]       +- hsqldb:hsqldb:jar:1.8.0.7:compile
>> [INFO]       \- oro:oro:jar:2.0.8:compile
>>
>> While I don't see slf4j anywhere in there, it does manage to find its
>> way into the jar somehow:
>>
>> rfcompton@node19 /u/s/o/n/my-app> find . -name "*.jar" | xargs -tn1 jar tvf | grep -i "slf" | grep LocationAware
>> jar tvf ./target/my-app-1.0-SNAPSHOT.jar
>> jar tvf ./target/my-app-1.0-SNAPSHOT-jar-with-dependencies.jar
>>  3259 Mon Mar 25 21:49:34 PDT 2013 org/apache/commons/logging/impl/SLF4JLocationAwareLog.class
>>   455 Mon Mar 25 21:49:22 PDT 2013 org/slf4j/spi/LocationAwareLogger.class
>>   479 Fri Dec 13 16:44:40 PST 2013 parquet/org/slf4j/spi/LocationAwareLogger.class
>>
>> Here's a pig script that will fail with the slf4j error:
>>
>> REGISTER /usr/share/osi1/nonhome/my-app/target/my-app-1.0-SNAPSHOT-jar-with-dependencies.jar;
>>
>> edgeList0 = LOAD '/user/rfcompton/twitter-mention-networks/bidirectional-network-current/part-r-00001'
>>     USING PigStorage() AS (id1:long, id2:long, weight:int);
>>
>> ttt = LIMIT edgeList0 10;
>> DUMP ttt;
>>
>> (the error)
>>
>> rfcompton@node19 /u/s/o/n/my-app> pig src/main/pig/testSparkJar.pig
>> 2014-05-28 12:43:58,076 [main] INFO  org.apache.pig.Main - Apache Pig version 0.12.1 (r1585011) compiled Apr 05 2014, 01:41:34
>> 2014-05-28 12:43:58,078 [main] INFO  org.apache.pig.Main - Logging error messages to: /usr/share/osi1/nonhome/my-app/pig_1401306238074.log
>> 2014-05-28 12:43:58,722 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/isl/rfcompton/.pigbootup not found
>> 2014-05-28 12:43:59,195 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:8020/
>> 2014-05-28 12:43:59,811 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: node4:8021
>> 2014-05-28 12:44:00,987 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.slf4j.spi.LocationAwareLogger.log(Lorg/slf4j/Marker;Ljava/lang/String;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V
>> Details at logfile: /usr/share/osi1/nonhome/my-app/pig_1401306238074.log
>>
>> To confirm this is 1.0 related, I modified the pom.xml build with
>> 0.9.1 and saw no problems from pig. Looking into the 0.9.1 jar
>> revealed less dependence on slf4j (i.e.
>> "parquet/org/slf4j/spi/LocationAwareLogger.class" appeared in
>> Spark 1.0).
>>
>> (after recompiling for 0.9.1)
>>
>> rfcompton@node19 /u/s/o/n/my-app> find . -name "*.jar" | xargs -tn1 jar tvf | grep -i "slf" | grep LocationAware
>> jar tvf ./target/my-app-1.0-SNAPSHOT.jar
>> jar tvf ./target/my-app-1.0-SNAPSHOT-jar-with-dependencies.jar
>>   455 Mon Mar 25 21:49:22 PDT 2013 org/slf4j/spi/LocationAwareLogger.class
>>
>> On Tue, May 27, 2014 at 2:53 PM, Sean Owen <so...@cloudera.com> wrote:
>>> Spark uses 1.7.5, and you should probably see 1.7.{4,5} in use through
>>> Hadoop. But those are compatible.
>>>
>>> That method appears to have been around since 1.3. What version does
>>> Pig want?
>>>
>>> I usually do "mvn -Dverbose dependency:tree" to see both what the
>>> final dependencies are, and what got overwritten, to diagnose things
>>> like this.
>>>
>>> My hunch is that something is depending on an old slf4j in your build
>>> and it's overwriting Spark et al.
>>>
>>> On Tue, May 27, 2014 at 10:45 PM, Ryan Compton <compton.r...@gmail.com> wrote:
>>>> I use both Pig and Spark. All my code is built with Maven into a giant
>>>> *-jar-with-dependencies.jar. I recently upgraded to Spark 1.0 and now
>>>> all my pig scripts fail with:
>>>>
>>>> Caused by: java.lang.RuntimeException: Could not resolve error that
>>>> occured when launching map reduce job: java.lang.NoSuchMethodError:
>>>> org.slf4j.spi.LocationAwareLogger.log(Lorg/slf4j/Marker;Ljava/lang/String;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V
>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:598)
>>>>     at java.lang.Thread.dispatchUncaughtException(Thread.java:1874)
>>>>
>>>> Did Spark 1.0 change the version of slf4j? I can't seem to find it via
>>>> mvn dependency:tree.
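A possible workaround, sketched here as an assumption rather than a confirmed fix: the slf4j classes never show up in `mvn dependency:tree` because spark-assembly is an uber-jar, so they are bundled class files rather than declared Maven dependencies, and `<exclusions>` cannot strip them out. Giving that dependency `provided` scope keeps the whole assembly, slf4j classes included, out of the jar-with-dependencies that the Pig script REGISTERs, while the code still compiles against Spark:

```
<!-- Sketch (untested against this exact setup): "provided" scope keeps
     the spark-assembly uber-jar, and the slf4j / parquet-shaded slf4j
     classes inside it, out of jar-with-dependencies. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-assembly</artifactId>
  <version>1.0.0-cdh3u4</version>
  <classifier>cdh3u4</classifier>
  <scope>provided</scope>
</dependency>
```

The trade-off is that the Spark assembly then has to be supplied on the classpath at runtime (e.g. from the cluster's Spark installation) instead of traveling inside the registered jar.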