Got it working by passing all the jars in the Hive lib directory (except guava) to "--jars". I had initially tried copying the jars into a temporary folder and pointing the executor and driver "extraClassPath" settings at that directory, but that didn't work.
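In case it helps anyone else, the submit command ended up looking roughly like this (the paths, application jar and helper variable names below are illustrative, adjust them for your install):

HIVE_LIB=/opt/cloudera/parcels/CDH/lib/hive/lib   # wherever the Hive lib jars live on your cluster
# build a comma-separated list of every Hive jar except guava
HIVE_JARS=$(ls $HIVE_LIB/*.jar | grep -v guava | paste -sd, -)

spark-submit --master yarn-cluster \
  --class HiveContextExample \
  --jars "$HIVE_JARS" \
  hive-context-example.jar

I assume "--jars" worked because it actually ships the jars with the application and puts them on both the driver and executor classpaths, while "extraClassPath" only adds a path that already has to exist on every node.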
On Mon, Oct 27, 2014 at 2:21 PM, Nitin kak <nitinkak...@gmail.com> wrote:

> I am now on CDH 5.2 which has the Hive module packaged in it.
>
> On Mon, Oct 27, 2014 at 2:17 PM, Michael Armbrust <mich...@databricks.com> wrote:
>
>> Which version of CDH are you using? I believe that hive is not correctly
>> packaged in 5.1, but should work in 5.2. Another option that people use is
>> to deploy the plain Apache version of Spark on CDH Yarn.
>>
>> On Mon, Oct 27, 2014 at 11:10 AM, Nitin kak <nitinkak...@gmail.com> wrote:
>>
>>> Yes, I added all the Hive jars present in the Cloudera distribution of
>>> Hadoop. I added them because I was getting ClassNotFoundException for many
>>> required classes (one example stack trace below). So, someone on the
>>> community suggested to include the hive jars:
>>>
>>> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
>>>     at org.apache.spark.sql.hive.api.java.JavaHiveContext.<init>(JavaHiveContext.scala:30)
>>>     at HiveContextExample.main(HiveContextExample.java:57)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>     at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:331)
>>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
>>>
>>> On Mon, Oct 27, 2014 at 1:57 PM, Michael Armbrust <mich...@databricks.com> wrote:
>>>
>>>> No such method error almost always means you are mixing different
>>>> versions of the same library on the classpath. In this case it looks like
>>>> you have more than one version of guava. Have you added anything to the
>>>> classpath?
>>>>
>>>> On Mon, Oct 27, 2014 at 8:36 AM, nitinkak001 <nitinkak...@gmail.com> wrote:
>>>>
>>>>> I am working on running the following hive query from spark.
>>>>>
>>>>> "SELECT * FROM spark_poc.<table_name> DISTRIBUTE BY GEO_REGION, GEO_COUNTRY
>>>>> SORT BY IP_ADDRESS, COOKIE_ID"
>>>>>
>>>>> Ran into java.lang.NoSuchMethodError:
>>>>> com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
>>>>> (complete stack trace at the bottom). Found a few mentions of this issue in
>>>>> the user list. It seems (from the below thread link) that there is a Guava
>>>>> version incompatibility between Spark 1.1.0 and Hive which is probably fixed
>>>>> in 1.2.0.
>>>>>
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Hive-From-Spark-td10110.html#a12671
>>>>>
>>>>> So, wanted to confirm, is Spark SQL 1.1.0 incompatible with Hive or is
>>>>> there a workaround to this?
>>>>>
>>>>> Exception in thread "Driver" java.lang.reflect.InvocationTargetException
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)
>>>>> Caused by: java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
>>>>>     at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
>>>>>     at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
>>>>>     at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
>>>>>     at org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
>>>>>     at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>>>>>     at org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
>>>>>     at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
>>>>>     at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
>>>>>     at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
>>>>>     at org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
>>>>>     at org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
>>>>>     at org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:31)
>>>>>     at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:236)
>>>>>     at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:126)
>>>>>     at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:104)
>>>>>     at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:750)
>>>>>     at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:601)
>>>>>     at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:872)
>>>>>     at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:79)
>>>>>     at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:68)
>>>>>     at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36)
>>>>>     at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
>>>>>     at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
>>>>>     at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809)
>>>>>     at org.apache.spark.sql.hive.HadoopTableReader.<init>(TableReader.scala:68)
>>>>>     at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:68)
>>>>>     at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188)
>>>>>     at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188)
>>>>>     at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:364)
>>>>>     at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:184)
>>>>>     at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>>>>     at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>>>>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>>>>     at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>>>>>     at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>>>>>     at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:292)
>>>>>     at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>>>>     at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>>>>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>>>>     at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>>>>>     at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>>>>>     at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:266)
>>>>>     at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>>>>     at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>>>>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>>>>     at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>>>>>     at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:402)
>>>>>     at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:400)
>>>>>     at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:406)
>>>>>     at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:406)
>>>>>     at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:360)
>>>>>     at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:360)
>>>>>     at org.apache.spark.sql.SchemaRDD.getDependencies(SchemaRDD.scala:120)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:191)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:189)
>>>>>     at scala.Option.getOrElse(Option.scala:120)
>>>>>     at org.apache.spark.rdd.RDD.dependencies(RDD.scala:189)
>>>>>     at org.apache.spark.rdd.RDD.firstParent(RDD.scala:1236)
>>>>>     at org.apache.spark.sql.SchemaRDD.getPartitions(SchemaRDD.scala:117)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
>>>>>     at scala.Option.getOrElse(Option.scala:120)
>>>>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
>>>>>     at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
>>>>>     at scala.Option.getOrElse(Option.scala:120)
>>>>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
>>>>>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
>>>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
>>>>>     at scala.Option.getOrElse(Option.scala:120)
>>>>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
>>>>>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1135)
>>>>>     at org.apache.spark.rdd.RDD.collect(RDD.scala:774)
>>>>>     at org.apache.spark.api.java.JavaRDDLike$class.collect(JavaRDDLike.scala:305)
>>>>>     at org.apache.spark.api.java.JavaRDD.collect(JavaRDD.scala:32)
>>>>>     at HiveContextExample.printRDD(HiveContextExample.java:77)
>>>>>     at HiveContextExample.main(HiveContextExample.java:71)
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Is-Spark-1-1-0-incompatible-with-Hive-tp17364.html
>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>> For additional commands, e-mail: user-h...@spark.apache.org
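For completeness, the driver that produces the trace above boils down to roughly the following (a trimmed sketch against the Spark 1.1 Java API; the table name is a placeholder):

import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class HiveContextExample {
    public static void main(String[] args) {
        JavaSparkContext sc =
            new JavaSparkContext(new SparkConf().setAppName("HiveContextExample"));

        // Needs the Hive jars on the classpath; without them this constructor
        // throws NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
        JavaHiveContext hiveCtx = new JavaHiveContext(sc);

        // "some_table" is a placeholder for the real table name
        String query = "SELECT * FROM spark_poc.some_table "
                + "DISTRIBUTE BY GEO_REGION, GEO_COUNTRY "
                + "SORT BY IP_ADDRESS, COOKIE_ID";

        // collect() is where the Guava NoSuchMethodError surfaces when a
        // conflicting guava version is on the classpath
        List<Row> rows = hiveCtx.hql(query).collect();
        for (Row row : rows) {
            System.out.println(row);
        }
        sc.stop();
    }
}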