Never mind. I have resolved this issue by moving the local Guava dependency forward (declaring it earlier in my pom), so the newer Guava now takes precedence on the classpath.
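A minimal sketch of that kind of change is below. The versions and neighboring artifacts are illustrative assumptions, not copied from the attached pom.xml; the point is that declaring Guava explicitly, ahead of the Spark and Hadoop artifacts, lets the newer Guava win Maven's dependency mediation over the Guava 11 that hadoop-client pulls in transitively.

    <!-- sketch only: coordinates and versions are assumptions, not from the attached pom.xml -->
    <dependencies>
      <!-- declared first, as a direct dependency, so Guava 14 wins dependency mediation -->
      <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>14.0.1</version>
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.10</artifactId>
        <version>1.1.0-SNAPSHOT</version>
      </dependency>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.2.0</version>
      </dependency>
    </dependencies>

Running "mvn dependency:tree -Dincludes=com.google.guava" afterwards is an easy way to confirm which Guava actually ends up on the compile and test classpaths.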
Du

On 8/22/14, 5:08 PM, "Du Li" <l...@yahoo-inc.com.INVALID> wrote:

>I thought the fix had been pushed to the Apache master, ref. commit
>"[SPARK-2848] Shade Guava in uber-jars" by Marcelo Vanzin on 8/20. So my
>previous email was based on my own build of the Apache master, which
>turned out not to be working yet.
>
>Marcelo: please correct me if I got that commit wrong.
>
>Thanks,
>Du
>
>
>
>On 8/22/14, 11:41 AM, "Marcelo Vanzin" <van...@cloudera.com> wrote:
>
>>SPARK-2420 is fixed. I don't think it will be in 1.1, though - it might
>>be too risky at this point.
>>
>>I'm not familiar with spark-sql.
>>
>>On Fri, Aug 22, 2014 at 11:25 AM, Andrew Lee <alee...@hotmail.com> wrote:
>>> Hopefully there could be some progress on SPARK-2420. It looks like
>>> shading is the preferred solution over downgrading.
>>>
>>> Any idea when this will happen? Could it happen in Spark 1.1.1 or
>>> Spark 1.1.2?
>>>
>>> By the way, regarding bin/spark-sql: is this more of a debugging tool
>>> for Spark jobs integrating with Hive? How do people use spark-sql? I'm
>>> trying to understand the rationale and motivation behind this script,
>>> any ideas?
>>>
>>>> Date: Thu, 21 Aug 2014 16:31:08 -0700
>>>> Subject: Re: Hive From Spark
>>>> From: van...@cloudera.com
>>>> To: l...@yahoo-inc.com.invalid
>>>> CC: user@spark.apache.org; u...@spark.incubator.apache.org;
>>>> pwend...@gmail.com
>>>>
>>>> Hi Du,
>>>>
>>>> I don't believe the Guava change has made it to the 1.1 branch. The
>>>> Guava doc says "hashInt" was added in 12.0, so what's probably
>>>> happening is that you have an old version of Guava in your classpath
>>>> before the Spark jars. (Hadoop ships with Guava 11, so that may be
>>>> the source of your problem.)
>>>>
>>>> On Thu, Aug 21, 2014 at 4:23 PM, Du Li <l...@yahoo-inc.com.invalid> wrote:
>>>> > Hi,
>>>> >
>>>> > This Guava dependency conflict problem should have been fixed as of
>>>> > yesterday according to https://issues.apache.org/jira/browse/SPARK-2420
>>>> >
>>>> > However, I just got
>>>> >
>>>> > java.lang.NoSuchMethodError:
>>>> > com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
>>>> >
>>>> > from the following code snippet and "mvn3 test" on Mac. I built the
>>>> > latest version of Spark (1.1.0-SNAPSHOT) and installed the jar files
>>>> > to the local Maven repo. In my pom file I explicitly excluded Guava
>>>> > from almost all possible dependencies, such as
>>>> > spark-hive_2.10-1.1.0-SNAPSHOT and hadoop-client. This snippet is
>>>> > abstracted from a larger project, so the pom.xml includes many
>>>> > dependencies although not all are required by this snippet. The
>>>> > pom.xml is attached.
>>>> >
>>>> > Does anybody know how to fix it?
>>>> >
>>>> > Thanks,
>>>> > Du
>>>> > -------
>>>> >
>>>> > package com.myself.test
>>>> >
>>>> > import org.scalatest._
>>>> > import org.apache.hadoop.io.{NullWritable, BytesWritable}
>>>> > import org.apache.spark.{SparkContext, SparkConf}
>>>> > import org.apache.spark.SparkContext._
>>>> >
>>>> > class MyRecord(name: String) extends Serializable {
>>>> >   // wrap the (possibly null) name as a Hadoop BytesWritable, using "\N" for null
>>>> >   def getWritable(): BytesWritable = {
>>>> >     new BytesWritable(Option(name).getOrElse("\\N").toString.getBytes("UTF-8"))
>>>> >   }
>>>> >
>>>> >   final override def equals(that: Any): Boolean = {
>>>> >     if (!that.isInstanceOf[MyRecord])
>>>> >       false
>>>> >     else {
>>>> >       val other = that.asInstanceOf[MyRecord]
>>>> >       this.getWritable == other.getWritable
>>>> >     }
>>>> >   }
>>>> > }
>>>> >
>>>> > class MyRecordTestSuite extends FunSuite {
>>>> >   // construct a MyRecord by Consumer.schema
>>>> >   val rec: MyRecord = new MyRecord("James Bond")
>>>> >
>>>> >   test("generated SequenceFile should be readable from spark") {
>>>> >     val path = "./testdata/"
>>>> >
>>>> >     val conf = new SparkConf(false).setMaster("local")
>>>> >       .setAppName("test data exchange with Hive")
>>>> >     conf.set("spark.driver.host", "localhost")
>>>> >     val sc = new SparkContext(conf)
>>>> >     val rdd = sc.makeRDD(Seq(rec))
>>>> >     rdd.map((x: MyRecord) => (NullWritable.get(), x.getWritable()))
>>>> >       .saveAsSequenceFile(path)
>>>> >
>>>> >     val bytes = sc.sequenceFile(path, classOf[NullWritable],
>>>> >       classOf[BytesWritable]).first._2
>>>> >     assert(rec.getWritable() == bytes)
>>>> >
>>>> >     sc.stop()
>>>> >     System.clearProperty("spark.driver.port")
>>>> >   }
>>>> > }
>>>> >
>>>> >
>>>> > From: Andrew Lee <alee...@hotmail.com>
>>>> > Reply-To: "user@spark.apache.org" <user@spark.apache.org>
>>>> > Date: Monday, July 21, 2014 at 10:27 AM
>>>> > To: "user@spark.apache.org" <user@spark.apache.org>,
>>>> >   "u...@spark.incubator.apache.org" <u...@spark.incubator.apache.org>
>>>> > Subject: RE: Hive From Spark
>>>> >
>>>> > Hi All,
>>>> >
>>>> > Currently, if you are running the Spark HiveContext API with Hive
>>>> > 0.12, it won't work due to the following two libraries, which are
>>>> > not consistent with Hive 0.12 and Hadoop. (Hive libs align with
>>>> > Hadoop libs, and as a common practice they should be consistent to
>>>> > remain interoperable.)
>>>> >
>>>> > These are under discussion in the two JIRA tickets:
>>>> >
>>>> > https://issues.apache.org/jira/browse/HIVE-7387
>>>> >
>>>> > https://issues.apache.org/jira/browse/SPARK-2420
>>>> >
>>>> > When I ran the command by tweaking the classpath and build for Spark
>>>> > 1.0.1-rc3, I was able to create a table through HiveContext; however,
>>>> > when I fetch the data, it breaks due to incompatible API calls in
>>>> > Guava. This is critical since it needs to map the columns to the RDD
>>>> > schema.
>>>> >
>>>> > Hive and Hadoop are using an older version of the Guava libraries
>>>> > (11.0.1), whereas Spark Hive is using Guava 14.0.1+.
>>>> > The community isn't willing to downgrade to 11.0.1, which is the
>>>> > current version for Hadoop 2.2 and Hive 0.12.
>>>> > Be aware of the protobuf version as well in Hive 0.12 (it uses
>>>> > protobuf 2.4).
>>>> >
>>>> > scala>
>>>> > scala> import org.apache.spark.SparkContext
>>>> > import org.apache.spark.SparkContext
>>>> >
>>>> > scala> import org.apache.spark.sql.hive._
>>>> > import org.apache.spark.sql.hive._
>>>> >
>>>> > scala>
>>>> >
>>>> > scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>>> > hiveContext: org.apache.spark.sql.hive.HiveContext =
>>>> >   org.apache.spark.sql.hive.HiveContext@34bee01a
>>>> >
>>>> > scala>
>>>> >
>>>> > scala> hiveContext.hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
>>>> > res0: org.apache.spark.sql.SchemaRDD =
>>>> > SchemaRDD[0] at RDD at SchemaRDD.scala:104
>>>> > == Query Plan ==
>>>> > <Native command: executed by Hive>
>>>> >
>>>> > scala> hiveContext.hql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
>>>> > res1: org.apache.spark.sql.SchemaRDD =
>>>> > SchemaRDD[3] at RDD at SchemaRDD.scala:104
>>>> > == Query Plan ==
>>>> > <Native command: executed by Hive>
>>>> >
>>>> > scala>
>>>> >
>>>> > scala> // Queries are expressed in HiveQL
>>>> >
>>>> > scala> hiveContext.hql("FROM src SELECT key, value").collect().foreach(println)
>>>> > java.lang.NoSuchMethodError:
>>>> >   com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
>>>> >   at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
>>>> >   at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
>>>> >   at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
>>>> >   at org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
>>>> >   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>>>> >   at org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
>>>> >   at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
>>>> >   at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
>>>> >   at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
>>>> >   at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:75)
>>>> >   at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:92)
>>>> >   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:661)
>>>> >   at org.apache.spark.storage.BlockManager.put(BlockManager.scala:546)
>>>> >   at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:812)
>>>> >   at org.apache.spark.broadcast.HttpBroadcast.<init>(HttpBroadcast.scala:52)
>>>> >   at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:35)
>>>> >   at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:29)
>>>> >   at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
>>>> >   at org.apache.spark.SparkContext.broadcast(SparkContext.scala:776)
>>>> >   at org.apache.spark.sql.hive.HadoopTableReader.<init>(TableReader.scala:60)
>>>> >   at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:70)
>>>> >   at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$4.apply(HiveStrategies.scala:73)
>>>> >   at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$4.apply(HiveStrategies.scala:73)
>>>> >   at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:280)
>>>> >   at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:69)
>>>> >   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>>> >   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>>> >   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>>> >   at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>>>> >   at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:316)
>>>> >   at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:316)
>>>> >   at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:319)
>>>> >   at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:319)
>>>> >   at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:420)
>>>> >   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
>>>> >   at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
>>>> >   at $iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
>>>> >   at $iwC$$iwC$$iwC.<init>(<console>:28)
>>>> >   at $iwC$$iwC.<init>(<console>:30)
>>>> >   at $iwC.<init>(<console>:32)
>>>> >   at <init>(<console>:34)
>>>> >   at .<init>(<console>:38)
>>>> >   at .<clinit>(<console>)
>>>> >   at .<init>(<console>:7)
>>>> >   at .<clinit>(<console>)
>>>> >   at $print(<console>)
>>>> >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>> >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> >   at java.lang.reflect.Method.invoke(Method.java:606)
>>>> >   at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
>>>> >   at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
>>>> >   at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
>>>> >   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
>>>> >   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
>>>> >   at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
>>>> >   at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
>>>> >   at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
>>>> >   at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
>>>> >   at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
>>>> >   at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
>>>> >   at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
>>>> >   at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
>>>> >   at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
>>>> >   at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>>>> >   at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
>>>> >   at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
>>>> >   at org.apache.spark.repl.Main$.main(Main.scala:31)
>>>> >   at org.apache.spark.repl.Main.main(Main.scala)
>>>> >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>> >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> >   at java.lang.reflect.Method.invoke(Method.java:606)
>>>> >   at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
>>>> >   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
>>>> >   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>> >
>>>> >
>>>> >
>>>> >> From: hao.ch...@intel.com
>>>> >> To: user@spark.apache.org; u...@spark.incubator.apache.org
>>>> >> Subject: RE: Hive From Spark
>>>> >> Date: Mon, 21 Jul 2014 01:14:19 +0000
>>>> >>
>>>> >> JiaJia, I've checked out the latest 1.0 branch and then done the
>>>> >> following steps:
>>>> >> SPARK_HIVE=true sbt/sbt clean assembly
>>>> >> cd examples
>>>> >> ../bin/run-example sql.hive.HiveFromSpark
>>>> >>
>>>> >> It works well on my local machine.
>>>> >>
>>>> >> Your log output shows "Invalid method name: 'get_table'", which
>>>> >> suggests an incompatible jar version or something wrong between the
>>>> >> Hive metastore service and client. Can you double-check the jar
>>>> >> versions of the Hive metastore service or Thrift?
>>>> >>
>>>> >>
>>>> >> -----Original Message-----
>>>> >> From: JiajiaJing [mailto:jj.jing0...@gmail.com]
>>>> >> Sent: Saturday, July 19, 2014 7:29 AM
>>>> >> To: u...@spark.incubator.apache.org
>>>> >> Subject: RE: Hive From Spark
>>>> >>
>>>> >> Hi Cheng Hao,
>>>> >>
>>>> >> Thank you very much for your reply.
>>>> >>
>>>> >> Basically, the program runs on Spark 1.0.0 and Hive 0.12.0.
>>>> >>
>>>> >> Some setup of the environment is done by running "SPARK_HIVE=true
>>>> >> sbt/sbt assembly/assembly", including the jar on all the workers,
>>>> >> and copying the hive-site.xml to Spark's conf dir.
>>>> >>
>>>> >> The program is then run as: "./bin/run-example
>>>> >> org.apache.spark.examples.sql.hive.HiveFromSpark"
>>>> >>
>>>> >> It's good to know that this example runs well on your machine; could
>>>> >> you please give me some insight into what you have done as well?
>>>> >>
>>>> >> Thank you very much!
>>>> >>
>>>> >> Jiajia
>>>> >>
>>>> >>
>>>> >> --
>>>> >> View this message in context:
>>>> >> http://apache-spark-user-list.1001560.n3.nabble.com/Hive-From-Spark-tp10110p10215.html
>>>> >> Sent from the Apache Spark User List mailing list archive at
>>>> >> Nabble.com.
>>>>
>>>> --
>>>> Marcelo
>>
>>
>>--
>>Marcelo
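For completeness, the exclusion-based approach Du describes earlier in the thread (stripping Guava out of the artifacts that drag in the old version) roughly looks like the sketch below. The coordinates and version are assumptions for illustration, not taken from the attached pom.xml, and the same exclusion block can be repeated on spark-hive_2.10 or any other dependency that pulls in Guava 11 transitively.

    <!-- sketch only: keep Hadoop's transitive Guava 11 off the classpath -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.2.0</version>
      <exclusions>
        <exclusion>
          <groupId>com.google.guava</groupId>
          <artifactId>guava</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

Either way (excluding the old Guava or declaring the new one directly), the symptom to watch for is exactly the NoSuchMethodError above: hashInt only appeared in Guava 12, so hitting that error means Guava 11 is still winning on the classpath.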