Hi Du,

I don't believe the Guava change has made it into the 1.1 branch. The
Guava docs say "hashInt" was added in 12.0, so what's probably
happening is that you have an old version of Guava on your classpath
ahead of the Spark jars. (Hadoop ships with Guava 11, so that may be
the source of your problem.)
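
One quick way to confirm which Guava actually wins on your classpath is a
tiny check like the one below (a sketch on my side, not something from your
project; the class name is simply the one from the NoSuchMethodError):

// Prints the jar that HashFunction was loaded from. If this points at a
// Hadoop-bundled Guava 11 jar, that explains the missing hashInt method.
object GuavaCheck {
  def main(args: Array[String]): Unit = {
    val location = classOf[com.google.common.hash.HashFunction]
      .getProtectionDomain.getCodeSource.getLocation
    println(s"Guava loaded from: $location")
  }
}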

On Thu, Aug 21, 2014 at 4:23 PM, Du Li <l...@yahoo-inc.com.invalid> wrote:
> Hi,
>
> This Guava dependency conflict should have been fixed as of
> yesterday, according to https://issues.apache.org/jira/browse/SPARK-2420
>
> However, I just got java.lang.NoSuchMethodError:
> com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
> from the following code snippet when running "mvn3 test" on a Mac. I built
> the latest version of Spark (1.1.0-SNAPSHOT) and installed the jar files into
> the local Maven repo. In my pom file I explicitly excluded Guava from almost
> all possible dependencies, such as spark-hive_2.10-1.1.0-SNAPSHOT and
> hadoop-client. This snippet is abstracted from a larger project, so the
> pom.xml includes many dependencies, although not all are required by this
> snippet. The pom.xml is attached.
>
> Does anybody know how to fix it?
>
> Thanks,
> Du
> -------
>
> package com.myself.test
>
> import org.scalatest._
> import org.apache.hadoop.io.{NullWritable, BytesWritable}
> import org.apache.spark.{SparkContext, SparkConf}
> import org.apache.spark.SparkContext._
>
> class MyRecord(name: String) extends Serializable {
>   def getWritable(): BytesWritable = {
>     new BytesWritable(Option(name).getOrElse("\\N").toString.getBytes("UTF-8"))
>   }
>
>   final override def equals(that: Any): Boolean = {
>     if( !that.isInstanceOf[MyRecord] )
>       false
>     else {
>       val other = that.asInstanceOf[MyRecord]
>       this.getWritable == other.getWritable
>     }
>   }
> }
>
> class MyRecordTestSuite extends FunSuite {
>   // construct a MyRecord by Consumer.schema
>   val rec: MyRecord = new MyRecord("James Bond")
>
>   test("generated SequenceFile should be readable from spark") {
>     val path = "./testdata/"
>
>     val conf = new SparkConf(false).setMaster("local").setAppName("test data exchange with Hive")
>     conf.set("spark.driver.host", "localhost")
>     val sc = new SparkContext(conf)
>     val rdd = sc.makeRDD(Seq(rec))
>     rdd.map((x: MyRecord) => (NullWritable.get(), x.getWritable()))
>       .saveAsSequenceFile(path)
>
>     val bytes = sc.sequenceFile(path, classOf[NullWritable], classOf[BytesWritable]).first._2
>     assert(rec.getWritable() == bytes)
>
>     sc.stop()
>     System.clearProperty("spark.driver.port")
>   }
> }
>
>
> From: Andrew Lee <alee...@hotmail.com>
> Reply-To: "user@spark.apache.org" <user@spark.apache.org>
> Date: Monday, July 21, 2014 at 10:27 AM
> To: "user@spark.apache.org" <user@spark.apache.org>,
> "u...@spark.incubator.apache.org" <u...@spark.incubator.apache.org>
>
> Subject: RE: Hive From Spark
>
> Hi All,
>
> Currently, if you are running the Spark HiveContext API with Hive 0.12, it
> won't work due to the following two libraries, which are not consistent with
> Hive 0.12 or with Hadoop. (Hive libs align with Hadoop libs, and as a common
> practice they should be kept consistent so that everything interoperates.)
>
> These issues are under discussion in these two JIRA tickets:
>
> https://issues.apache.org/jira/browse/HIVE-7387
>
> https://issues.apache.org/jira/browse/SPARK-2420
>
> When I ran the command after tweaking the classpath and build for Spark
> 1.0.1-rc3, I was able to create a table through HiveContext; however, when I
> fetch the data, it breaks due to incompatible API calls in Guava. This is
> critical since it needs to map the columns to the RDD schema.
>
> Hive and Hadoop use an older version of the Guava libraries (11.0.1), whereas
> Spark Hive uses Guava 14.0.1+.
> The community isn't willing to downgrade to 11.0.1, which is the current
> version for Hadoop 2.2 and Hive 0.12.
> Be aware of the protobuf version as well in Hive 0.12 (it uses protobuf 2.4).
>
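> If your own project builds with sbt, a sketch of a work-around (my assumption,
> not something verified in the JIRA discussions) is to pin a single Guava
> version so Hadoop's Guava 11 cannot shadow the one Spark needs; the Maven
> equivalent is an exclusion plus an explicit guava dependency:
>
> // build.sbt sketch: force one Guava version across the dependency tree.
> // The version is illustrative; align it with what your Spark build expects.
> libraryDependencies += "com.google.guava" % "guava" % "14.0.1"
> dependencyOverrides += "com.google.guava" % "guava" % "14.0.1"
>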
> scala>
>
> scala> import org.apache.spark.SparkContext
> import org.apache.spark.SparkContext
>
> scala> import org.apache.spark.sql.hive._
> import org.apache.spark.sql.hive._
>
> scala>
>
> scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
> hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@34bee01a
>
> scala>
>
> scala> hiveContext.hql("CREATE TABLE IF NOT EXISTS src (key INT, value
> STRING)")
> res0: org.apache.spark.sql.SchemaRDD =
> SchemaRDD[0] at RDD at SchemaRDD.scala:104
> == Query Plan ==
> <Native command: executed by Hive>
>
> scala> hiveContext.hql("LOAD DATA LOCAL INPATH
> 'examples/src/main/resources/kv1.txt' INTO TABLE src")
> res1: org.apache.spark.sql.SchemaRDD =
> SchemaRDD[3] at RDD at SchemaRDD.scala:104
> == Query Plan ==
> <Native command: executed by Hive>
>
> scala>
>
> scala> // Queries are expressed in HiveQL
>
> scala> hiveContext.hql("FROM src SELECT key,
> value").collect().foreach(println)
> java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
> at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
> at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
> at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
> at org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
> at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
> at org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
> at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
> at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
> at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
> at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:75)
> at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:92)
> at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:661)
> at org.apache.spark.storage.BlockManager.put(BlockManager.scala:546)
> at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:812)
> at org.apache.spark.broadcast.HttpBroadcast.<init>(HttpBroadcast.scala:52)
> at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:35)
> at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:29)
> at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
> at org.apache.spark.SparkContext.broadcast(SparkContext.scala:776)
> at org.apache.spark.sql.hive.HadoopTableReader.<init>(TableReader.scala:60)
> at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:70)
> at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$4.apply(HiveStrategies.scala:73)
> at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$4.apply(HiveStrategies.scala:73)
> at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:280)
> at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:69)
> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
> at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
> at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
> at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:316)
> at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:316)
> at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:319)
> at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:319)
> at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:420)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
> at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
> at $iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
> at $iwC$$iwC$$iwC.<init>(<console>:28)
> at $iwC$$iwC.<init>(<console>:30)
> at $iwC.<init>(<console>:32)
> at <init>(<console>:34)
> at .<init>(<console>:38)
> at .<clinit>(<console>)
> at .<init>(<console>:7)
> at .<clinit>(<console>)
> at $print(<console>)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
> at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
> at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
> at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
> at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
> at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
> at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
> at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
> at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
> at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
> at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
> at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
>
>
>> From: hao.ch...@intel.com
>> To: user@spark.apache.org; u...@spark.incubator.apache.org
>> Subject: RE: Hive From Spark
>> Date: Mon, 21 Jul 2014 01:14:19 +0000
>>
>> JiaJia, I've checked out the latest 1.0 branch and then done the following
>> steps:
>> SPARK_HIVE=true sbt/sbt clean assembly
>> cd examples
>> ../bin/run-example sql.hive.HiveFromSpark
>>
>> It works well on my local machine.
>>
>> Your log output shows "Invalid method name: 'get_table'", which suggests an
>> incompatible jar version or something wrong between the Hive metastore
>> service and the client. Can you double-check the jar versions of the Hive
>> metastore service and Thrift?
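>>
>> If it helps to isolate that, here is a minimal sketch (my assumption, not
>> part of the original run) that talks to the metastore directly, outside of
>> Spark. If this also fails, the mismatch is between your Hive client jars and
>> the running metastore service rather than anything in Spark:
>>
>> import org.apache.hadoop.hive.conf.HiveConf
>> import org.apache.hadoop.hive.metastore.HiveMetaStoreClient
>>
>> object MetastoreCheck {
>>   def main(args: Array[String]): Unit = {
>>     // picks up hive-site.xml from the classpath, same as HiveContext does
>>     val client = new HiveMetaStoreClient(new HiveConf())
>>     // the Thrift call behind this is get_table, the method name in your error
>>     println(client.getTable("default", "src"))
>>     client.close()
>>   }
>> }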
>>
>>
>> -----Original Message-----
>> From: JiajiaJing [mailto:jj.jing0...@gmail.com]
>> Sent: Saturday, July 19, 2014 7:29 AM
>> To: u...@spark.incubator.apache.org
>> Subject: RE: Hive From Spark
>>
>> Hi Cheng Hao,
>>
>> Thank you very much for your reply.
>>
>> Basically, the program runs on Spark 1.0.0 and Hive 0.12.0 .
>>
>> The environment is set up by running "SPARK_HIVE=true
>> sbt/sbt assembly/assembly", distributing the jar to all the workers, and
>> copying hive-site.xml to Spark's conf dir.
>>
>> And then run the program as: " ./bin/run-example
>> org.apache.spark.examples.sql.hive.HiveFromSpark "
>>
>> It's good to know that this example runs well on your machine. Could you
>> please give me some insight into what you have done as well?
>>
>> Thank you very much!
>>
>> Jiajia
>>
>>
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Hive-From-Spark-tp10110p10215.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.



-- 
Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
