Ted, I did as you said, but it looks like HBaseContext relies on APIs that have changed in HBase itself. The build fails with:
[ERROR] /home/bkim/hbase-rel-1.0.2/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:30: error: object HFileWriterImpl is not a member of package org.apache.hadoop.hbase.io.hfile
[ERROR] import org.apache.hadoop.hbase.io.hfile.{CacheConfig, HFileContextBuilder, HFileWriterImpl}
[ERROR] ^
[ERROR] /home/bkim/hbase-rel-1.0.2/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:627: error: not found: value HFileWriterImpl
[ERROR] val hfileCompression = HFileWriterImpl
[ERROR] ^
[ERROR] /home/bkim/hbase-rel-1.0.2/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:750: error: not found: value HFileWriterImpl
[ERROR] val defaultCompression = HFileWriterImpl
[ERROR] ^
[ERROR] /home/bkim/hbase-rel-1.0.2/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:898: error: value COMPARATOR is not a member of object org.apache.hadoop.hbase.CellComparator
[ERROR] .withComparator(CellComparator.COMPARATOR).withFileContext(hFileContext)

So… back to my original question… do you know when these incompatibilities were introduced? If so, I can pull the version from that point in time and try again.

Thanks,
Ben

> On Mar 13, 2016, at 12:42 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> Benjamin:
> Since hbase-spark is in its own module, you can pull the whole hbase-spark
> subtree into the hbase 1.0 root dir and add the following to the root pom.xml:
>
>     <module>hbase-spark</module>
>
> Then you would be able to build the module yourself.
>
> The hbase-spark module uses APIs which are compatible with hbase 1.0.
>
> Cheers
>
> On Sun, Mar 13, 2016 at 11:39 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>
> Hi Ted,
>
> I see that you're working on the hbase-spark module for hbase. I recently
> packaged the SparkOnHBase project and gave it a test run. It works like a
> charm on CDH 5.4 and 5.5. All I had to do was add
> /opt/cloudera/parcels/CDH/jars/htrace-core-3.1.0-incubating.jar to the
> classpath.txt file in /etc/spark/conf. Then I ran spark-shell with
> "--jars /path/to/spark-hbase-0.0.2-clabs.jar" as an argument and used the
> easy-to-use HBaseContext for HBase operations. Now I want to use the latest
> DataFrames functionality. Since it is only in the hbase-spark module, I want
> to know how to get it and package it for CDH 5.5, which still uses HBase
> 1.0.0. Can you tell me what version of hbase master is still backwards
> compatible?
>
> By the way, we are using Spark 1.6, if it matters.
>
> Thanks,
> Ben
>
>> On Feb 10, 2016, at 2:34 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> Have you tried adding the hbase client jars to spark.executor.extraClassPath?
>>
>> Cheers
>>
>> On Wed, Feb 10, 2016 at 12:17 AM, Prabhu Joseph <prabhujose.ga...@gmail.com> wrote:
>>
>> + Spark-Dev
>>
>> For a Spark job on YARN that accesses an HBase table, I added all the hbase
>> client jars to spark.yarn.dist.files. The NodeManager, when launching the
>> container (i.e. the executor), does the localization and brings all the
>> hbase-client jars into the executor CWD, but the executor tasks still fail
>> with ClassNotFoundException on the hbase client classes. When I checked
>> launch_container.sh, the classpath does not have $PWD/*, and hence all the
>> hbase client jars are ignored.
>>
>> Is spark.yarn.dist.files not meant for adding jars to the executor classpath?
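>>
>> For illustration, my understanding is that a working setup would have to
>> combine the two settings, something along these lines (the jar names and
>> paths here are only examples, not the complete list):
>>
>> # Sketch only: dist.files localizes the jars into the container working
>> # directory but does not add them to any classpath by itself, so
>> # extraClassPath has to reference the localized copies relative to the CWD.
>> MASTER=yarn-client ./spark-shell \
>>   --conf spark.yarn.dist.files=/opt/hbase/lib/hbase-client-1.0.0.jar,/opt/hbase/lib/hbase-common-1.0.0.jar \
>>   --conf spark.executor.extraClassPath=./hbase-client-1.0.0.jar:./hbase-common-1.0.0.jar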
>>
>> Thanks,
>> Prabhu Joseph
>>
>> On Tue, Feb 9, 2016 at 1:42 PM, Prabhu Joseph <prabhujose.ga...@gmail.com> wrote:
>>
>> Hi All,
>>
>> When I do a count on an HBase table from the Spark shell running in
>> yarn-client mode, the job fails at count().
>>
>> MASTER=yarn-client ./spark-shell
>>
>> import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor, TableName}
>> import org.apache.hadoop.hbase.client.HBaseAdmin
>> import org.apache.hadoop.hbase.mapreduce.TableInputFormat
>>
>> val conf = HBaseConfiguration.create()
>> conf.set(TableInputFormat.INPUT_TABLE, "spark")
>>
>> val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
>>   classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
>>   classOf[org.apache.hadoop.hbase.client.Result])
>> hBaseRDD.count()
>>
>> The tasks throw the exception below; the actual exception is swallowed, due
>> to JDK bug JDK-7172206. After installing the hbase client on all NodeManager
>> machines, the Spark job ran fine, so I confirmed that the issue is the
>> executor classpath.
>>
>> But I am searching for some other way of getting the hbase jars onto the
>> Spark executor classpath instead of installing the hbase client on all NM
>> machines. I tried adding all the hbase jars to spark.yarn.dist.files; the NM
>> logs show that it localized all the hbase jars, but the job still fails. I
>> tried spark.executor.extraClassPath; the job still fails.
>>
>> Is there any way to access HBase from the executors without installing
>> hbase-client on all machines?
>>
>> 16/02/09 02:34:57 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, prabhuFS1): java.lang.IllegalStateException: unread block data
>>         at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2428)
>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
>>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>         at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
>>         at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>> Thanks,
>> Prabhu Joseph
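>>
>> P.S. One more variant I have not ruled out is shipping the jars on the
>> spark-shell command line with --jars, which is supposed to both distribute
>> them and put them on the executor classpath (jar list abbreviated, paths
>> illustrative):
>>
>> # Sketch only; in practice all hbase client dependencies would be listed.
>> MASTER=yarn-client ./spark-shell \
>>   --jars /opt/hbase/lib/hbase-client-1.0.0.jar,/opt/hbase/lib/hbase-common-1.0.0.jar,/opt/hbase/lib/hbase-protocol-1.0.0.jar,/opt/hbase/lib/htrace-core-3.1.0-incubating.jar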