Sorry, I forgot to attach the traceback.

Regards


Rene Castberg

________________________________
From: Castberg, René Christian
Sent: 13 March 2015 07:13
To: user@spark.apache.org
Cc: gen tang
Subject: RE: Pyspark Hbase scan.


Hi,


I have now successfully tested this in a local Spark session.

But I am having a huge problem getting this to work with the Hortonworks
Technical Preview. I think there is an incompatibility with the way YARN
has been compiled.


After changing the HBase version and adding:

resolvers += "Hortonworks Releases" at "http://repo.hortonworks.com/content/repositories/releases/"


I get the attached traceback.


Any help on how to compile this jar so that it works would be greatly
appreciated.
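
For reference, my build.sbt is roughly the following. The HBase version
string is a guess at the HDP build suffix -- I am not certain it is the
right one for this cluster:

name := "spark_hbase"

version := "1.0"

scalaVersion := "2.10.4"

resolvers += "Hortonworks Releases" at "http://repo.hortonworks.com/content/repositories/releases/"

// Placeholder HDP build suffix -- this must match the HBase version
// actually installed on the cluster.
libraryDependencies ++= Seq(
  "org.apache.hbase" % "hbase-common" % "0.98.4.2.2.0.0-2041-hadoop2",
  "org.apache.hbase" % "hbase-server" % "0.98.4.2.2.0.0-2041-hadoop2"
)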


Regards


Rene Castberg


________________________________
From: gen tang <gen.tan...@gmail.com>
Sent: 5 February 2015 11:38
To: Castberg, René Christian
Cc: user@spark.apache.org
Subject: Re: Pyspark Hbase scan.

Hi,

In fact, this pull request, https://github.com/apache/spark/pull/3920,
adds support for HBase scans. However, it has not been merged yet.
You can also take a look at the example code at
http://spark-packages.org/package/20, which uses Scala and Python to read
data from HBase.
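
The basic pattern in those examples goes through newAPIHadoopRDD with
HBase's TableInputFormat and the converter classes shipped in the
spark-examples jar. A rough, untested sketch (the ZooKeeper quorum and
table name are placeholders):

from pyspark import SparkContext

sc = SparkContext(appName="HBaseFullTableRead")

# Placeholder ZooKeeper quorum and table name -- substitute your own.
conf = {
    "hbase.zookeeper.quorum": "zk-host",
    "hbase.mapreduce.inputtable": "mytable",
}

# The converters from the spark-examples jar turn HBase key/value types
# into strings that Python can work with.
hbase_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter="org.apache.spark.examples.pythonconverters."
                 "ImmutableBytesWritableToStringConverter",
    valueConverter="org.apache.spark.examples.pythonconverters."
                   "HBaseResultToStringConverter",
    conf=conf)

print(hbase_rdd.count())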

Hope this can be helpful.

Cheers
Gen



On Thu, Feb 5, 2015 at 11:11 AM, Castberg, René Christian
<rene.castb...@dnvgl.com> wrote:
Hi,

I am trying to do an HBase scan and read the result into a Spark RDD using
PySpark. I have successfully written data to HBase from PySpark, and I have
been able to read a full table from HBase using the Python example code.
Unfortunately, I am unable to find any example code for doing an HBase scan
and reading the result into a Spark RDD from PySpark.

I have found a Scala example:
http://stackoverflow.com/questions/25189527/how-to-process-a-range-of-hbase-rows-using-spark

But I can't find anything on how to do this from Python. Can anybody shed
some light on how (and if) this can be done? A sketch of what I had hoped
would work is below.
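
The scan.row.* property names here are taken from HBase's TableInputFormat;
I do not know whether this is actually supported from Python (untested
sketch, placeholder values throughout):

from pyspark import SparkContext

sc = SparkContext(appName="HBaseScanAttempt")

# Placeholder quorum, table, and row keys -- substitute real values.
# TableInputFormat reads the scan.row.* keys to restrict the scan to a
# row-key range.
conf = {
    "hbase.zookeeper.quorum": "zk-host",
    "hbase.mapreduce.inputtable": "mytable",
    "hbase.mapreduce.scan.row.start": "row0000",
    "hbase.mapreduce.scan.row.stop": "row9999",
}

scan_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter="org.apache.spark.examples.pythonconverters."
                 "ImmutableBytesWritableToStringConverter",
    valueConverter="org.apache.spark.examples.pythonconverters."
                   "HBaseResultToStringConverter",
    conf=conf)

print(scan_rdd.count())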

Regards

Rene Castberg




$ /hadoop-dist/spark-1.2.1-bin-hadoop2.4/bin/spark-submit \
    --driver-class-path /usr/hdp/current/share/lzo/0.6.0/lib/hadoop-lzo-0.6.0.jar:/home/recast/spark_hbase/target/scala-2.10/spark_hbase-assembly-1.0.jar \
    --jars /hadoop-dist/spark-1.2.1-bin-hadoop2.4/lib/spark-examples-1.2.1-hadoop2.4.0.jar \
    --driver-library-path /usr/hdp/current/share/lzo/0.6.0/lib/native/Linux-amd64-64/ \
    AIS_count_msb_hbase.py
Spark assembly has been built with Hive, including Datanucleus jars on classpath
2.7.9 (default, Feb 25 2015, 14:55:10) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-11)]
/hadoop-dist/Python/lib/python2.7/site-packages/setuptools-12.3-py2.7.egg/pkg_resources/__init__.py:1224:
 UserWarning: /tmp/python-eggs is writable by group/others and vulnerable to 
attack when used with get_resource_filename. Consider a more secure location 
(set with .set_extraction_path or the PYTHON_EGG_CACHE environment variable).
Reading config file for : smalldata01.hdp
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/home/recast/spark_hbase/target/scala-2.10/spark_hbase-assembly-1.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/hadoop-dist/spark-1.2.1-bin-hadoop2.4/lib/spark-assembly-1.2.1-hadoop2.4.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/03/13 06:10:34 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
15/03/13 06:10:34 WARN YarnClientSchedulerBackend: NOTE: SPARK_WORKER_INSTANCES 
is deprecated. Use SPARK_WORKER_INSTANCES or --num-executors through 
spark-submit instead.
15/03/13 06:10:34 WARN YarnClientSchedulerBackend: NOTE: SPARK_WORKER_CORES is 
deprecated. Use SPARK_EXECUTOR_CORES or --executor-cores through spark-submit 
instead.
15/03/13 06:10:34 WARN BlockReaderLocal: The short-circuit local reads feature 
cannot be used because libhadoop cannot be loaded.
15/03/13 06:11:27 WARN YarnClientClusterScheduler: Initial job has not accepted 
any resources; check your cluster UI to ensure that workers are registered and 
have sufficient memory
15/03/13 06:11:42 WARN YarnClientClusterScheduler: Initial job has not accepted 
any resources; check your cluster UI to ensure that workers are registered and 
have sufficient memory
15/03/13 06:11:47 WARN ReliableDeliverySupervisor: Association with remote 
system [akka.tcp://sparkyar...@smalldata13.hdp:48305] has failed, address is 
now gated for [5000] ms. Reason is: [Disassociated].
15/03/13 06:11:57 WARN YarnClientClusterScheduler: Initial job has not accepted 
any resources; check your cluster UI to ensure that workers are registered and 
have sufficient memory
15/03/13 06:12:12 WARN YarnClientClusterScheduler: Initial job has not accepted 
any resources; check your cluster UI to ensure that workers are registered and 
have sufficient memory
15/03/13 06:12:25 ERROR YarnClientSchedulerBackend: Yarn application has 
already exited with state FINISHED!
Traceback (most recent call last):
  File "/home/recast/AIS_Project/HBaseImport/AIS_count_msb_hbase.py", line 46, 
in <module>
    tables=sqlContext.sql('show tables').collect()
  File "/hadoop-dist/spark-1.2.1-bin-hadoop2.4/python/pyspark/sql.py", line 
1978, in collect
    bytesInJava = self._jschema_rdd.baseSchemaRDD().collectToPython().iterator()
  File 
"/hadoop-dist/spark-1.2.1-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
 line 538, in __call__
  File 
"/hadoop-dist/spark-1.2.1-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
 line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling 
o33.collectToPython.
: org.apache.spark.SparkException: Job cancelled because SparkContext was shut 
down
        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:702)
        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:701)
        at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
        at 
org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:701)
        at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor.postStop(DAGScheduler.scala:1428)
        at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)
        at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundPostStop(DAGScheduler.scala:1375)
        at 
akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
        at 
akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
        at akka.actor.ActorCell.terminate(ActorCell.scala:369)
        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
        at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
        at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/home/recast/spark_hbase/target/scala-2.10/spark_hbase-assembly-1.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/hadoop-dist/spark-1.2.1-bin-hadoop2.4/lib/spark-assembly-1.2.1-hadoop2.4.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]