Re: HDP 2.5 - Python - Spark-On-Hbase
Ayan,

Did you get the HBase connection working through PySpark as well? I have the Spark-HBase connection working with Scala (via HBaseContext), but I eventually want this working from PySpark code. Would you have a suitable code snippet or approach for calling a Scala class from within PySpark?

Thanks,
Debu
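Since the thread never produced a snippet for this, here is a minimal sketch of the usual pattern: PySpark exposes the driver JVM through the Py4J gateway, so a compiled Scala class shipped with --jars can be invoked directly. Note that spark._jvm and df._jdf are internal PySpark attributes (unofficial API), and com.example.HBaseWriter with its write method is a hypothetical placeholder for your own Scala helper (e.g. one built on HBaseContext).

```python
# Sketch only: spark._jvm and df._jdf are internal PySpark attributes, and
# com.example.HBaseWriter is a hypothetical Scala class shipped with --jars.

def jvm_class(spark, fqcn):
    """Walk the Py4J gateway to a JVM class/object by fully-qualified name."""
    ref = spark._jvm  # Py4J view of the driver JVM
    for part in fqcn.split("."):
        ref = getattr(ref, part)
    return ref

def write_df_via_scala(spark, df, table_name):
    """Hand a PySpark DataFrame to a Scala helper for the HBase write."""
    writer = jvm_class(spark, "com.example.HBaseWriter")
    # df._jdf is the underlying Java DataFrame object the Scala side expects.
    writer.write(df._jdf, table_name)
```

The Scala method should accept an org.apache.spark.sql.DataFrame (or Dataset[Row]) so the handed-over df._jdf matches its signature.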
Re: HDP 2.5 - Python - Spark-On-Hbase
Hi,

Thanks to all of you: I got the HBase connector working. Some details around namespaces are still pending, but overall it is working well.

Now, as usual, I would like to apply the same approach to Structured Streaming. Is there a similar way to use writeStream.format with an HBase writer, or any other way to write continuous data to HBase?

best
Ayan

--
Best Regards,
Ayan Guha
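SHC did not ship a streaming sink in this era, but the common workaround is to reuse the batch SHC writer once per micro-batch. The sketch below assumes Spark 2.4+, where writeStream.foreachBatch exists (on earlier 2.x the fallback is a ForeachWriter doing HBase puts per partition), and the catalog, namespace, table and column names are all illustrative:

```python
import json

# Illustrative SHC catalog: maps HBase table "events" (default namespace)
# to a two-column DataFrame -- rowkey "key" plus one cell "value" in cf1.
CATALOG = json.dumps({
    "table": {"namespace": "default", "name": "events"},
    "rowkey": "key",
    "columns": {
        "key":   {"cf": "rowkey", "col": "key",   "type": "string"},
        "value": {"cf": "cf1",    "col": "value", "type": "string"},
    },
})

def write_batch_to_hbase(batch_df, batch_id):
    # Runs once per micro-batch; batch_df is an ordinary DataFrame,
    # so the batch SHC writer is reused as-is.
    (batch_df.write
        .options(catalog=CATALOG, newtable="5")
        .format("org.apache.spark.sql.execution.datasources.hbase")
        .save())

# Wiring (not executed here; stream_df is your streaming DataFrame):
# query = stream_df.writeStream.foreachBatch(write_batch_to_hbase).start()
```

foreachBatch gives at-least-once semantics, so an idempotent row key (as here, where re-writing a key just overwrites the same cell) is what makes replays safe.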
Re: HDP 2.5 - Python - Spark-On-Hbase
For SHC documentation, please refer to the README in the SHC GitHub repository, which is kept up to date.
Re: HDP 2.5 - Python - Spark-On-Hbase
Thanks, all. I have found the correct version of the package; the HDP documentation is probably a little behind.

Best
Ayan

--
Best Regards,
Ayan Guha
RE: HDP 2.5 - Python - Spark-On-Hbase
Ayan,

The Logging class was moved between Spark 1.6 and Spark 2.0, so it looks like you are trying to run 1.6 code on 2.0. I have ported some code like this before: if you have access to the source you can recompile it, replacing references to Spark's Logging class with the slf4j Logger directly. Most such code tends to be easily portable.

Following is the relevant release note for Spark 2.0:

Removals, Behavior Changes and Deprecations

Removals

The following features have been removed in Spark 2.0:
* Bagel
* Support for Hadoop 2.1 and earlier
* The ability to configure closure serializer
* HTTPBroadcast
* TTL-based metadata cleaning
* Semi-private class org.apache.spark.Logging. We suggest you use slf4j directly.
* SparkContext.metricsSystem

Thanks,
Mahesh

DISCLAIMER
==
This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
Re: HDP 2.5 - Python - Spark-On-Hbase
Hi,

I am using the following:

--packages com.hortonworks:shc:1.0.0-1.6-s_2.10 --repositories http://repo.hortonworks.com/content/groups/public/

Is it compatible with Spark 2.x? I would like to use it.

Best
Ayan

--
Best Regards,
Ayan Guha
Re: HDP 2.5 - Python - Spark-On-Hbase
Yes.

Which SHC version were you using?

If you hit any issues, you can post them in the SHC GitHub issues; there are some existing threads about this.
HDP 2.5 - Python - Spark-On-Hbase
Hi,

Is it possible to use SHC from Hortonworks with PySpark? If so, is there any working code sample available?

Also, I faced an issue while running the samples with Spark 2.0:

"Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging"

Any workaround?

Thanks in advance

--
Best Regards,
Ayan Guha
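For the record, SHC is usable from PySpark without writing any Scala: put the package on the classpath and drive it through the DataFrame reader. A minimal read sketch follows; the package coordinate in the comment is illustrative (match it to your Spark version per the SHC README), and the namespace, table and column layout are placeholders.

```python
import json

def shc_catalog(namespace, table):
    """Illustrative SHC catalog: one string rowkey plus one cell in cf1."""
    return json.dumps({
        "table": {"namespace": namespace, "name": table},
        "rowkey": "key",
        "columns": {
            "key":   {"cf": "rowkey", "col": "key",   "type": "string"},
            "value": {"cf": "cf1",    "col": "value", "type": "string"},
        },
    })

def read_hbase_table(spark, namespace, table):
    # spark is an existing SparkSession; SHC must be on the classpath, e.g.
    #   pyspark --packages com.hortonworks:shc-core:<version-for-your-spark> \
    #           --repositories http://repo.hortonworks.com/content/groups/public/
    return (spark.read
            .options(catalog=shc_catalog(namespace, table))
            .format("org.apache.spark.sql.execution.datasources.hbase")
            .load())
```

The returned DataFrame then supports the usual select/filter operations, with rowkey predicates pushed down by the connector.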
Re: Python, Spark and HBase
I wanted to confirm whether this is now supported, for example in Spark v1.3.0. I've read varying information online, so I just thought I'd verify.

Thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Python-Spark-and-HBase-tp6142p24117.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
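For later readers checking the same thing: the patch discussed below in this thread was merged, and from Spark 1.1 onward (so including 1.3.0) PySpark exposes SparkContext.newAPIHadoopRDD. A sketch modeled on the hbase_inputformat.py example shipped with Spark 1.x follows; the ZooKeeper host and table name are placeholders, and the pythonconverters classes live in the Spark examples jar, which must be on the classpath.

```python
# Sketch for Spark 1.1+ PySpark; sc is an existing SparkContext,
# zk_quorum and table are placeholders for your cluster's values.

def hbase_rdd(sc, zk_quorum, table):
    """Read an HBase table as an RDD of (rowkey, result-string) pairs."""
    conf = {
        "hbase.zookeeper.quorum": zk_quorum,
        "hbase.mapreduce.inputtable": table,
    }
    return sc.newAPIHadoopRDD(
        "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "org.apache.hadoop.hbase.client.Result",
        keyConverter="org.apache.spark.examples.pythonconverters."
                     "ImmutableBytesWritableToStringConverter",
        valueConverter="org.apache.spark.examples.pythonconverters."
                       "HBaseResultToStringConverter",
        conf=conf)
```

Without the converters the key/value objects cannot be pickled back to Python, which is why the examples jar (or your own converter classes) is required.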
Re: Python, Spark and HBase
Hi Tommer,

I'm working on updating and improving the PR, and will work on getting an HBase example working with it. Will feed back as soon as I have had the chance to work on this a bit more.

N
Re: Python, Spark and HBase
Hi Nick,

I finally got around to downloading and building the patch. I pulled the code from https://github.com/MLnick/spark-1/tree/pyspark-inputformats

I am running on a CDH5 node. While the code in the CDH branch is different from Spark master, I do believe that I have resolved any inconsistencies.

When attempting to connect to an HBase table using SparkContext.newAPIHadoopFile I receive the following error:

Py4JError: org.apache.spark.api.python.PythonRDDnewAPIHadoopFile does not exist in the JVM

I have searched the pyspark-inputformats branch and cannot find any reference to the class org.apache.spark.api.python.PythonRDDnewAPIHadoopFile

Any ideas? Also, do you have a working example of HBase access with the new code?

Thanks
Tommer

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Python-Spark-and-HBase-tp6142p6502.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Python, Spark and HBase
It sounds like you made a typo in the code: perhaps you're trying to call self._jvm.PythonRDDnewAPIHadoopFile instead of self._jvm.PythonRDD.newAPIHadoopFile? There should be a dot before the "new".

Matei
Re: Python, Spark and HBase
In my code I am not referencing PythonRDD or PythonRDDnewAPIHadoopFile at all. I am calling SparkContext.newAPIHadoopFile with:

inputformat_class='org.apache.hadoop.hbase.mapreduce.TableInputFormat'
key_class='org.apache.hadoop.hbase.io.ImmutableBytesWritable'
value_class='org.apache.hadoop.hbase.client.Result'

Is it possible that the typo is coming from inside the Spark code?

Tommer

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Python-Spark-and-HBase-tp6142p6506.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Python, Spark and HBase
The code which causes the error is:

    sc = SparkContext("local", "My App")
    rdd = sc.newAPIHadoopFile(
        name,
        'org.apache.hadoop.hbase.mapreduce.TableInputFormat',
        'org.apache.hadoop.hbase.io.ImmutableBytesWritable',
        'org.apache.hadoop.hbase.client.Result',
        conf={"hbase.zookeeper.quorum": "my-host",
              "hbase.rootdir": "hdfs://my-host:8020/hbase",
              "hbase.mapreduce.inputtable": "data"})

The full stack trace is:

    Py4JError                                 Traceback (most recent call last)
    <ipython-input-8-3b9a4ea2f659> in <module>()
          7         conf={"hbase.zookeeper.quorum": "my-host",
          8               "hbase.rootdir": "hdfs://my-host:8020/hbase",
          9               "hbase.mapreduce.inputtable": "data"})
         10
         11

    /opt/cloudera/parcels/CDH/lib/spark/python/pyspark/context.pyc in newAPIHadoopFile(self, name, inputformat_class, key_class, value_class, key_wrapper, value_wrapper, conf)
        281         for k, v in conf.iteritems():
        282             jconf[k] = v
    --> 283         jrdd = self._jvm.PythonRDD.newAPIHadoopFile(self._jsc, name, inputformat_class, key_class, value_class,
        284                                                     key_wrapper, value_wrapper, jconf)
        285         return RDD(jrdd, self, PickleSerializer())

    /opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py in __getattr__(self, name)
        657         else:
        658             raise Py4JError('{0} does not exist in the JVM'.
    --> 659                             format(self._fqn + name))
        660
        661     def __call__(self, *args):

    Py4JError: org.apache.spark.api.python.PythonRDDnewAPIHadoopFile does not exist in the JVM

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Python-Spark-and-HBase-tp6142p6507.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
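One detail worth noting in that trace: the run-together name in the error is just how Py4J formats the message. The __getattr__ shown in the trace raises with self._fqn + name, concatenating the resolved class path and the missing attribute with no separator. So the fused name is not evidence of a typo in the calling code; it means the driver JVM genuinely has no PythonRDD.newAPIHadoopFile method, which here likely means the CDH-parcel Spark being run does not include the patch. A small illustration of the formatting:

```python
# Reconstructing the Py4JError text from the snippet in the trace above.
fqn = "org.apache.spark.api.python.PythonRDD"  # what Py4J resolved so far
missing = "newAPIHadoopFile"                   # the attribute lookup that failed
message = "{0} does not exist in the JVM".format(fqn + missing)

# The bare concatenation is why the method name appears fused to the class name.
assert "PythonRDDnewAPIHadoopFile" in message
```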
Re: Python, Spark and HBase
Thanks Nick and Matei. I'll take a look at the patch and keep you updated. Tommer -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Python-Spark-and-HBase-tp6142p6176.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Python, Spark and HBase
Hello,

This seems like a basic question but I have been unable to find an answer in the archives or other online sources. I would like to know if there is any way to load an RDD from HBase in Python. In Java/Scala I can do this by initializing a NewAPIHadoopRDD with a TableInputFormat class. Is there any equivalent in Python?

Thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Python-Spark-and-HBase-tp6142.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Python, Spark and HBase
Unfortunately this is not yet possible. There's a patch in progress posted here, though: https://github.com/apache/spark/pull/455 ; it would be great to get your feedback on it.

Matei