Re: HDP 2.5 - Python - Spark-On-Hbase

2017-09-30 Thread Debabrata Ghosh
Ayan,
   Did you get the HBase connection working through
PySpark as well? I have the Spark-HBase connection working in Scala (via
HBaseContext), but I eventually want to get this working from PySpark.
Would you have some suitable code snippets, or an approach for calling a
Scala class from within PySpark?

Thanks,
Debu
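
For illustration, calls from PySpark into a Scala class generally go through
the py4j gateway behind SparkContext. A minimal sketch of the pattern,
assuming a hypothetical Scala object com.example.HBaseHelper with a method
scanTable(sc: SparkContext, table: String), compiled into a jar and shipped
via --jars; every name here is a placeholder, not something from this thread:

from pyspark import SparkContext

sc = SparkContext(appName="scala-from-pyspark")

# sc._jvm exposes JVM classes and objects through py4j;
# com.example.HBaseHelper is a hypothetical Scala object on the driver classpath.
helper = sc._jvm.com.example.HBaseHelper

# The Scala side expects the underlying Scala SparkContext, not the Python
# wrapper: sc._jsc is the JavaSparkContext, and sc._jsc.sc() unwraps it.
result = helper.scanTable(sc._jsc.sc(), "my_table")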



Re: HDP 2.5 - Python - Spark-On-Hbase

2017-06-28 Thread ayan guha
Hi

Thanks to all of you, I got the HBase connector working. There are still
some details around namespaces pending, but overall it is working well.

Now, as usual, I would like to apply the same concept to Structured
Streaming. Is there a similar way to use writeStream.format with an HBase
writer, or any other way to write continuous data to HBase?

best
Ayan



-- 
Best Regards,
Ayan Guha
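
For reference, one way to write a stream to HBase is to reuse SHC's batch
writer for each micro-batch. A rough sketch, with heavy caveats: foreachBatch
only arrived in Spark 2.4 (after this thread), and the code assumes an active
SparkSession, a streaming DataFrame stream_df, and an SHC catalog JSON string
catalog (see the SHC README for the catalog format):

def write_to_hbase(batch_df, batch_id):
    # Reuse the SHC batch data source for each micro-batch of the stream.
    (batch_df.write
        .options(catalog=catalog)
        .format("org.apache.spark.sql.execution.datasources.hbase")
        .save())

query = (stream_df.writeStream
         .foreachBatch(write_to_hbase)
         .option("checkpointLocation", "/tmp/hbase-stream-checkpoint")
         .start())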


Re: HDP 2.5 - Python - Spark-On-Hbase

2017-06-26 Thread Weiqing Yang
For SHC documentation, please refer to the README in the SHC GitHub
repository, which is kept up to date.



Re: HDP 2.5 - Python - Spark-On-Hbase

2017-06-26 Thread ayan guha
Thanks all, I have found the correct version of the package. The HDP
documentation is probably a little behind.

Best
Ayan

-- 
Best Regards,
Ayan Guha


RE: HDP 2.5 - Python - Spark-On-Hbase

2017-06-25 Thread Mahesh Sawaiker
Ayan,
The Logging class was moved between Spark 1.6 and Spark 2.0, so it looks
like you are trying to run 1.6 code on 2.0. I have ported code like this
before: if you have access to the source, you can recompile it, changing the
reference to the Logging class to use the slf4j Logger class directly. Most
such code tends to be easily portable.

The following is from the Spark 2.0 release notes:

Removals, Behavior Changes and Deprecations
Removals
The following features have been removed in Spark 2.0:

  *   Bagel
  *   Support for Hadoop 2.1 and earlier
  *   The ability to configure closure serializer
  *   HTTPBroadcast
  *   TTL-based metadata cleaning
  *   Semi-private class org.apache.spark.Logging. We suggest you use slf4j 
directly.
  *   SparkContext.metricsSystem
Thanks,
Mahesh




Re: HDP 2.5 - Python - Spark-On-Hbase

2017-06-25 Thread ayan guha
Hi

I am using the following:

--packages com.hortonworks:shc:1.0.0-1.6-s_2.10 --repositories
http://repo.hortonworks.com/content/groups/public/

Is it compatible with Spark 2.x? I would like to use it.

Best
Ayan



-- 
Best Regards,
Ayan Guha
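
Note that 1.0.0-1.6-s_2.10 is the Spark 1.6 / Scala 2.10 build, which is what
triggers the org.apache.spark.Logging error discussed elsewhere in this
thread. SHC publishes Spark 2.x builds under the shc-core artifact; a likely
invocation, with the exact version string to be verified against the
Hortonworks repository:

pyspark --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 \
        --repositories http://repo.hortonworks.com/content/groups/public/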


Re: HDP 2.5 - Python - Spark-On-Hbase

2017-06-23 Thread Weiqing Yang
Yes.
What SHC version were you using?
If you hit any issues, you can post them in the SHC GitHub issues; there
are already some threads about this.



HDP 2.5 - Python - Spark-On-Hbase

2017-06-23 Thread ayan guha
Hi

Is it possible to use SHC from Hortonworks with PySpark? If so, is any
working code sample available?

Also, I faced an issue while running the samples with Spark 2.0:

"Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging"

Any workaround?

Thanks in advance

-- 
Best Regards,
Ayan Guha
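
A minimal PySpark read sketch in the style of the SHC README, assuming an
active SparkSession named spark and the SHC package on the classpath; the
table name, column family, and column names are placeholders:

import json

# SHC catalog: maps HBase table "table1" (column family "cf1") to DataFrame columns.
catalog = json.dumps({
    "table": {"namespace": "default", "name": "table1"},
    "rowkey": "key",
    "columns": {
        "col0": {"cf": "rowkey", "col": "key", "type": "string"},
        "col1": {"cf": "cf1", "col": "col1", "type": "string"}
    }
})

df = (spark.read
      .options(catalog=catalog)
      .format("org.apache.spark.sql.execution.datasources.hbase")
      .load())
df.show()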


Re: Python, Spark and HBase

2015-08-03 Thread ericbless
I wanted to confirm whether this is now supported, such as in Spark v1.3.0.

I've read varying info online and just thought I'd verify.

Thanks






Re: Python, Spark and HBase

2014-05-29 Thread Nick Pentreath
Hi Tommer,

I'm working on updating and improving the PR, and will work on getting an
HBase example working with it. I will report back as soon as I have had the
chance to work on this a bit more.

N





Re: Python, Spark and HBase

2014-05-28 Thread twizansk
Hi Nick,

I finally got around to downloading and building the patch.  

I pulled the code from
https://github.com/MLnick/spark-1/tree/pyspark-inputformats

I am running on a CDH5 node.  While the code in the CDH branch is different
from spark master, I do believe that I have resolved any inconsistencies.

When attempting to connect to an HBase table using
SparkContext.newAPIHadoopFile, I receive the following error:

Py4JError: org.apache.spark.api.python.PythonRDDnewAPIHadoopFile
does not exist in the JVM

I have searched the pyspark-inputformats branch and cannot find any
reference to the class org.apache.spark.api.python.PythonRDDnewAPIHadoopFile

Any ideas?

Also, do you have a working example of HBase access with the new code?

Thanks

Tommer  





Re: Python, Spark and HBase

2014-05-28 Thread Matei Zaharia
It sounds like you made a typo in the code: perhaps you’re trying to call
self._jvm.PythonRDDnewAPIHadoopFile instead of
self._jvm.PythonRDD.newAPIHadoopFile? There should be a dot before
newAPIHadoopFile.


Matei




Re: Python, Spark and HBase

2014-05-28 Thread twizansk
In my code I am not referencing PythonRDD or PythonRDDnewAPIHadoopFile at
all.  I am calling SparkContext.newAPIHadoopFile with: 

inputformat_class='org.apache.hadoop.hbase.mapreduce.TableInputFormat'
key_class='org.apache.hadoop.hbase.io.ImmutableBytesWritable',
value_class='org.apache.hadoop.hbase.client.Result'

Is it possible that the typo is coming from inside the Spark code?

Tommer





Re: Python, Spark and HBase

2014-05-28 Thread twizansk
The code which causes the error is:

sc = SparkContext("local", "My App")
rdd = sc.newAPIHadoopFile(
    name,
    'org.apache.hadoop.hbase.mapreduce.TableInputFormat',
    'org.apache.hadoop.hbase.io.ImmutableBytesWritable',
    'org.apache.hadoop.hbase.client.Result',
    conf={"hbase.zookeeper.quorum": "my-host",
          "hbase.rootdir": "hdfs://my-host:8020/hbase",
          "hbase.mapreduce.inputtable": "data"})

The full stack trace is:

---------------------------------------------------------------------------
Py4JError                                 Traceback (most recent call last)
<ipython-input-8-3b9a4ea2f659> in <module>()
      7         conf={"hbase.zookeeper.quorum": "my-host",
      8               "hbase.rootdir": "hdfs://my-host:8020/hbase",
      9               "hbase.mapreduce.inputtable": "data"})
     10
     11

/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/context.pyc in
newAPIHadoopFile(self, name, inputformat_class, key_class, value_class,
key_wrapper, value_wrapper, conf)
    281         for k, v in conf.iteritems():
    282             jconf[k] = v
--> 283         jrdd = self._jvm.PythonRDD.newAPIHadoopFile(self._jsc, name,
                    inputformat_class, key_class, value_class,
    284             key_wrapper, value_wrapper, jconf)
    285         return RDD(jrdd, self, PickleSerializer())

/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py
in __getattr__(self, name)
    657         else:
    658             raise Py4JError('{0} does not exist in the JVM'.
--> 659                             format(self._fqn + name))
    660
    661     def __call__(self, *args):

Py4JError: org.apache.spark.api.python.PythonRDDnewAPIHadoopFile does not
exist in the JVM





Re: Python, Spark and HBase

2014-05-21 Thread twizansk
Thanks Nick and Matei. I'll take a look at the patch and keep you updated.

Tommer





Python, Spark and HBase

2014-05-20 Thread twizansk
Hello,

This seems like a basic question but I have been unable to find an answer in
the archives or other online sources.

I would like to know if there is any way to load an RDD from HBase in Python.
In Java/Scala I can do this by initializing a NewAPIHadoopRDD with a
TableInputFormat class. Is there any equivalent in Python?

Thanks
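
For later readers: this capability landed after this thread via
SparkContext.newAPIHadoopRDD plus pluggable converters. A sketch along the
lines of the hbase_inputformat.py example bundled with Spark 1.1 and later;
the host and table names are placeholders, and the converter classes live in
the Spark examples jar:

from pyspark import SparkContext

sc = SparkContext(appName="hbase-read")

conf = {"hbase.zookeeper.quorum": "my-host",
        "hbase.mapreduce.inputtable": "data"}

# The converters turn HBase's Writable key and Result value into Python strings.
rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter="org.apache.spark.examples.pythonconverters."
                 "ImmutableBytesWritableToStringConverter",
    valueConverter="org.apache.spark.examples.pythonconverters."
                   "HBaseResultToStringConverter",
    conf=conf)
print(rdd.take(1))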





Re: Python, Spark and HBase

2014-05-20 Thread Matei Zaharia
Unfortunately this is not yet possible. There’s a patch in progress posted here 
though: https://github.com/apache/spark/pull/455 — it would be great to get 
your feedback on it.

Matei
