Thanks for remembering about the docs, Josh.

On Mon, Dec 21, 2015 at 8:27 AM, Josh Mahonin <[email protected]> wrote:
> Just an update for anyone interested, PHOENIX-2503 was just committed
> for 4.7.0 and the docs have been updated to include these samples for
> PySpark users.
>
> https://phoenix.apache.org/phoenix_spark.html
>
> Josh
>
> On Thu, Dec 10, 2015 at 1:20 PM, Josh Mahonin <[email protected]> wrote:
>
>> Hey Nick,
>>
>> I think this used to work, and will again once PHOENIX-2503 gets
>> resolved. With the Spark DataFrame support, all the necessary glue is
>> there for Phoenix and pyspark to play nice. With that client JAR (or
>> by overriding the com.fasterxml.jackson JARs), you can do something
>> like:
>>
>> df = sqlContext.read \
>>     .format("org.apache.phoenix.spark") \
>>     .option("table", "TABLE1") \
>>     .option("zkUrl", "localhost:63512") \
>>     .load()
>>
>> and:
>>
>> df.write \
>>     .format("org.apache.phoenix.spark") \
>>     .mode("overwrite") \
>>     .option("table", "TABLE1") \
>>     .option("zkUrl", "localhost:63512") \
>>     .save()
>>
>> Yes, this should be added to the documentation. I hadn't actually
>> tried this till just now. :)
>>
>> On Wed, Dec 9, 2015 at 6:39 PM, Nick Dimiduk <[email protected]> wrote:
>>
>>> Heya,
>>>
>>> Has anyone any experience using the phoenix-spark integration from
>>> pyspark instead of Scala? Folks prefer Python around here...
>>>
>>> I did find this example [0] of using HBaseOutputFormat from pyspark,
>>> but I haven't tried extending it for Phoenix. Maybe someone with more
>>> experience in pyspark knows better? It would be a great addition to
>>> our documentation.
>>>
>>> Thanks,
>>> Nick
>>>
>>> [0]:
>>> https://github.com/apache/spark/blob/master/examples/src/main/python/hbase_outputformat.py
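
For anyone piecing this together from the thread: Josh's snippets assume a
running pyspark shell with the Phoenix client JAR already on the driver and
executor classpaths. Below is a minimal self-contained sketch of the same
approach, assuming Spark 1.x (hence SQLContext), a placeholder client JAR
path, and a placeholder ZooKeeper quorum; TABLE1 is carried over from the
example above, and the target table must already exist for the write.

    # Launch with the Phoenix client JAR on the classpath (path is a
    # placeholder; see https://phoenix.apache.org/phoenix_spark.html):
    #   spark-submit --jars /path/to/phoenix-4.7.0-client.jar phoenix_example.py
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="phoenix-pyspark-example")
    sqlContext = SQLContext(sc)

    # Read TABLE1 through the phoenix-spark DataFrame API; "zkUrl" is the
    # HBase ZooKeeper quorum (placeholder host:port here).
    df = sqlContext.read \
        .format("org.apache.phoenix.spark") \
        .option("table", "TABLE1") \
        .option("zkUrl", "localhost:2181") \
        .load()

    df.show()

    # Write the rows back. The connector expects "overwrite" mode, which
    # phoenix-spark executes as Phoenix UPSERTs rather than a
    # truncate-and-load.
    df.write \
        .format("org.apache.phoenix.spark") \
        .mode("overwrite") \
        .option("table", "TABLE1") \
        .option("zkUrl", "localhost:2181") \
        .save()

    sc.stop()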
