Just an update for anyone interested: PHOENIX-2503 has been committed for
4.7.0, and the docs have been updated to include these samples for PySpark
users.

https://phoenix.apache.org/phoenix_spark.html
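
For anyone who wants to kick the tires before upgrading, the shape of it
is roughly the following (a minimal sketch, assuming the Phoenix client
JAR is on the Spark driver and executor classpath; TABLE1 and the zkUrl
are placeholders for your own table and ZooKeeper quorum):

df = sqlContext.read \
  .format("org.apache.phoenix.spark") \
  .option("table", "TABLE1") \
  .option("zkUrl", "localhost:2181") \
  .load()

# Once loaded it's an ordinary Spark DataFrame, so the usual API applies,
# e.g. registering it as a temp table and querying it with Spark SQL:
df.registerTempTable("table1")
sqlContext.sql("SELECT * FROM table1").show()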

Josh

On Thu, Dec 10, 2015 at 1:20 PM, Josh Mahonin <[email protected]> wrote:

> Hey Nick,
>
> I think this used to work, and will again once PHOENIX-2503 gets resolved.
> With the Spark DataFrame support, all the necessary glue is there for
> Phoenix and PySpark to play nice. With that client JAR (or by overriding
> the com.fasterxml.jackson JARs), you can do something like:
>
> df = sqlContext.read \
>   .format("org.apache.phoenix.spark") \
>   .option("table", "TABLE1") \
>   .option("zkUrl", "localhost:63512") \
>   .load()
>
> And
>
> df.write \
>   .format("org.apache.phoenix.spark") \
>   .mode("overwrite") \
>   .option("table", "TABLE1") \
>   .option("zkUrl", "localhost:63512") \
>   .save()
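>
> As a quick sanity check after the save, you can read the table back and
> count rows (same placeholder table and zkUrl as above):
>
> sqlContext.read \
>   .format("org.apache.phoenix.spark") \
>   .option("table", "TABLE1") \
>   .option("zkUrl", "localhost:63512") \
>   .load() \
>   .count()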
>
>
> Yes, this should be added to the documentation. I hadn't actually tried
> this till just now. :)
>
> On Wed, Dec 9, 2015 at 6:39 PM, Nick Dimiduk <[email protected]> wrote:
>
>> Heya,
>>
>> Does anyone have experience using the phoenix-spark integration from PySpark
>> instead of Scala? Folks prefer Python around here...
>>
>> I did find this example [0] of using HBaseOutputFormat from PySpark, but I
>> haven't tried extending it for Phoenix. Maybe someone with more experience
>> in PySpark knows better? It would be a great addition to our documentation.
>>
>> Thanks,
>> Nick
>>
>> [0]:
>> https://github.com/apache/spark/blob/master/examples/src/main/python/hbase_outputformat.py
>>
>
>
