Hey Nick,

I think this used to work, and will again once PHOENIX-2503 gets resolved.
With the Spark DataFrame support, all the necessary glue is there for
Phoenix and pyspark to play nice. With that client JAR (or by overriding
the com.fasterxml.jackson JARs), you can do something like:

df = sqlContext.read \
  .format("org.apache.phoenix.spark") \
  .option("table", "TABLE1") \
  .option("zkUrl", "localhost:63512") \
  .load()
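
Once loaded it behaves like any other DataFrame, so a quick sanity check is
just the usual calls (assuming TABLE1 exists on that cluster):

df.printSchema()
df.show(5)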

And writing back works the same way:

df.write \
  .format("org.apache.phoenix.spark") \
  .mode("overwrite") \
  .option("table", "TABLE1") \
  .option("zkUrl", "localhost:63512") \
  .save()
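
One caveat: the Phoenix client JAR needs to be on both the driver and
executor classpaths before the context is created. Something along these
lines should do it (the exact JAR name and path depend on your build, so
treat this as a sketch):

pyspark --driver-class-path /path/to/phoenix-client.jar \
  --conf spark.executor.extraClassPath=/path/to/phoenix-client.jar

or set spark.driver.extraClassPath / spark.executor.extraClassPath in
spark-defaults.conf.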


Yes, this should be added to the documentation. I hadn't actually tried
this till just now. :)

On Wed, Dec 9, 2015 at 6:39 PM, Nick Dimiduk <ndimi...@apache.org> wrote:

> Heya,
>
> Has anyone any experience using phoenix-spark integration from pyspark
> instead of scala? Folks prefer python around here...
>
> I did find this example [0] of using HBaseOutputFormat from pyspark,
> but haven't tried extending it for Phoenix. Maybe someone with more experience
> in pyspark knows better? Would be a great addition to our documentation.
>
> Thanks,
> Nick
>
> [0]:
> https://github.com/apache/spark/blob/master/examples/src/main/python/hbase_outputformat.py
>
