Just an update for anyone interested: PHOENIX-2503 was just committed for 4.7.0, and the docs have been updated to include these samples for PySpark users.
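A note for anyone trying the samples: they assume the Phoenix client JAR is on both the driver and executor classpaths. A minimal sketch of one way to set that up (the JAR path below is an assumption; use the client JAR from your own install):

    # In spark-defaults.conf -- the JAR path is an assumption for your install
    spark.driver.extraClassPath    /path/to/phoenix-client.jar
    spark.executor.extraClassPath  /path/to/phoenix-client.jar

    # Or, per-session, when launching the shell:
    #   pyspark --jars /path/to/phoenix-client.jar \
    #           --driver-class-path /path/to/phoenix-client.jar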
https://phoenix.apache.org/phoenix_spark.html

Josh

On Thu, Dec 10, 2015 at 1:20 PM, Josh Mahonin <[email protected]> wrote:

> Hey Nick,
>
> I think this used to work, and will again once PHOENIX-2503 gets resolved.
> With the Spark DataFrame support, all the necessary glue is there for
> Phoenix and pyspark to play nice. With that client JAR (or by overriding
> the com.fasterxml.jackson JARs), you can do something like:
>
>     df = sqlContext.read \
>         .format("org.apache.phoenix.spark") \
>         .option("table", "TABLE1") \
>         .option("zkUrl", "localhost:63512") \
>         .load()
>
> And
>
>     df.write \
>         .format("org.apache.phoenix.spark") \
>         .mode("overwrite") \
>         .option("table", "TABLE1") \
>         .option("zkUrl", "localhost:63512") \
>         .save()
>
> Yes, this should be added to the documentation. I hadn't actually tried
> this till just now. :)
>
> On Wed, Dec 9, 2015 at 6:39 PM, Nick Dimiduk <[email protected]> wrote:
>
>> Heya,
>>
>> Has anyone any experience using the phoenix-spark integration from
>> pyspark instead of Scala? Folks prefer Python around here...
>>
>> I did find this example [0] of using HBaseOutputFormat from pyspark, but
>> haven't tried extending it for Phoenix. Maybe someone with more
>> experience in pyspark knows better? It would be a great addition to our
>> documentation.
>>
>> Thanks,
>> Nick
>>
>> [0]: https://github.com/apache/spark/blob/master/examples/src/main/python/hbase_outputformat.py
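For later readers piecing the above together: a self-contained pyspark script along the lines of these samples might look like the following sketch. The column COL1, the target table TABLE2, and the ZooKeeper quorum localhost:2181 are assumptions here; substitute your own schema and cluster details.

    # phoenix_pyspark_example.py
    # Submit with the Phoenix client JAR on the classpath, e.g.:
    #   spark-submit --jars /path/to/phoenix-client.jar phoenix_pyspark_example.py
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="phoenix-pyspark-example")
    sqlContext = SQLContext(sc)

    # Load a Phoenix table as a DataFrame; zkUrl points at the HBase
    # ZooKeeper quorum (localhost:2181 is an assumption)
    df = sqlContext.read \
        .format("org.apache.phoenix.spark") \
        .option("table", "TABLE1") \
        .option("zkUrl", "localhost:2181") \
        .load()

    # From here it's an ordinary DataFrame, e.g. filter on a column
    # (COL1 is a hypothetical column in TABLE1)
    filtered = df.filter(df.COL1 == "some_value")

    # Write the result back to Phoenix. TABLE2 is hypothetical and must
    # already exist; phoenix-spark requires "overwrite" mode, though the
    # underlying operation is an UPSERT.
    filtered.write \
        .format("org.apache.phoenix.spark") \
        .mode("overwrite") \
        .option("table", "TABLE2") \
        .option("zkUrl", "localhost:2181") \
        .save()

    sc.stop()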
