Thanks for remembering about the docs, Josh.

On Mon, Dec 21, 2015 at 8:27 AM, Josh Mahonin <[email protected]> wrote:
> Just an update for anyone interested, PHOENIX-2503 was just committed
> for 4.7.0 and the docs have been updated to include these samples for
> PySpark users.
>
> https://phoenix.apache.org/phoenix_spark.html
>
> Josh
>
> On Thu, Dec 10, 2015 at 1:20 PM, Josh Mahonin <[email protected]> wrote:
>
>> Hey Nick,
>>
>> I think this used to work, and will again once PHOENIX-2503 gets
>> resolved. With the Spark DataFrame support, all the necessary glue is
>> there for Phoenix and pyspark to play nice. With that client JAR (or
>> by overriding the com.fasterxml.jackson JARs), you can do something
>> like:
>>
>> df = sqlContext.read \
>>     .format("org.apache.phoenix.spark") \
>>     .option("table", "TABLE1") \
>>     .option("zkUrl", "localhost:63512") \
>>     .load()
>>
>> and:
>>
>> df.write \
>>     .format("org.apache.phoenix.spark") \
>>     .mode("overwrite") \
>>     .option("table", "TABLE1") \
>>     .option("zkUrl", "localhost:63512") \
>>     .save()
>>
>> Yes, this should be added to the documentation. I hadn't actually
>> tried this till just now. :)
>>
>> On Wed, Dec 9, 2015 at 6:39 PM, Nick Dimiduk <[email protected]> wrote:
>>
>>> Heya,
>>>
>>> Has anyone any experience using the phoenix-spark integration from
>>> pyspark instead of Scala? Folks prefer Python around here...
>>>
>>> I did find this example [0] of using HBaseOutputFormat from pyspark,
>>> but I haven't tried extending it for Phoenix. Maybe someone with more
>>> experience in pyspark knows better? It would be a great addition to
>>> our documentation.
>>>
>>> Thanks,
>>> Nick
>>>
>>> [0]:
>>> https://github.com/apache/spark/blob/master/examples/src/main/python/hbase_outputformat.py
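
For anyone piecing this together from the thread: Josh's snippets assume a
running pyspark shell with the Phoenix client JAR already on the driver and
executor classpaths. Below is a minimal self-contained sketch of the same
approach, assuming Spark 1.x (hence SQLContext), a placeholder client JAR
path, and a placeholder ZooKeeper quorum; TABLE1 is carried over from the
example above, and the target table must already exist for the write.

    # Launch with the Phoenix client JAR on the classpath (path is a
    # placeholder; see https://phoenix.apache.org/phoenix_spark.html):
    #   spark-submit --jars /path/to/phoenix-4.7.0-client.jar phoenix_example.py
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="phoenix-pyspark-example")
    sqlContext = SQLContext(sc)

    # Read TABLE1 through the phoenix-spark DataFrame API; "zkUrl" is the
    # HBase ZooKeeper quorum (placeholder host:port here).
    df = sqlContext.read \
        .format("org.apache.phoenix.spark") \
        .option("table", "TABLE1") \
        .option("zkUrl", "localhost:2181") \
        .load()

    df.show()

    # Write the rows back. The connector expects "overwrite" mode, which
    # phoenix-spark executes as Phoenix UPSERTs rather than a
    # truncate-and-load.
    df.write \
        .format("org.apache.phoenix.spark") \
        .mode("overwrite") \
        .option("table", "TABLE1") \
        .option("zkUrl", "localhost:2181") \
        .save()

    sc.stop()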
