Besides the parallelism that Spark provides as a distributed execution framework, I can't really see how phoenix-spark would be faster than the JDBC driver :). Phoenix-spark and the JDBC driver use the same code under the hood.

Phoenix-spark uses PhoenixOutputFormat (and thus PhoenixRecordWriter) to write data to Phoenix; PhoenixRecordWritable is also worth a look. These classes ultimately execute UPSERTs on a PreparedStatement.
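
For reference, a minimal sketch of what that write path looks like from Spark (the table name, ZooKeeper quorum, and input source below are placeholders, not anything from your setup):

  import org.apache.spark.sql.{SaveMode, SparkSession}

  val spark = SparkSession.builder().appName("phoenix-write-example").getOrCreate()

  // Hypothetical DataFrame whose columns match the target Phoenix table.
  val df = spark.read.parquet("/tmp/input")

  // This goes through PhoenixOutputFormat/PhoenixRecordWriter, which batch
  // UPSERT VALUES statements on a PreparedStatement and commit them.
  df.write
    .format("org.apache.phoenix.spark")
    .mode(SaveMode.Overwrite)             // phoenix-spark writes are UPSERTs
    .option("table", "EXAMPLE_TABLE")     // placeholder Phoenix table
    .option("zkUrl", "zkhost:2181")       // placeholder ZooKeeper quorum
    .save()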

There is also the CsvBulkLoadTool, which creates HFiles to bulk-load data into Phoenix. I'm not sure whether phoenix-spark has something wired up to do this out of the box (certainly, you could do it yourself).
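
The bulk load tool is a regular Hadoop Tool, so you can launch it from code as well as from the command line. A rough sketch, with placeholder table name and input path:

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.util.ToolRunner
  import org.apache.phoenix.mapreduce.CsvBulkLoadTool

  // Builds HFiles from the CSV input and hands them to HBase for bulk load,
  // bypassing the UPSERT/PreparedStatement path entirely.
  val exitCode = ToolRunner.run(new Configuration(),
    new CsvBulkLoadTool(),
    Array(
      "--table", "EXAMPLE_TABLE",         // placeholder Phoenix table
      "--input", "/data/example.csv"))    // placeholder HDFS path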

On 8/6/18 8:10 AM, Brandon Geise wrote:
Thanks for the reply Yun.

I’m not quite clear on how exactly this would help on the upsert side. Are you suggesting deriving the types from Phoenix and then doing the encoding/decoding and writing/reading directly against HBase?

Thanks,

Brandon

*From: *Jaanai Zhang <cloud.pos...@gmail.com>
*Reply-To: *<user@phoenix.apache.org>
*Date: *Sunday, August 5, 2018 at 9:34 PM
*To: *<user@phoenix.apache.org>
*Subject: *Re: Spark-Phoenix Plugin

You can get the data types from Phoenix metadata, then encode/decode the data yourself to write/read it directly. I think this approach is effective, FYI :)
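
For illustration, a minimal sketch of pulling column types from Phoenix metadata over JDBC (the connection URL and table name are placeholders):

  import java.sql.DriverManager

  // Placeholder connection string; point it at your ZooKeeper quorum.
  val conn = DriverManager.getConnection("jdbc:phoenix:zkhost:2181")

  // Column names and java.sql.Types codes for a table, which you could then
  // use to encode/decode HBase cell values yourself.
  val rs = conn.getMetaData.getColumns(null, null, "EXAMPLE_TABLE", null)
  while (rs.next()) {
    println(rs.getString("COLUMN_NAME") + " -> " + rs.getInt("DATA_TYPE"))
  }
  conn.close()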


----------------------------------------

    Yun Zhang

    Best regards!

2018-08-04 21:43 GMT+08:00 Brandon Geise <brandonge...@gmail.com>:

    Good morning,

    I’m looking at using a combination of HBase, Phoenix and Spark for a
    project, and I read that using the Spark-Phoenix plugin directly is
    more efficient than JDBC. However, it wasn’t entirely clear from the
    examples whether an upsert is performed when writing a dataframe, and
    how fine-grained the options are for executing the upsert.  Any
    information someone can share would be greatly appreciated!

    Thanks,

    Brandon
