It does. Under the hood, the DataFrame/RDD makes use of the
PhoenixInputFormat, which derives the split information from the query
planner and passes those splits back through to Spark to use for its
parallelization.
After you have the RDD / DataFrame handle, you're also free to use Spark's
repartition() to change the parallelism yourself.
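For example, a minimal sketch of both in Java (assuming a DataFrame named df
already loaded through the phoenix-spark plugin; the partition count of 200 is
just a placeholder):
import org.apache.spark.sql.DataFrame;
// One Spark partition is created per Phoenix input split, so this shows
// how many parallel tasks a scan of the table will use.
int numSplits = df.rdd().partitions().length;
// If that doesn't match the parallelism you want, rebalance it explicitly.
DataFrame repartitioned = df.repartition(200);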
Yes, I will create new tickets for any issues that I may run into.
Another question: for now I'm pursuing the option of creating a DataFrame
as shown in my previous email. How does Spark handle parallelization in
this case? Does it use Phoenix metadata on splits?
On Wed, Dec 2, 2015 at 11:02 AM,
Hi Krishna,
That's great to hear. You're right, the plugin itself should be backwards
compatible with Spark 1.3.1 and should work with any version of Phoenix from
4.4.0 onward, though I can't guarantee that to be the case forever. As well, I
don't know how much usage there is across the board using the Java API.
Yes, that works for Spark 1.4.x. The website says Spark 1.3.1+ for the Spark
plugin; is that accurate?
For Spark 1.3.1, I created a DataFrame as follows (could not use the
plugin):
Map options = new HashMap();
options.put("url", PhoenixRuntime.JDBC_PROTOCOL +
PhoenixRuntime.JDBC_PROTOCO
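For what it's worth, a complete sketch of that approach might look like the
following. It goes through the plain Phoenix JDBC driver via Spark's generic
"jdbc" data source rather than the plugin, and the ZooKeeper quorum and table
name are placeholders:
import java.util.HashMap;
import java.util.Map;
import org.apache.phoenix.jdbc.PhoenixDriver;
import org.apache.phoenix.util.PhoenixRuntime;
import org.apache.spark.sql.DataFrame;
Map<String, String> options = new HashMap<String, String>();
// Builds "jdbc:phoenix:<zookeeper quorum>"; replace the quorum with your own.
options.put("url", PhoenixRuntime.JDBC_PROTOCOL
    + PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR + "zk-host:2181");
options.put("dbtable", "TABLE1");
options.put("driver", PhoenixDriver.class.getName());
// Spark 1.3.x generic data source API (sqlContext is assumed to already exist).
DataFrame df = sqlContext.load("jdbc", options);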
Hi Krishna,
I've not tried it in Java at all, but as of Spark 1.4+ the DataFrame API
should be unified between Scala and Java, so the following may work for you:
DataFrame df = sqlContext.read()
.format("org.apache.phoenix.spark")
.option("table", "TABLE1")
.option("zkUrl", "")
Hi,
Is there a working example for using the Spark plugin in Java? Specifically,
what's the Java equivalent for creating a DataFrame as shown here in Scala:
val df = sqlContext.phoenixTableAsDataFrame("TABLE1", Array("ID",
"COL1"), conf = configuration)