Re: spark plugin with java

2015-12-02 Thread Josh Mahonin
It does. Under the hood, the DataFrame/RDD makes use of the PhoenixInputFormat, which derives the split information from the query planner and passes it back through to Spark to use for its parallelization. After you have the RDD / DataFrame handle, you're also free to use Spark's repartition().
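For instance, inspecting the split-derived partition count and then reshuffling might look like this (a minimal sketch; it assumes df is a DataFrame loaded through the phoenix-spark plugin as in the Java snippet further down the thread, and the target count of 48 is an arbitrary illustration):

    // The initial partition count mirrors the splits that Phoenix's query
    // planner produced via PhoenixInputFormat.
    int phoenixPartitions = df.rdd().partitions().length;
    System.out.println("Partitions from Phoenix splits: " + phoenixPartitions);

    // Reshuffle to a different degree of parallelism if the defaults
    // don't suit the job.
    DataFrame repartitioned = df.repartition(48);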

Re: spark plugin with java

2015-12-02 Thread Krishna
Yes, I will create new tickets for any issues that I may run into. Another question: for now I'm pursuing the option of creating a DataFrame as shown in my previous email. How does Spark handle parallelization in this case? Does it use Phoenix metadata on splits?

Re: spark plugin with java

2015-12-02 Thread Josh Mahonin
Hi Krishna, That's great to hear. You're right, the plugin itself should be backwards compatible to Spark 1.3.1 and should be for any version of Phoenix past 4.4.0, though I can't guarantee that to be the case forever. As well, I don't know how much usage there is across the board using the Java API.

Re: spark plugin with java

2015-12-02 Thread Krishna
Yes, that works for Spark 1.4.x. The website says Spark 1.3.1+ for the Spark plugin; is that accurate? For Spark 1.3.1, I created a DataFrame as follows (could not use the plugin): Map options = new HashMap(); options.put("url", PhoenixRuntime.JDBC_PROTOCOL + PhoenixRuntime.JDBC_PROTOCO
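The preview cuts off mid-snippet. A hedged reconstruction, assuming the usual Spark 1.3.x pattern of loading Phoenix through the generic "jdbc" data source; the quorum address, table name, the driver option, and the completion of the constant as JDBC_PROTOCOL_SEPARATOR are assumptions, not taken from the truncated message. A SQLContext named sqlContext is assumed to exist:

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.phoenix.util.PhoenixRuntime;
    import org.apache.spark.sql.DataFrame;

    // Build the Phoenix JDBC URL, e.g. "jdbc:phoenix:localhost:2181"
    // ("localhost:2181" is a placeholder ZooKeeper quorum).
    Map<String, String> options = new HashMap<>();
    options.put("url", PhoenixRuntime.JDBC_PROTOCOL
        + PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR
        + "localhost:2181");
    options.put("dbtable", "TABLE1");
    options.put("driver", "org.apache.phoenix.jdbc.PhoenixDriver");

    // Spark 1.3.x predates sqlContext.read(), so use the generic load()
    // with the built-in JDBC data source.
    DataFrame df = sqlContext.load("jdbc", options);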

Re: spark plugin with java

2015-12-01 Thread Josh Mahonin
Hi Krishna, I've not tried it in Java at all, but as of Spark 1.4+ the DataFrame API should be unified between Scala and Java, so the following may work for you: DataFrame df = sqlContext.read() .format("org.apache.phoenix.spark") .option("table", "TABLE1") .option("zkUrl", "")
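Fleshed out, a self-contained version of that snippet might look like this (a sketch assuming Spark 1.4+ with the phoenix-spark jar on the classpath; the class name, app name, ZooKeeper URL, and the trailing .load() call are assumptions filling in the truncated preview):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;

    public class PhoenixSparkExample {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("phoenix-spark-java");
            JavaSparkContext sc = new JavaSparkContext(conf);
            SQLContext sqlContext = new SQLContext(sc);

            // Java equivalent of phoenixTableAsDataFrame: read TABLE1
            // through the plugin's data source.
            DataFrame df = sqlContext.read()
                .format("org.apache.phoenix.spark")
                .option("table", "TABLE1")
                .option("zkUrl", "localhost:2181") // placeholder ZooKeeper quorum
                .load();

            // Column pruning comparable to Array("ID", "COL1") in the
            // Scala example below.
            df.select("ID", "COL1").show();
        }
    }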

spark plugin with java

2015-12-01 Thread Krishna
Hi, Is there a working example for using the Spark plugin in Java? Specifically, what's the Java equivalent for creating a DataFrame as shown here in Scala: val df = sqlContext.phoenixTableAsDataFrame("TABLE1", Array("ID", "COL1"), conf = configuration)