Yes, I will create new tickets for any issues I run into. Another question: for now I'm pursuing the option of creating a DataFrame as shown in my previous email. How does Spark handle parallelization in this case? Does it use Phoenix metadata on splits?
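In other words, my understanding is that the generic JDBC data source doesn't consult Phoenix split metadata at all, and reads the whole table in a single partition unless it's given explicit range-partitioning options, whereas the phoenix-spark plugin derives its partitions from the Phoenix query plan. A minimal sketch of the JDBC route, assuming the Spark 1.3.x JDBC source options and a hypothetical numeric column ID_COL to partition on:

  import java.util.HashMap;
  import java.util.Map;

  import org.apache.phoenix.util.PhoenixRuntime;
  import org.apache.spark.sql.DataFrame;

  Map<String, String> options = new HashMap<String, String>();
  options.put("url", PhoenixRuntime.JDBC_PROTOCOL
      + PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR + zkQuorum);
  options.put("dbtable", "TABLE_NAME");
  // Without the four options below, the JDBC source scans the whole
  // table in a single partition. ID_COL is a hypothetical numeric
  // column; Spark issues one range query per partition across
  // [lowerBound, upperBound].
  options.put("partitionColumn", "ID_COL");
  options.put("lowerBound", "0");
  options.put("upperBound", "1000000");
  options.put("numPartitions", "10");

  DataFrame jdbcDF = sqlContext.load("jdbc", options);

Is that right, or is there a way to get the plugin's split-aware behavior from the JDBC path?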
On Wed, Dec 2, 2015 at 11:02 AM, Josh Mahonin <jmaho...@gmail.com> wrote:

> Hi Krishna,
>
> That's great to hear. You're right, the plugin itself should be backwards
> compatible to Spark 1.3.1, and should be for any version of Phoenix past
> 4.4.0, though I can't guarantee that to be the case forever. As well, I
> don't know how much usage there is across the board using the Java API and
> DataFrames; you may in fact be the first. If you encounter any errors with
> it, could you please file a JIRA with any stack traces you see?
>
> Since Spark is a very quickly changing project, they often update internal
> functionality that we sometimes lag behind on support for, and as a result
> there's no direct mapping between specific Phoenix versions and specific
> Spark versions. We add new support as fast as we get patches, essentially.
>
> My general recommendation is to stay back a major version on Spark if
> possible, but if you need to use the latest Spark releases, try to use the
> latest Phoenix release as well. The DataFrame support in Phoenix, for
> instance, has had many patches and improvements recently that older
> versions are missing.
>
> Thanks,
>
> Josh
>
> On Wed, Dec 2, 2015 at 1:40 PM, Krishna <research...@gmail.com> wrote:
>
>> Yes, that works for Spark 1.4.x. The website says Spark 1.3.1+ for the
>> Spark plugin; is that accurate?
>>
>> For Spark 1.3.1, I created a DataFrame as follows (could not use the
>> plugin):
>>
>>   Map<String, String> options = new HashMap<String, String>();
>>   options.put("url", PhoenixRuntime.JDBC_PROTOCOL +
>>       PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR + zkQuorum);
>>   options.put("dbtable", "TABLE_NAME");
>>
>>   SQLContext sqlContext = new SQLContext(sc);
>>   DataFrame jdbcDF = sqlContext.load("jdbc", options)
>>       .filter("COL_NAME > SOME_VALUE");
>>
>> Also, it isn't immediately obvious which version of Spark was used in
>> building the Phoenix artifacts available on Maven. Maybe it's worth
>> putting it on the website. Let me know if the mapping below is incorrect.
>>
>> Phoenix 4.4.x <--> Spark 1.4.0
>> Phoenix 4.5.x <--> Spark 1.5.0
>> Phoenix 4.6.x <--> Spark 1.5.0
>>
>> On Tue, Dec 1, 2015 at 7:05 PM, Josh Mahonin <jmaho...@gmail.com> wrote:
>>
>> > Hi Krishna,
>> >
>> > I've not tried it in Java at all, but as of Spark 1.4+ the DataFrame
>> > API should be unified between Scala and Java, so the following may
>> > work for you:
>> >
>> >   DataFrame df = sqlContext.read()
>> >       .format("org.apache.phoenix.spark")
>> >       .option("table", "TABLE1")
>> >       .option("zkUrl", "<phoenix-server:2181>")
>> >       .load();
>> >
>> > Note that 'zkUrl' must be set to your Phoenix URL, and passing a 'conf'
>> > parameter isn't supported. Please let us know back here if this works
>> > out for you; I'd love to update the documentation and unit tests if it
>> > works.
>> >
>> > Josh
>> >
>> > On Tue, Dec 1, 2015 at 6:30 PM, Krishna <research...@gmail.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> Is there a working example for using the Spark plugin in Java?
>> >> Specifically, what's the Java equivalent for creating a DataFrame as
>> >> shown here in Scala:
>> >>
>> >>   val df = sqlContext.phoenixTableAsDataFrame("TABLE1",
>> >>       Array("ID", "COL1"), conf = configuration)
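For reference, a self-contained version of the Java example from earlier in the thread; a sketch assuming Spark 1.4+, the phoenix-spark jar on the driver and executor classpaths, and a hypothetical Phoenix table TABLE1 with columns ID and COL1 (class name and ZooKeeper quorum are placeholders):

  import org.apache.spark.SparkConf;
  import org.apache.spark.api.java.JavaSparkContext;
  import org.apache.spark.sql.DataFrame;
  import org.apache.spark.sql.SQLContext;

  public class PhoenixSparkJavaExample {
      public static void main(String[] args) {
          SparkConf conf = new SparkConf().setAppName("phoenix-spark-java");
          JavaSparkContext sc = new JavaSparkContext(conf);
          SQLContext sqlContext = new SQLContext(sc);

          // Load the Phoenix table through the phoenix-spark data source.
          // "zkUrl" is the ZooKeeper quorum that Phoenix is configured
          // against; a 'conf' parameter isn't supported here.
          DataFrame df = sqlContext.read()
              .format("org.apache.phoenix.spark")
              .option("table", "TABLE1")
              .option("zkUrl", "phoenix-server:2181")
              .load();

          // Roughly the Java counterpart of the Scala
          // phoenixTableAsDataFrame("TABLE1", Array("ID", "COL1"), ...) call.
          df.select("ID", "COL1").show();
      }
  }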