Yes, I will create new tickets for any issues I run into. Another question: for now I'm pursuing the option of creating a DataFrame as shown in my previous email. How does Spark handle parallelization in this case? Does it use Phoenix metadata on splits?
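In other words, my understanding is that the generic JDBC data source doesn't consult Phoenix split metadata at all, and reads the whole table in a single partition unless it's given explicit range-partitioning options, whereas the phoenix-spark plugin derives its partitions from the Phoenix query plan. A minimal sketch of the JDBC route, assuming the Spark 1.3.x JDBC source options and a hypothetical numeric column ID_COL to partition on:

  import java.util.HashMap;
  import java.util.Map;

  import org.apache.phoenix.util.PhoenixRuntime;
  import org.apache.spark.sql.DataFrame;

  Map<String, String> options = new HashMap<String, String>();
  options.put("url", PhoenixRuntime.JDBC_PROTOCOL
      + PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR + zkQuorum);
  options.put("dbtable", "TABLE_NAME");
  // Without the four options below, the JDBC source scans the whole
  // table in a single partition. ID_COL is a hypothetical numeric
  // column; Spark issues one range query per partition across
  // [lowerBound, upperBound].
  options.put("partitionColumn", "ID_COL");
  options.put("lowerBound", "0");
  options.put("upperBound", "1000000");
  options.put("numPartitions", "10");

  DataFrame jdbcDF = sqlContext.load("jdbc", options);

Is that right, or is there a way to get the plugin's split-aware behavior from the JDBC path?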
On Wed, Dec 2, 2015 at 11:02 AM, Josh Mahonin <jmaho...@gmail.com> wrote:

> Hi Krishna,
>
> That's great to hear. You're right, the plugin itself should be backwards
> compatible to Spark 1.3.1, and should be for any version of Phoenix past
> 4.4.0, though I can't guarantee that to be the case forever. As well, I
> don't know how much usage there is across the board using the Java API and
> DataFrames; you may in fact be the first. If you encounter any errors with
> it, could you please file a JIRA with any stack traces you see?
>
> Since Spark is a very quickly changing project, they often update internal
> functionality that we sometimes lag behind on support for, and as a result
> there's no direct mapping between specific Phoenix versions and specific
> Spark versions. We add new support as fast as we get patches, essentially.
>
> My general recommendation is to stay back a major version on Spark if
> possible, but if you need to use the latest Spark releases, try to use the
> latest Phoenix release as well. The DataFrame support in Phoenix, for
> instance, has had many patches and improvements recently that older
> versions are missing.
>
> Thanks,
>
> Josh
>
> On Wed, Dec 2, 2015 at 1:40 PM, Krishna <research...@gmail.com> wrote:
>
>> Yes, that works for Spark 1.4.x. The website says Spark 1.3.1+ for the
>> Spark plugin; is that accurate?
>>
>> For Spark 1.3.1, I created a DataFrame as follows (could not use the
>> plugin):
>>
>>   Map<String, String> options = new HashMap<String, String>();
>>   options.put("url", PhoenixRuntime.JDBC_PROTOCOL +
>>       PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR + zkQuorum);
>>   options.put("dbtable", "TABLE_NAME");
>>
>>   SQLContext sqlContext = new SQLContext(sc);
>>   DataFrame jdbcDF = sqlContext.load("jdbc", options)
>>       .filter("COL_NAME > SOME_VALUE");
>>
>> Also, it isn't immediately obvious which version of Spark was used in
>> building the Phoenix artifacts available on Maven. Maybe it's worth
>> putting it on the website. Let me know if the mapping below is incorrect.
>>
>> Phoenix 4.4.x <--> Spark 1.4.0
>> Phoenix 4.5.x <--> Spark 1.5.0
>> Phoenix 4.6.x <--> Spark 1.5.0
>>
>> On Tue, Dec 1, 2015 at 7:05 PM, Josh Mahonin <jmaho...@gmail.com> wrote:
>>
>> > Hi Krishna,
>> >
>> > I've not tried it in Java at all, but as of Spark 1.4+ the DataFrame
>> > API should be unified between Scala and Java, so the following may
>> > work for you:
>> >
>> >   DataFrame df = sqlContext.read()
>> >       .format("org.apache.phoenix.spark")
>> >       .option("table", "TABLE1")
>> >       .option("zkUrl", "<phoenix-server:2181>")
>> >       .load();
>> >
>> > Note that 'zkUrl' must be set to your Phoenix URL, and passing a 'conf'
>> > parameter isn't supported. Please let us know back here if this works
>> > out for you; I'd love to update the documentation and unit tests if it
>> > works.
>> >
>> > Josh
>> >
>> > On Tue, Dec 1, 2015 at 6:30 PM, Krishna <research...@gmail.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> Is there a working example for using the Spark plugin in Java?
>> >> Specifically, what's the Java equivalent for creating a DataFrame as
>> >> shown here in Scala:
>> >>
>> >>   val df = sqlContext.phoenixTableAsDataFrame("TABLE1",
>> >>       Array("ID", "COL1"), conf = configuration)
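For reference, a self-contained version of the Java example from earlier in the thread; a sketch assuming Spark 1.4+, the phoenix-spark jar on the driver and executor classpaths, and a hypothetical Phoenix table TABLE1 with columns ID and COL1 (class name and ZooKeeper quorum are placeholders):

  import org.apache.spark.SparkConf;
  import org.apache.spark.api.java.JavaSparkContext;
  import org.apache.spark.sql.DataFrame;
  import org.apache.spark.sql.SQLContext;

  public class PhoenixSparkJavaExample {
      public static void main(String[] args) {
          SparkConf conf = new SparkConf().setAppName("phoenix-spark-java");
          JavaSparkContext sc = new JavaSparkContext(conf);
          SQLContext sqlContext = new SQLContext(sc);

          // Load the Phoenix table through the phoenix-spark data source.
          // "zkUrl" is the ZooKeeper quorum that Phoenix is configured
          // against; a 'conf' parameter isn't supported here.
          DataFrame df = sqlContext.read()
              .format("org.apache.phoenix.spark")
              .option("table", "TABLE1")
              .option("zkUrl", "phoenix-server:2181")
              .load();

          // Roughly the Java counterpart of the Scala
          // phoenixTableAsDataFrame("TABLE1", Array("ID", "COL1"), ...) call.
          df.select("ID", "COL1").show();
      }
  }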