We do have work in progress on the DataFrame API, tracked at
https://issues.apache.org/jira/browse/PIO-71.

Chan, it would be nice if you could create a branch on your personal fork
if you want to hand it off to someone else. Thanks!

On Fri, Jan 5, 2018 at 2:02 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> Yes, and I do not recommend that, because the EventServer schema is not a
> developer contract. It may change at any time. Use the conversion method
> and go through the PIO API to get the RDD, then convert it to a DF for now.
>
> I’m not sure what PIO uses to get an RDD from Postgres, but if it does not
> use something like the lib you mention, a PR would be nice. Also, if you
> have an interest in adding the DF APIs to the EventServer, contributions are
> encouraged. Committers who know more about the subject than I do will give
> some guidance, I’m sure.
>
> If you want to donate some DF code, create a Jira and we’ll easily find a
> mentor to make suggestions. There are many benefits to this, including not
> having to support a fork of PIO through subsequent versions. Others are
> interested in this too.
>
>
>
> On Jan 5, 2018, at 7:39 AM, Daniel O' Shaughnessy <
> danieljamesda...@gmail.com> wrote:
>
> ...I should have mentioned that I used org.apache.spark.rdd.JdbcRDD to
> read in the RDD from a Postgres DB initially.
>
> This way you don't need to use an EventServer!
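For readers following the JdbcRDD route, here is a minimal sketch of reading rows from Postgres into an RDD. It assumes the Postgres JDBC driver is on the classpath and that the table has a numeric `id` column to partition on; the `ratings` table, its columns, and the `JdbcRatingsLoader` object are hypothetical names used only for illustration.

```scala
import java.sql.{DriverManager, ResultSet}

import org.apache.spark.SparkContext
import org.apache.spark.rdd.{JdbcRDD, RDD}

object JdbcRatingsLoader {
  // Hypothetical query: the `ratings` table and its columns are illustrative only.
  // JdbcRDD requires exactly two '?' placeholders, which it fills with the
  // lower and upper bound (both inclusive) of each partition's id range.
  val Query: String =
    "SELECT user_id, item_id, rating FROM ratings WHERE id >= ? AND id <= ?"

  def load(sc: SparkContext, url: String, user: String, password: String,
           minId: Long, maxId: Long, partitions: Int): RDD[(Long, Long, Double)] =
    new JdbcRDD[(Long, Long, Double)](
      sc,
      // Called on the executors: each partition opens its own connection.
      () => DriverManager.getConnection(url, user, password),
      Query,
      minId,
      maxId,
      partitions,
      // Map each row of the ResultSet to a Scala tuple.
      (rs: ResultSet) => (rs.getLong("user_id"), rs.getLong("item_id"), rs.getDouble("rating"))
    )
}
```

`JdbcRDD` splits the `[minId, maxId]` range across `partitions` and binds each slice's bounds to the two `?` placeholders. If a DataFrame is the end goal anyway, Spark 2.x's `spark.read.jdbc(url, table, properties)` returns one directly, without the intermediate RDD.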
>
> On Fri, 5 Jan 2018 at 15:37 Daniel O' Shaughnessy <
> danieljamesda...@gmail.com> wrote:
>
>> Hi Shane,
>>
>> I've successfully used:
>>
>> import org.apache.spark.ml.classification.{RandomForestClassificationModel, RandomForestClassifier}
>>
>> with PIO. You can also access feature importance through the fitted
>> RandomForestClassificationModel.
>>
>> It's very simple to convert RDDs to DFs, as Pat mentioned. Something like:
>>
>> val RDD_2_DF = sqlContext.createDataFrame(yourRDD).toDF("col1", "col2")
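The one-liner above uses the Spark 1.x `sqlContext`. Below is a minimal sketch of the same conversion, assuming Spark 2.x, where `SparkSession` is the entry point. It produces the `label`/`features` columns that `spark.ml` estimators read by default. The sample rows and the `RddToDataFrame` object are hypothetical.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.{DataFrame, SparkSession}

object RddToDataFrame {
  // Builds a small hypothetical RDD of (label, features) and converts it to a DataFrame.
  def toTrainingDF(spark: SparkSession): DataFrame = {
    import spark.implicits._ // brings .toDF into scope for RDDs of tuples/case classes

    val rdd = spark.sparkContext.parallelize(Seq(
      (1.0, Vectors.dense(0.5, 1.2, 3.0)),
      (0.0, Vectors.dense(0.1, 0.4, 2.2))
    ))
    // Name the columns the way spark.ml estimators expect by default.
    rdd.toDF("label", "features")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("rdd-to-df").getOrCreate()
    try toTrainingDF(spark).show()
    finally spark.stop()
  }
}
```

If the RDD holds `org.apache.spark.mllib.regression.LabeledPoint`s, as RDD-based templates often do, map each one to `(p.label, p.features.asML)` first. In Spark 2.x, `spark.ml` estimators expect `org.apache.spark.ml.linalg` vectors rather than the older `mllib` ones.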
>>
>>
>>
>> On Thu, 4 Jan 2018 at 23:10 Pat Ferrel <p...@occamsmachete.com> wrote:
>>
>>> Actually, there are libs that will read DFs from HBase:
>>> https://svn.apache.org/repos/asf/hbase/hbase.apache.org/trunk/_chapters/spark.html
>>>
>>> This is out of band with PIO and should not be used, IMO, because the
>>> schema of the EventStore is not guaranteed to remain as-is. The safest way
>>> is to convert the RDDs or to get DFs integrated into PIO. I think there is an
>>> existing Jira that requests Spark ML support, which assumes DFs.
>>>
>>>
>>> On Jan 4, 2018, at 12:25 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>>>
>>> Funny you should ask this. Yes, we are working on a DF-based Universal
>>> Recommender, but you have to convert the RDD into a DF since PIO does not
>>> read out data in the form of a DF (yet). This is a fairly simple step of
>>> maybe one line of code, but it would be better supported in PIO itself. The
>>> issue is that the EventStore uses libs that may read out RDDs rather than DFs.
>>> This is certainly the case with Elasticsearch, which provides an RDD lib. I
>>> haven’t seen one from them that reads out DFs, though it would make a lot of
>>> sense, especially for ES.
>>>
>>> So, TL;DR: yes, just convert the RDD into a DF for now.
>>>
>>> Also, please add a feature request as a PIO Jira ticket to look into
>>> this. I, for one, would +1 it.
>>>
>>>
>>> On Jan 4, 2018, at 11:55 AM, Shane Johnson <shanewaldenjohn...@gmail.com>
>>> wrote:
>>>
>>> Hello group, Happy New Year! Does anyone have a working example or
>>> template using the DataFrame API rather than the RDD-based APIs? We want to
>>> migrate to the new DataFrame APIs to take advantage of the *Feature
>>> Importance* function for our random forest regression models.
>>>
>>> We want to move from
>>>
>>> import org.apache.spark.mllib.tree.RandomForest
>>> import org.apache.spark.mllib.tree.model.RandomForestModel
>>> import org.apache.spark.mllib.util.MLUtils
>>>
>>> to
>>>
>>> import org.apache.spark.ml.regression.{RandomForestRegressionModel, RandomForestRegressor}
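Here is a minimal sketch of the `spark.ml` side of that move. It assumes Spark 2.x and a DataFrame with `label`/`features` columns, fits a `RandomForestRegressor`, and reads `featureImportances` from the fitted `RandomForestRegressionModel`, since the estimator itself does not expose it. The training rows, hyperparameters, and `RandomForestImportances` object are hypothetical. Wiring this into a template's DASE classes is not shown.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.{RandomForestRegressionModel, RandomForestRegressor}
import org.apache.spark.sql.SparkSession

object RandomForestImportances {
  // Trains a small forest on hypothetical data and returns one importance per feature.
  def importances(spark: SparkSession): Array[Double] = {
    import spark.implicits._

    // Hypothetical rows using the default "label"/"features" column names.
    val training = Seq(
      (3.0, Vectors.dense(1.0, 0.0, 5.0)),
      (1.0, Vectors.dense(0.0, 1.0, 4.0)),
      (4.0, Vectors.dense(1.0, 1.0, 6.0)),
      (0.5, Vectors.dense(0.0, 0.0, 3.0))
    ).toDF("label", "features")

    val rf = new RandomForestRegressor()
      .setNumTrees(20) // illustrative values, not tuned
      .setMaxDepth(5)
      .setSeed(42L)

    // fit() returns the trained model; featureImportances is defined on the model.
    val model: RandomForestRegressionModel = rf.fit(training)
    model.featureImportances.toArray
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("rf-importances").getOrCreate()
    try {
      importances(spark).zipWithIndex.foreach { case (w, i) =>
        println(f"feature $i%d: $w%.4f")
      }
    } finally spark.stop()
  }
}
```

Spark normalises the importance vector to sum to 1, so the values are comparable across features within one model. The same `featureImportances` member exists on `RandomForestClassificationModel` for the classifier mentioned elsewhere in the thread.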
>>>
>>>
>>> Is this something that should be fairly straightforward, adjusting
>>> parameters and calling new classes within DASE, or is it much more involved
>>> development?
>>>
>>> Thank You!
>>>
>>> *Shane Johnson | 801.360.3350*
>>> LinkedIn <https://www.linkedin.com/in/shanewjohnson> | Facebook
>>> <https://www.facebook.com/shane.johnson.71653>
>>>
>>>
>>>
>
