Re: Using Dataframe API vs. RDD API?

2018-01-30 Thread Shane Johnson
I remember this now. Thanks Daniel. Does this confirm that I do indeed need to use a spark context when using the new dataframe API (ml vs mllib)? I wanted to make sure there wasn't a way to use the new ml library to predict without using a dataframe.
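[Editor's sketch, not from the thread.] To illustrate the question above: in Spark 2.x the ml models expose prediction through `transform` on a DataFrame, so a single feature vector is typically wrapped in a one-row DataFrame first. The following is a minimal sketch under that assumption; the object and method names (`SinglePrediction`, `predictOne`) are hypothetical, and a live `SparkSession` is assumed to be available at predict time.

```scala
import org.apache.spark.ml.classification.RandomForestClassificationModel
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.SparkSession

// Hypothetical helper, not part of PredictionIO or Spark.
object SinglePrediction {

  // Scores one feature vector with a DataFrame-based (ml) model by
  // wrapping it in a one-row DataFrame and reading back the prediction.
  def predictOne(
      spark: SparkSession,
      model: RandomForestClassificationModel,
      features: Vector): Double = {
    import spark.implicits._

    // One row, one column, named after the model's configured features column.
    val df = Seq(Tuple1(features)).toDF(model.getFeaturesCol)

    // transform appends the prediction column; take the single value.
    model.transform(df)
      .select(model.getPredictionCol)
      .head()
      .getDouble(0)
  }
}
```

Building a DataFrame per query adds overhead, so this is a sketch of the mechanics rather than a recommendation for low-latency serving.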

Re: Using Dataframe API vs. RDD API?

2018-01-30 Thread Daniel O' Shaughnessy
Hi Shane, You need to use PAlgorithm instead of P2Algorithm and save/load the Spark context accordingly. This way you can use the Spark context in the predict function. There are examples of using PAlgorithm on the PredictionIO site. It’s slightly more complicated but not too bad!

Re: Using Dataframe API vs. RDD API?

2018-01-30 Thread Shane Johnson
Thanks team! We are close to having our models working with the DataFrame API. One additional roadblock we are hitting is the fundamental difference between the RDD-based API and the DataFrame API. It seems that the old mllib API would allow a simple vector to get predictions, whereas in the new ml API a

Re: Using Dataframe API vs. RDD API?

2018-01-08 Thread Donald Szeto
We do have work-in-progress for DataFrame API tracked at https://issues.apache.org/jira/browse/PIO-71. Chan, it would be nice if you could create a branch on your personal fork if you want to hand it off to someone else. Thanks!

Re: Using Dataframe API vs. RDD API?

2018-01-05 Thread Pat Ferrel
Yes and I do not recommend that because the EventServer schema is not a developer contract. It may change at any time. Use the conversion method and go through the PIO API to get the RDD then convert to DF for now. I’m not sure what PIO uses to get an RDD from Postgres but if they do not use

Re: Using Dataframe API vs. RDD API?

2018-01-05 Thread Daniel O' Shaughnessy
Should have mentioned that I used org.apache.spark.rdd.JdbcRDD to read in the RDD from a Postgres DB initially. This way you don't need to use an EventServer!
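[Editor's sketch, not from the thread.] For readers unfamiliar with `org.apache.spark.rdd.JdbcRDD`, a minimal sketch of reading rows from a JDBC source into an RDD might look like the following. The table name, column names, ID bounds, and the `JdbcRead.readRows` helper are all hypothetical; the original message does not show the code actually used.

```scala
import java.sql.{DriverManager, ResultSet}

import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD

// Hypothetical helper; table and column names are placeholders.
object JdbcRead {

  def readRows(
      sc: SparkContext,
      url: String,
      user: String,
      password: String): JdbcRDD[(Long, String)] =
    new JdbcRDD[(Long, String)](
      sc,
      // Each partition opens its own connection on the executor.
      () => DriverManager.getConnection(url, user, password),
      // JdbcRDD requires exactly two '?' placeholders for the partition bounds.
      "SELECT id, label FROM my_table WHERE id >= ? AND id <= ?",
      1L,       // lower bound of the id range (assumed)
      1000000L, // upper bound of the id range (assumed)
      4,        // number of partitions
      (rs: ResultSet) => (rs.getLong(1), rs.getString(2))
    )
}
```

The JDBC driver (here, the Postgres driver) is assumed to be on the classpath of both the driver and the executors.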

Re: Using Dataframe API vs. RDD API?

2018-01-05 Thread Daniel O' Shaughnessy
Hi Shane, I've successfully used: import org.apache.spark.ml.classification.{ RandomForestClassificationModel, RandomForestClassifier } with PIO. You can access feature importance through the RandomForestClassifier also. Very simple to convert RDDs to DFs as Pat mentioned, something like:

Re: Using Dataframe API vs. RDD API?

2018-01-04 Thread Pat Ferrel
Actually there are libs that will read DFs from HBase https://svn.apache.org/repos/asf/hbase/hbase.apache.org/trunk/_chapters/spark.html This is out of band with PIO and should not be used IMO because the

Re: Using Dataframe API vs. RDD API?

2018-01-04 Thread Pat Ferrel
Funny you should ask this. Yes, we are working on a DF-based Universal Recommender, but you have to convert the RDD into a DF since PIO does not read out data in the form of a DF (yet). This is a fairly simple step of maybe one line of code but would be better supported in PIO itself. The issue
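
[Editor's sketch, not from the thread; the message above is cut off in the archive and this does not reconstruct it.] The RDD-to-DataFrame conversion discussed in this thread can be sketched as follows, assuming Spark 2.x and an RDD of case-class records. The `Rating` type and the `RddToDf` object are hypothetical stand-ins for whatever the PIO data source actually returns.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical record type; defined at top level so Spark can derive an encoder.
case class Rating(user: String, item: String, rating: Double)

object RddToDf {

  // The conversion itself is the single toDF() call.
  def toDataFrame(spark: SparkSession, ratings: RDD[Rating]): DataFrame = {
    import spark.implicits._
    ratings.toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-to-df-sketch")
      .master("local[*]")
      .getOrCreate()

    // Stand-in for an RDD obtained through the PIO API.
    val rdd = spark.sparkContext.parallelize(Seq(
      Rating("u1", "i1", 4.0),
      Rating("u2", "i1", 3.0)
    ))

    val df = toDataFrame(spark, rdd)
    df.printSchema()

    spark.stop()
  }
}
```

Column names are taken from the case-class fields, so the resulting DataFrame can be passed to ml estimators after assembling a features column.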