Re: Using Dataframe API vs. RDD API?

2018-01-05 Thread Pat Ferrel
Yes, and I do not recommend that, because the EventServer schema is not a developer contract; it may change at any time. For now, use the conversion method: go through the PIO API to get the RDD, then convert it to a DF. I’m not sure what PIO uses to get an RDD from Postgres but if they do not use

Re: Using Dataframe API vs. RDD API?

2018-01-05 Thread Daniel O' Shaughnessy
Should have mentioned that I used org.apache.spark.rdd.JdbcRDD to read in the RDD from a Postgres DB initially. This way you don't need to use an EventServer!
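
For readers who have not used JdbcRDD, here is a rough sketch of reading rows from Postgres into an RDD. The JDBC URL, the table `my_events`, and its columns are hypothetical and are not taken from this thread or from the EventServer schema. It also assumes the PostgreSQL JDBC driver is on the classpath.

```scala
import java.sql.{DriverManager, ResultSet}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.{JdbcRDD, RDD}

object JdbcRddSketch {
  // Hypothetical connection string; replace with your own database.
  val url = "jdbc:postgresql://localhost:5432/mydb"

  // JdbcRDD requires exactly two '?' placeholders. Spark binds them to the
  // lower and upper bound of each partition's key range.
  // The table and column names here are hypothetical.
  val query = "SELECT id, entity_id, event FROM my_events WHERE id >= ? AND id <= ?"

  def readEvents(sc: SparkContext, user: String, password: String): RDD[(Long, String, String)] =
    new JdbcRDD[(Long, String, String)](
      sc,
      () => DriverManager.getConnection(url, user, password), // one connection per partition
      query,
      1L,        // lowerBound of the numeric key (assumed range)
      1000000L,  // upperBound of the numeric key (assumed range)
      4,         // numPartitions
      (rs: ResultSet) => (rs.getLong(1), rs.getString(2), rs.getString(3))
    )
}
```

The bounds only split the key range across partitions, so they should cover the ids you actually want to read.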

Re: Using Dataframe API vs. RDD API?

2018-01-05 Thread Daniel O' Shaughnessy
Hi Shane, I've successfully used import org.apache.spark.ml.classification.{RandomForestClassificationModel, RandomForestClassifier} with pio. You can access feature importance through the RandomForestClassifier also. It's very simple to convert RDDs to DFs, as Pat mentioned, something like:
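
The original snippet is cut off in the archive. What follows is not Daniel's code, just a minimal sketch of the RDD-to-DataFrame conversion under some assumptions: the case class `LabeledRow`, the toy data, and the column names are all hypothetical, and the in-memory RDD stands in for whatever RDD the PIO data source returns. Note that in spark.ml the importances are exposed on the fitted RandomForestClassificationModel as `featureImportances`.

```scala
import org.apache.spark.ml.classification.{RandomForestClassificationModel, RandomForestClassifier}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object RddToDfSketch {
  // Hypothetical record type; a real engine would map PIO events into something similar.
  case class LabeledRow(label: Double, f1: Double, f2: Double)

  def train(spark: SparkSession): RandomForestClassificationModel = {
    import spark.implicits._ // brings .toDF into scope for RDDs of tuples and case classes

    // Stand-in for the RDD obtained through the PIO API.
    val rdd = spark.sparkContext.parallelize(Seq(
      LabeledRow(0.0, 0.1, 5.0),
      LabeledRow(0.0, 0.2, 4.0),
      LabeledRow(1.0, 0.9, 5.0),
      LabeledRow(1.0, 0.8, 4.0)
    ))

    // RDD -> DataFrame: pack the features into an ML vector and name the columns.
    val df = rdd
      .map(r => (r.label, Vectors.dense(r.f1, r.f2)))
      .toDF("label", "features")

    new RandomForestClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setNumTrees(10)
      .setSeed(42L)
      .fit(df)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-to-df-sketch")
      .master("local[*]")
      .getOrCreate()
    val model = train(spark)
    // One importance value per feature column.
    println(model.featureImportances)
    spark.stop()
  }
}
```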