We do have work in progress on a DataFrame API, tracked at https://issues.apache.org/jira/browse/PIO-71.
Chan, it would be nice if you could create a branch on your personal fork if you want to hand it off to someone else. Thanks!

On Fri, Jan 5, 2018 at 2:02 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> Yes, and I do not recommend that, because the EventServer schema is not a
> developer contract. It may change at any time. Use the conversion method
> and go through the PIO API to get the RDD, then convert it to a DF for now.
>
> I’m not sure what PIO uses to get an RDD from Postgres, but if it does not
> use something like the lib you mention, a PR would be nice. Also, if you
> have an interest in adding the DF APIs to the EventServer, contributions are
> encouraged. I’m sure committers who know more than me on the subject will
> give some guidance.
>
> If you want to donate some DF code, create a Jira and we’ll easily find a
> mentor to make suggestions. There are many benefits to this, including not
> having to support a fork of PIO through subsequent versions. Others are
> interested in this too.
>
>
> On Jan 5, 2018, at 7:39 AM, Daniel O' Shaughnessy <
> danieljamesda...@gmail.com> wrote:
>
> ....Should have mentioned that I used org.apache.spark.rdd.JdbcRDD to
> read in the RDD from a Postgres DB initially.
>
> This way you don't need to use an EventServer!
>
> On Fri, 5 Jan 2018 at 15:37 Daniel O' Shaughnessy <
> danieljamesda...@gmail.com> wrote:
>
>> Hi Shane,
>>
>> I've successfully used:
>>
>> import org.apache.spark.ml.classification.{
>> RandomForestClassificationModel, RandomForestClassifier }
>>
>> with PIO. You can also access feature importance through the
>> RandomForestClassificationModel that RandomForestClassifier produces.
>>
>> It's very simple to convert RDDs to DFs, as Pat mentioned; something like:
>>
>> val RDD_2_DF = sqlContext.createDataFrame(yourRDD).toDF("col1", "col2")
>>
>>
>> On Thu, 4 Jan 2018 at 23:10 Pat Ferrel <p...@occamsmachete.com> wrote:
>>
>>> Actually, there are libs that will read DFs from HBase:
>>> https://svn.apache.org/repos/asf/hbase/hbase.apache.org/trunk/_chapters/spark.html
>>>
>>> This is out of band with PIO and should not be used, IMO, because the
>>> schema of the EventStore is not guaranteed to remain as-is. The safest way
>>> is to convert RDDs to DFs or to get DF support integrated into PIO. I think
>>> there is an existing Jira that requests Spark ML support, which assumes DFs.
>>>
>>>
>>> On Jan 4, 2018, at 12:25 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>>>
>>> Funny you should ask this. Yes, we are working on a DF-based Universal
>>> Recommender, but you have to convert the RDD into a DF since PIO does not
>>> read out data in the form of a DF (yet). This is a fairly simple step of
>>> maybe one line of code, but it would be better supported in PIO itself. The
>>> issue is that the EventStore uses libs that may read out only RDDs, not DFs.
>>> This is certainly the case with Elasticsearch, which provides an RDD lib. I
>>> haven’t seen one from them that reads out DFs, though it would make a lot of
>>> sense for ES especially.
>>>
>>> So TL;DR: yes, just convert the RDD into a DF for now.
>>>
>>> Also, please add a feature request as a PIO Jira ticket to look into
>>> this. I for one would +1 it.
>>>
>>>
>>> On Jan 4, 2018, at 11:55 AM, Shane Johnson <shanewaldenjohn...@gmail.com>
>>> wrote:
>>>
>>> Hello group, Happy New Year! Does anyone have a working example or
>>> template using the DataFrame API rather than the RDD-based APIs? We want
>>> to migrate to the new DataFrame APIs to take advantage of the *Feature
>>> Importance* function for our Random Forest regression models.
>>>
>>> We want to move from
>>>
>>> import org.apache.spark.mllib.tree.RandomForest
>>> import org.apache.spark.mllib.tree.model.RandomForestModel
>>> import org.apache.spark.mllib.util.MLUtils
>>>
>>> to
>>>
>>> import org.apache.spark.ml.regression.{RandomForestRegressionModel,
>>> RandomForestRegressor}
>>>
>>> Is this something that should be fairly straightforward, just adjusting
>>> parameters and calling new classes within DASE, or is it a much more
>>> involved development effort?
>>>
>>> Thank You!
>>>
>>> *Shane Johnson*
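A minimal sketch of the approach the thread converges on (read an RDD, convert it to a DF, then use spark.ml), written as the regression counterpart of Daniel's classifier example. It assumes Spark 2.x, so it uses SparkSession.createDataFrame instead of the older sqlContext, and a local master for illustration. The Observation case class, its columns x1/x2/x3, and the synthetic data are hypothetical stand-ins for whatever a PIO DataSource actually returns; none of this is PIO's own API.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.{RandomForestRegressionModel, RandomForestRegressor}
import org.apache.spark.sql.SparkSession

// Hypothetical record type standing in for the rows a PIO DataSource would read out.
case class Observation(label: Double, x1: Double, x2: Double, x3: Double)

object RfFeatureImportanceSketch {
  def run(spark: SparkSession): RandomForestRegressionModel = {
    // Synthetic RDD (assumption): the label depends mostly on x1, a little on x2.
    val rdd = spark.sparkContext.parallelize(
      (0 until 200).map { i =>
        val x1 = i.toDouble
        val x2 = (i % 7).toDouble
        val x3 = (i % 3).toDouble
        Observation(label = 2.0 * x1 + 0.5 * x2, x1 = x1, x2 = x2, x3 = x3)
      })

    // The "one line" RDD -> DataFrame conversion discussed in the thread.
    val df = spark.createDataFrame(rdd)

    // spark.ml estimators expect all features packed into one vector column.
    val assembled = new VectorAssembler()
      .setInputCols(Array("x1", "x2", "x3"))
      .setOutputCol("features")
      .transform(df)

    // Fit a DataFrame-based random forest regressor (fixed seed for repeatability).
    new RandomForestRegressor()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setNumTrees(20)
      .setSeed(42L)
      .fit(assembled)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rf-feature-importance-sketch")
      .master("local[*]")
      .getOrCreate()
    try {
      val model = run(spark)
      // featureImportances is ordered like the assembler's input columns (x1, x2, x3).
      println(s"Feature importances (x1, x2, x3): ${model.featureImportances}")
    } finally {
      spark.stop()
    }
  }
}
```

The VectorAssembler step is the main structural difference from the mllib version, which works on an RDD of LabeledPoint rather than on named DataFrame columns. In a DASE template the same conversion would presumably live in the Algorithm's train step, applied to the RDD passed in from the Preparator, but that wiring is an assumption and is not shown here.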