With Spark 2.0, DataFrames are a special case of Datasets (a DataFrame is a Dataset[Row]), so every limitation that applies to Datasets also applies to DataFrames. PredictionIO is built around RDDs, but that doesn't stop you from using DataFrames internally in your engine. By defining custom types in the DASE architecture of your engine, you should be able to use DataFrames (or Datasets, with the Spark 2.0 upgrade introduced by the PR mentioned earlier). However, when you read your data through PEventStore you will get RDDs, which you would have to convert to DataFrames if necessary.
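To make the last point concrete, here is a minimal sketch of the RDD-to-DataFrame conversion, assuming Spark 2.0 on the classpath. The `RatingEvent` case class, the `RddToDataFrameSketch` object and the sample data are hypothetical stand-ins for whatever you extract from the RDD that PEventStore returns; this is not part of the PredictionIO API.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type: fields you might extract from PIO events.
case class RatingEvent(user: String, item: String, rating: Double)

object RddToDataFrameSketch {

  // Converts an RDD of case-class records into a DataFrame.
  // Column names and types are inferred from the case class fields.
  def toDataFrame(spark: SparkSession, events: RDD[RatingEvent]): DataFrame = {
    import spark.implicits._ // brings toDF() and the needed encoders into scope
    events.toDF()
  }

  def main(args: Array[String]): Unit = {
    // Standalone local session for illustration only. Inside an engine you
    // would already have a SparkContext; getOrCreate() reuses it if present.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rdd-to-df-sketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the RDD you would obtain from the event store.
    val events: RDD[RatingEvent] = spark.sparkContext.parallelize(Seq(
      RatingEvent("u1", "i1", 5.0),
      RatingEvent("u1", "i2", 3.0),
      RatingEvent("u2", "i1", 4.0)
    ))

    val df: DataFrame = toDataFrame(spark, events)
    df.printSchema()

    // A DataFrame is a Dataset[Row]; as[T] gives back a typed Dataset.
    val ds: Dataset[RatingEvent] = df.as[RatingEvent]
    println(ds.filter(_.rating >= 4.0).count())

    spark.stop()
  }
}
```

The conversion stays inside your own engine code, so the RDD-based types in the PIO API are unaffected.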
On Sun, 2 Oct 2016 at 11:12, Georg Heiler <[email protected]> wrote:
> Thanks.
> After looking around some more I realized that most engines are using RDDs
> and not DataFrames.
> Is there a similar limitation as for Datasets?
>
> Regards,
> Georg
>
> Marcin Ziemiński <[email protected]> wrote on Fri, 30 Sep 2016 at 20:14:
>
> So this is the mentioned PR:
> https://github.com/apache/incubator-predictionio/pull/295
>
> I am aware this is not enough, but this is a necessary step towards
> bringing the desired changes.
>
> Best regards,
> Marcin
>
> On Fri, 30 Sep 2016 at 19:50, Georg Heiler <[email protected]> wrote:
>
> Thanks.
> So a simple recompile for Scala 2.11 and an upgrade of the Spark dependencies
> would not be enough.
>
> Would you mind sharing this pull request? I can't seem to find it via
> Google.
> Thanks again.
> Regards, Georg
>
> Marcin Ziemiński <[email protected]> wrote on Fri, 30 Sep 2016 at 18:05:
>
> Hi Georg,
>
> There is currently no support for Apache NiFi integration in the project.
> I have personally been looking closer at NiFi recently and it seems like a
> good idea to glue it together with PIO.
> PredictionIO is now in Apache incubation, and the releases after 0.10
> will bring more new functionality. If you have any ideas about
> how this could look, please feel free to share them. This is
> actually a very good moment to bring up such issues.
>
> As far as Datasets are concerned, PIO does not currently support Datasets
> in its API. There is currently a pull request with an update to Spark 2.0,
> so Datasets could be used internally in engines once this is merged, but
> the API doesn't reflect such changes now.
>
> Regards,
> Marcin
>
> On Fri, 30 Sep 2016 at 17:24, Georg Heiler <[email protected]> wrote:
>
> Hi,
>
> does the event server of PIO integrate with Apache NiFi?
>
> In the examples you use the Spark RDD API. Does PIO support Spark 2.0's
> Datasets as well?
>
> Regards,
> Georg
