That's correct. It's not possible to persist data in memory across jobs in Flink's batch API.
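What you can do, however, is persist a trained model's parameters to a durable file system at the end of the training job and read them back in a later job. Here is a minimal sketch of that workaround; the paths, the use of `TypeSerializerOutputFormat`/`TypeSerializerInputFormat`, and `MultipleLinearRegression` as the example model are my choices for illustration, not an official FlinkML persistence API:

```scala
// Sketch only: persist a trained FlinkML model's weights between batch jobs.
// Paths and format choices are illustrative assumptions.
import org.apache.flink.api.scala._
import org.apache.flink.api.java.io.{TypeSerializerInputFormat, TypeSerializerOutputFormat}
import org.apache.flink.ml.common.{LabeledVector, WeightVector}
import org.apache.flink.ml.math.Vector
import org.apache.flink.ml.regression.MultipleLinearRegression

object TrainAndPersist {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val training: DataSet[LabeledVector] = ??? // load your training data here

    val mlr = MultipleLinearRegression().setIterations(10)
    mlr.fit(training)

    // The learned parameters are just a DataSet, so they can be written out
    // like any other DataSet and re-read by a separate job.
    val weights: DataSet[WeightVector] = mlr.weightsOption.get
    weights.write(new TypeSerializerOutputFormat[WeightVector], "hdfs:///models/mlr")
    env.execute("train and persist model")
  }
}

object PredictFromPersistedModel {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val weightsType = createTypeInformation[WeightVector]
    val weights: DataSet[WeightVector] =
      env.readFile(new TypeSerializerInputFormat[WeightVector](weightsType), "hdfs:///models/mlr")

    // Re-inject the weights into a fresh predictor instance and score new data.
    val mlr = MultipleLinearRegression()
    mlr.weightsOption = Some(weights)
    val toScore: DataSet[Vector] = ??? // data to score
    mlr.predict(toScore).print()
  }
}
```

This works because `weightsOption` is a public field on the predictor; the model is "persisted" only in the sense that its parameters survive on the file system, not that the pipeline object itself is serialized.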
Best,
Fabian

2018-02-05 18:28 GMT+01:00 Christophe Jolif <cjo...@gmail.com>:

> Fabian,
>
> Ok, thanks for the update. Meanwhile I was looking at how I could still
> leverage the current FlinkML API, but as far as I can see, it misses the
> ability to persist its own models. So even for pure batch, it prevents
> running your (once built) model in several jobs. Or am I missing
> something?
>
> I suspect I am not the only one who would love to apply machine learning
> as part of Flink processing. While waiting for FLIP-23, what are the
> "best" practices today?
>
> Thanks again for your help,
> --
> Christophe
>
> On Mon, Feb 5, 2018 at 6:01 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>
>> Hi Christophe,
>>
>> It is true that FlinkML only targets batch workloads. Also, there has
>> not been any development for a long time.
>>
>> In March last year, a discussion was started on the dev mailing list
>> about different machine learning features for stream processing [1].
>> One result of this discussion was FLIP-23 [2], which will add a library
>> for model serving to Flink, i.e., it can load (and update) machine
>> learning models and evaluate them on a stream.
>> If you dig through the mailing list thread, you'll find a link to a
>> Google doc that discusses other possible directions.
>>
>> Best,
>> Fabian
>>
>> [1] https://lists.apache.org/thread.html/eeb80481f3723c160bc923d689416a352d6df4aad98fe7424bf33132@%3Cdev.flink.apache.org%3E
>> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-23+-+Model+Serving
>>
>> 2018-02-05 16:43 GMT+01:00 Christophe Jolif <cjo...@gmail.com>:
>>
>>> Hi all,
>>>
>>> Sorry, this is me again with another question.
>>>
>>> Maybe I did not search deep enough, but it seems the FlinkML API is
>>> still pure batch.
>>>
>>> If I read https://cwiki.apache.org/confluence/display/FLINK/FlinkML%3A+Vision+and+Roadmap
>>> it seems there was the intent to "exploit the streaming nature of
>>> Flink, and provide functionality designed specifically for data
>>> streams", but from my external point of view, I don't see much
>>> happening here. Is there work in progress towards that?
>>>
>>> I would personally see two use cases around streaming: the first is
>>> updating an existing model that was built in batch; the second is
>>> triggering predictions not through a batch job but in a streaming job.
>>>
>>> Are these things that are in the works? Or are they maybe already
>>> feasible despite the API looking purely batch-oriented?
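For the second use case Christophe describes (triggering predictions in a streaming job before FLIP-23 lands), a common hand-rolled pattern is to load previously persisted model parameters in `open()` of a rich function and score each event as it arrives. A sketch under stated assumptions follows; the model file format, the `modelPath`, and the simple linear-model math are all illustrative, not part of any Flink API:

```scala
// Sketch only: score events in a DataStream job with a batch-trained linear model.
// The model file layout ("w1,w2,...,wn,intercept" on one line) is an assumption.
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._

class ScoreEvents(modelPath: String) extends RichMapFunction[Array[Double], Double] {
  // Loaded once per parallel instance when the job starts.
  @transient private var weights: Array[Double] = _
  @transient private var intercept: Double = _

  override def open(parameters: Configuration): Unit = {
    // Illustrative: read the parameters from a file system visible to all TaskManagers.
    val line = scala.io.Source.fromFile(modelPath).getLines().next()
    val values = line.split(",").map(_.toDouble)
    weights = values.init
    intercept = values.last
  }

  // Plain dot product plus intercept: one prediction per incoming event.
  override def map(features: Array[Double]): Double =
    features.zip(weights).map { case (x, w) => x * w }.sum + intercept
}

object StreamScoringJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val events: DataStream[Array[Double]] = ??? // e.g., sourced from Kafka
    events.map(new ScoreEvents("/shared/models/mlr.csv")).print()
    env.execute("stream scoring with a batch-trained model")
  }
}
```

For the first use case (updating the model while the job runs), one direction is to connect the event stream with a second stream of model updates and keep the current parameters in state; that is essentially the model-serving pattern FLIP-23 aims to provide as a library.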