That's correct. It's not possible to persist data in memory across jobs in Flink's batch API.
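What you can do, however, is persist a trained model's parameters to a durable file system at the end of the training job and read them back in a later job. Here is a minimal sketch of that workaround; the paths, the use of `TypeSerializerOutputFormat`/`TypeSerializerInputFormat`, and `MultipleLinearRegression` as the example model are my choices for illustration, not an official FlinkML persistence API:

```scala
// Sketch only: persist a trained FlinkML model's weights between batch jobs.
// Paths and format choices are illustrative assumptions.
import org.apache.flink.api.scala._
import org.apache.flink.api.java.io.{TypeSerializerInputFormat, TypeSerializerOutputFormat}
import org.apache.flink.ml.common.{LabeledVector, WeightVector}
import org.apache.flink.ml.math.Vector
import org.apache.flink.ml.regression.MultipleLinearRegression

object TrainAndPersist {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val training: DataSet[LabeledVector] = ??? // load your training data here

    val mlr = MultipleLinearRegression().setIterations(10)
    mlr.fit(training)

    // The learned parameters are just a DataSet, so they can be written out
    // like any other DataSet and re-read by a separate job.
    val weights: DataSet[WeightVector] = mlr.weightsOption.get
    weights.write(new TypeSerializerOutputFormat[WeightVector], "hdfs:///models/mlr")
    env.execute("train and persist model")
  }
}

object PredictFromPersistedModel {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val weightsType = createTypeInformation[WeightVector]
    val weights: DataSet[WeightVector] =
      env.readFile(new TypeSerializerInputFormat[WeightVector](weightsType), "hdfs:///models/mlr")

    // Re-inject the weights into a fresh predictor instance and score new data.
    val mlr = MultipleLinearRegression()
    mlr.weightsOption = Some(weights)
    val toScore: DataSet[Vector] = ??? // data to score
    mlr.predict(toScore).print()
  }
}
```

This works because `weightsOption` is a public field on the predictor; the model is "persisted" only in the sense that its parameters survive on the file system, not that the pipeline object itself is serialized.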
Best,
Fabian

2018-02-05 18:28 GMT+01:00 Christophe Jolif <cjo...@gmail.com>:

> Fabian,
>
> Ok, thanks for the update. Meanwhile I was looking at how I could still
> leverage the current FlinkML API, but as far as I can see, it misses the
> ability to persist its own models. So even for pure batch, it prevents
> running your (once built) model in several jobs. Or am I missing
> something?
>
> I suspect I am not the only one who would love to apply machine learning
> as part of Flink processing. While waiting for FLIP-23, what are the
> "best" practices today?
>
> Thanks again for your help,
> --
> Christophe
>
> On Mon, Feb 5, 2018 at 6:01 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>
>> Hi Christophe,
>>
>> It is true that FlinkML only targets batch workloads. Also, there has
>> not been any development for a long time.
>>
>> In March last year, a discussion was started on the dev mailing list
>> about different machine learning features for stream processing [1].
>> One result of this discussion was FLIP-23 [2], which will add a library
>> for model serving to Flink, i.e., it can load (and update) machine
>> learning models and evaluate them on a stream.
>> If you dig through the mailing list thread, you'll find a link to a
>> Google doc that discusses other possible directions.
>>
>> Best,
>> Fabian
>>
>> [1] https://lists.apache.org/thread.html/eeb80481f3723c160bc923d689416a352d6df4aad98fe7424bf33132@%3Cdev.flink.apache.org%3E
>> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-23+-+Model+Serving
>>
>> 2018-02-05 16:43 GMT+01:00 Christophe Jolif <cjo...@gmail.com>:
>>
>>> Hi all,
>>>
>>> Sorry, this is me again with another question.
>>>
>>> Maybe I did not search deep enough, but it seems the FlinkML API is
>>> still pure batch.
>>>
>>> If I read https://cwiki.apache.org/confluence/display/FLINK/FlinkML%3A+Vision+and+Roadmap
>>> it seems there was the intent to "exploit the streaming nature of
>>> Flink, and provide functionality designed specifically for data
>>> streams", but from my external point of view, I don't see much
>>> happening here. Is there work in progress towards that?
>>>
>>> I would personally see two use cases around streaming: the first is
>>> updating an existing model that was built in batch; the second is
>>> triggering predictions not through a batch job but in a streaming job.
>>>
>>> Are these things that are in the works? Or are they maybe already
>>> feasible despite the API looking purely batch-oriented?
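For the second use case Christophe describes (triggering predictions in a streaming job before FLIP-23 lands), a common hand-rolled pattern is to load previously persisted model parameters in `open()` of a rich function and score each event as it arrives. A sketch under stated assumptions follows; the model file format, the `modelPath`, and the simple linear-model math are all illustrative, not part of any Flink API:

```scala
// Sketch only: score events in a DataStream job with a batch-trained linear model.
// The model file layout ("w1,w2,...,wn,intercept" on one line) is an assumption.
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._

class ScoreEvents(modelPath: String) extends RichMapFunction[Array[Double], Double] {
  // Loaded once per parallel instance when the job starts.
  @transient private var weights: Array[Double] = _
  @transient private var intercept: Double = _

  override def open(parameters: Configuration): Unit = {
    // Illustrative: read the parameters from a file system visible to all TaskManagers.
    val line = scala.io.Source.fromFile(modelPath).getLines().next()
    val values = line.split(",").map(_.toDouble)
    weights = values.init
    intercept = values.last
  }

  // Plain dot product plus intercept: one prediction per incoming event.
  override def map(features: Array[Double]): Double =
    features.zip(weights).map { case (x, w) => x * w }.sum + intercept
}

object StreamScoringJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val events: DataStream[Array[Double]] = ??? // e.g., sourced from Kafka
    events.map(new ScoreEvents("/shared/models/mlr.csv")).print()
    env.execute("stream scoring with a batch-trained model")
  }
}
```

For the first use case (updating the model while the job runs), one direction is to connect the event stream with a second stream of model updates and keep the current parameters in state; that is essentially the model-serving pattern FLIP-23 aims to provide as a library.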