Re: streaming predictions

2018-07-24 Thread Andrea Spina
Dear Cederic,
I did something similar as yours a while ago along this work [1] but I've
always been working within the batch context. I'm also the co-author of
flink-jpmml and, since a flink2pmml model saver library doesn't exist
currently, I'd suggest you a twofold strategy to tackle this problem:
- if your model is relatively simple, take the batch evaluate method (it
belongs to your SVM classifier) and attempt to translate it in a flatMap
function (hopefully you can reuse some internal utilities, Flink exploits
breeze vector library under the hoods [3]).
- if your model is a complex one, you should export the model into PMML and
employ then [2]. For a first overview, this [4] is the library you should
adopt as to export your model and this [5] can help you with the related
implementation.

Hope it can help and good luck!

Andrea

[1] https://dl.acm.org/citation.cfm?id=3070612
[2] https://github.com/FlinkML/flink-jpmml
[3]
https://github.com/apache/flink/blob/7034e9cfcb051ef90c5bf0960bfb50a79b3723f0/flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/pipeline/Predictor.scala#L73
[4] https://github.com/jpmml/jpmml-model
[5] https://github.com/jpmml/jpmml-sparkml

2018-07-24 13:29 GMT+02:00 David Anderson :

> One option (which I haven't tried myself) would be to somehow get the
> model into PMML format, and then use https://github.com/
> FlinkML/flink-jpmml to score the model. You could either use another
> machine learning framework to train the model (i.e., a framework that
> directly supports PMML export), or convert the Flink model into PMML. Since
> SVMs are fairly simple to describe, that might not be terribly difficult.
>
> On Mon, Jul 23, 2018 at 4:18 AM Xingcan Cui  wrote:
>
>> Hi Cederic,
>>
>> If the model is a simple function, you can just load it and make
>> predictions using the map/flatMap function in the StreamEnvironment.
>>
>> But I’m afraid the model trained by Flink-ML should be a “batch job",
>> whose predict method takes a Dataset as the parameter and outputs another
>> Dataset as the result. That means you cannot easily apply the model on
>> streams, at least for now.
>>
>> There are two options to solve this. (1) Train the dataset using another
>> framework to produce a simple function. (2) Adjust your model serving as a
>> series of batch jobs.
>>
>> Hope that helps,
>> Xingcan
>>
>> On Jul 22, 2018, at 8:56 PM, Hequn Cheng  wrote:
>>
>> Hi Cederic,
>>
>> I am not familiar with SVM or machine learning but I think we can work it
>> out together.
>> What problem have you met when you try to implement this function? From
>> my point of view, we can rebuild the model in the flatMap function and use
>> it to predict the input data. There are some flatMap documents here[1].
>>
>> Best, Hequn
>>
>> [1] https://ci.apache.org/projects/flink/flink-docs-
>> master/dev/stream/operators/#datastream-transformations
>>
>>
>>
>>
>>
>> On Sun, Jul 22, 2018 at 4:12 PM, Cederic Bosmans 
>> wrote:
>>
>>> Dear
>>>
>>> My name is Cederic Bosmans and I am a masters student at the Ghent
>>> University (Belgium).
>>> I am currently working on my masters dissertation which involves Apache
>>> Flink.
>>>
>>> I want to make predictions in the streaming environment based on a model
>>> trained in the batch environment.
>>>
>>> I trained my SVM-model this way:
>>> val svm2 = SVM()
>>> svm2.setSeed(1)
>>> svm2.fit(trainLV)
>>> val testVD = testLV.map(lv => (lv.vector, lv.label))
>>> val evalSet = svm2.evaluate(testVD)
>>>
>>> and saved the model:
>>> val modelSvm = svm2.weightsOption.get
>>>
>>> Then I have an incoming datastream in the streaming environment:
>>> dataStream[(Int, Int, Int)]
>>> which should be bininary classified using this trained SVM model.
>>>
>>> Since the predict function does only support DataSet and not DataStream,
>>> on stackoverflow a flink contributor mentioned that this should be done
>>> using a map/flatMap function.
>>> Unfortunately I am not able to work this function out.
>>>
>>> It would be incredible for me if you could help me a little bit further!
>>>
>>> Kind regards and thanks in advance
>>> Cederic Bosmans
>>>
>>
>>
>>
>
> --
> *David Anderson* | Training Coordinator | data Artisans
> --
> Join Flink Forward - The Apache Flink Conference
> Stream Processing | Event Driven | Real Time
>



-- 
*Andrea Spina*
Software Engineer @ Radicalbit Srl
Via Borsieri 41, 20159, Milano - IT


Re: streaming predictions

2018-07-24 Thread David Anderson
One option (which I haven't tried myself) would be to somehow get the model
into PMML format, and then use https://github.com/FlinkML/flink-jpmml to
score the model. You could either use another machine learning framework to
train the model (i.e., a framework that directly supports PMML export), or
convert the Flink model into PMML. Since SVMs are fairly simple to
describe, that might not be terribly difficult.

On Mon, Jul 23, 2018 at 4:18 AM Xingcan Cui  wrote:

> Hi Cederic,
>
> If the model is a simple function, you can just load it and make
> predictions using the map/flatMap function in the StreamEnvironment.
>
> But I’m afraid the model trained by Flink-ML should be a “batch job",
> whose predict method takes a Dataset as the parameter and outputs another
> Dataset as the result. That means you cannot easily apply the model on
> streams, at least for now.
>
> There are two options to solve this. (1) Train the dataset using another
> framework to produce a simple function. (2) Adjust your model serving as a
> series of batch jobs.
>
> Hope that helps,
> Xingcan
>
> On Jul 22, 2018, at 8:56 PM, Hequn Cheng  wrote:
>
> Hi Cederic,
>
> I am not familiar with SVM or machine learning but I think we can work it
> out together.
> What problem have you met when you try to implement this function? From my
> point of view, we can rebuild the model in the flatMap function and use it
> to predict the input data. There are some flatMap documents here[1].
>
> Best, Hequn
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/#datastream-transformations
>
>
>
>
>
> On Sun, Jul 22, 2018 at 4:12 PM, Cederic Bosmans 
> wrote:
>
>> Dear
>>
>> My name is Cederic Bosmans and I am a masters student at the Ghent
>> University (Belgium).
>> I am currently working on my masters dissertation which involves Apache
>> Flink.
>>
>> I want to make predictions in the streaming environment based on a model
>> trained in the batch environment.
>>
>> I trained my SVM-model this way:
>> val svm2 = SVM()
>> svm2.setSeed(1)
>> svm2.fit(trainLV)
>> val testVD = testLV.map(lv => (lv.vector, lv.label))
>> val evalSet = svm2.evaluate(testVD)
>>
>> and saved the model:
>> val modelSvm = svm2.weightsOption.get
>>
>> Then I have an incoming datastream in the streaming environment:
>> dataStream[(Int, Int, Int)]
>> which should be bininary classified using this trained SVM model.
>>
>> Since the predict function does only support DataSet and not DataStream,
>> on stackoverflow a flink contributor mentioned that this should be done
>> using a map/flatMap function.
>> Unfortunately I am not able to work this function out.
>>
>> It would be incredible for me if you could help me a little bit further!
>>
>> Kind regards and thanks in advance
>> Cederic Bosmans
>>
>
>
>

-- 
*David Anderson* | Training Coordinator | data Artisans
--
Join Flink Forward - The Apache Flink Conference
Stream Processing | Event Driven | Real Time


Re: streaming predictions

2018-07-22 Thread Xingcan Cui
Hi Cederic,

If the model is a simple function, you can just load it and make predictions 
using the map/flatMap function in the StreamEnvironment.

But I’m afraid the model trained by Flink-ML should be a “batch job", whose 
predict method takes a Dataset as the parameter and outputs another Dataset as 
the result. That means you cannot easily apply the model on streams, at least 
for now.

There are two options to solve this. (1) Train the dataset using another 
framework to produce a simple function. (2) Adjust your model serving as a 
series of batch jobs.

Hope that helps,
Xingcan

> On Jul 22, 2018, at 8:56 PM, Hequn Cheng  wrote:
> 
> Hi Cederic,
> 
> I am not familiar with SVM or machine learning but I think we can work it out 
> together.
> What problem have you met when you try to implement this function? From my 
> point of view, we can rebuild the model in the flatMap function and use it to 
> predict the input data. There are some flatMap documents here[1]. 
> 
> Best, Hequn
> 
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/#datastream-transformations
>  
> 
> 
> 
> 
> 
> 
> On Sun, Jul 22, 2018 at 4:12 PM, Cederic Bosmans  > wrote:
> Dear
> 
> My name is Cederic Bosmans and I am a masters student at the Ghent University 
> (Belgium).
> I am currently working on my masters dissertation which involves Apache Flink.
> 
> I want to make predictions in the streaming environment based on a model 
> trained in the batch environment.
> 
> I trained my SVM-model this way:
> val svm2 = SVM()
> svm2.setSeed(1)
> svm2.fit(trainLV)
> val testVD = testLV.map(lv => (lv.vector, lv.label))
> val evalSet = svm2.evaluate(testVD)
> 
> and saved the model: 
> val modelSvm = svm2.weightsOption.get
> 
> Then I have an incoming datastream in the streaming environment:
> dataStream[(Int, Int, Int)]
> which should be bininary classified using this trained SVM model.
> 
> Since the predict function does only support DataSet and not DataStream, on 
> stackoverflow a flink contributor mentioned that this should be done using a 
> map/flatMap function.
> Unfortunately I am not able to work this function out.
> 
> It would be incredible for me if you could help me a little bit further!
> 
> 
> Kind regards and thanks in advance
> Cederic Bosmans
> 



streaming predictions

2018-07-22 Thread Cederic Bosmans
Dear

My name is Cederic Bosmans and I am a masters student at the Ghent
University (Belgium).
I am currently working on my masters dissertation which involves Apache
Flink.

I want to make predictions in the streaming environment based on a model
trained in the batch environment.

I trained my SVM-model this way:
val svm2 = SVM()
svm2.setSeed(1)
svm2.fit(trainLV)
val testVD = testLV.map(lv => (lv.vector, lv.label))
val evalSet = svm2.evaluate(testVD)

and saved the model:
val modelSvm = svm2.weightsOption.get

Then I have an incoming datastream in the streaming environment:
dataStream[(Int, Int, Int)]
which should be bininary classified using this trained SVM model.

Since the predict function does only support DataSet and not DataStream, on
stackoverflow a flink contributor mentioned that this should be done using
a map/flatMap function.
Unfortunately I am not able to work this function out.

It would be incredible for me if you could help me a little bit further!

Kind regards and thanks in advance
Cederic Bosmans