PLC/Scada/Sensor anomaly detection

2016-05-03 Thread Ivan
Hello! Has anyone used Flink in "production" for PLC anomaly detection? Any pointers/docs to check? Best regards, Iván Venzor C.

Re: How to perform this join operation?

2016-05-03 Thread Elias Levy
Till, Thanks again for putting this together. It is certainly along the lines of what I want to accomplish, but I see a problem with it. In your code you use a ValueState to store the priority queue. If you are expecting to store a lot of values in the queue, then you are likely to be using
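
[Editor's note] Elias's concern is that a priority queue kept in a single value-style state slot must be rewritten in full on every update. The following is a plain-Java back-of-envelope sketch of that cost (it is not Flink API; the class and method names are the editor's own), contrasting rewrite-the-whole-value updates with append-style updates:

```java
// Back-of-envelope cost model (plain Java, not Flink API):
// elements serialized over n inserts for two state-keeping strategies.
public class StateCostSketch {

    // Value-style state: the i-th insert rewrites the whole queue of size i.
    static long valueStateWrites(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += i; // all i elements re-serialized on the i-th insert
        }
        return total; // n * (n + 1) / 2, i.e. O(n^2)
    }

    // List/append-style state: each insert writes a single element.
    static long listStateWrites(int n) {
        return n; // O(n)
    }

    public static void main(String[] args) {
        System.out.println(valueStateWrites(10_000)); // 50005000
        System.out.println(listStateWrites(10_000));  // 10000
    }
}
```

For 10,000 queued elements that is 50,005,000 element writes versus 10,000, which is why a value-slot-per-queue design hurts once the queue grows.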

Re: how to convert datastream to collection

2016-05-03 Thread Srikanth
Why do you want to collect and iterate? Why not iterate on the DataStream itself? Maybe I didn't understand your use case completely. Srikanth On Tue, May 3, 2016 at 10:55 AM, Aljoscha Krettek wrote: > Hi, > please keep in mind that we're dealing with streams. The Iterator

Re: Flink + Kafka + Scalabuff issue

2016-05-03 Thread Alexander Gryzlov
Hello, Just to follow up on this issue: after collecting some data and setting up additional tests we have managed to pinpoint the issue to the ScalaBuff-generated code that decodes enumerations. After switching to the ScalaPB generator instead, the problem was gone. One thing peculiar about

Re: how to convert datastream to collection

2016-05-03 Thread Aljoscha Krettek
Hi, please keep in mind that we're dealing with streams. The Iterator might never finish. Cheers, Aljoscha On Tue, 3 May 2016 at 16:35 Suneel Marthi wrote: > DataStream> *newCentroids = new DataStream<>.()* > > *Iterator> iter
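
[Editor's note] Aljoscha's point is that an iterator over a stream may never finish, so materializing it into a list can block forever. A plain-Java sketch of bounding consumption instead (the `take` helper is the editor's own, not Flink API):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BoundedCollect {

    // Pull at most 'limit' elements from a possibly unbounded iterator,
    // instead of materializing the whole (potentially endless) stream.
    static <T> List<T> take(Iterator<T> it, int limit) {
        List<T> out = new ArrayList<>();
        while (out.size() < limit && it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // Simulate an endless stream with an infinite iterator.
        Iterator<Integer> endless = new Iterator<Integer>() {
            int i = 0;
            public boolean hasNext() { return true; }
            public Integer next() { return i++; }
        };
        System.out.println(take(endless, 5)); // [0, 1, 2, 3, 4]
    }
}
```

The same idea applies to the iterator returned by DataStreamUtils.collect: consume it incrementally rather than copying it into a list in one shot.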

Re: how to convert datastream to collection

2016-05-03 Thread Suneel Marthi
DataStream> *newCentroids = new DataStream<>.()* *Iterator> iter = DataStreamUtils.collect(newCentroids);* *List> list = Lists.newArrayList(iter);* On Tue, May 3, 2016 at 10:26 AM, subash basnet wrote: > Hello all, >

Re: How to perform Broadcast and groupBy in DataStream like DataSet

2016-05-03 Thread subash basnet
Hello Stefano, Thank you, I found out just a while ago that I could use keyBy, but I couldn't find how to set and get a broadcast variable in a DataStream like that of a DataSet. For example, in the code below we get the collection of *centroids* via broadcast. Eg: In KMeans.java class X extends

Re: Creating a custom operator

2016-05-03 Thread Simone Robutti
I'm not sure this is the right way to do it, but we were exploring all the possibilities and this one is the most obvious. We also spent some time studying how to do it, to achieve a better understanding of Flink's internals. What we want to do though is to integrate Flink with another distributed

Flink - start-cluster.sh

2016-05-03 Thread Punit Naik
Hi, I did all the settings required for cluster setup, but when I ran the start-cluster.sh script, it only started one JobManager on the master node. Logs are written only on the master node. Slaves don't have any logs. And when I ran a program it said: Resources available to scheduler: Number of
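
[Editor's note] A common cause of this symptom (an assumption on the editor's part, since the mail is truncated) is that the conf/slaves file on the master does not list the worker hosts, or that passwordless SSH to those hosts is not set up. A sketch with placeholder hostnames:

```shell
# conf/slaves on the master node, one TaskManager host per line
# (worker1/worker2 are placeholder hostnames):
worker1
worker2
```

start-cluster.sh SSHes into each host listed there to start a TaskManager, so passwordless SSH from the master to every worker must work; if the file is empty, only the local JobManager comes up.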

Re: Flink Iterations Ordering

2016-05-03 Thread Stephan Ewen
Hi! The order in which the elements arrive in an iteration HEAD is the order in which the last operator in the loop (the TAIL) produces them. If that is a deterministic ordering (because of a sorted reduce, for example), then you should be able to rely on the order. Otherwise, the order of

Re: TimeWindow overload?

2016-05-03 Thread Stephan Ewen
Just had a quick chat with Aljoscha... The first version of the aligned window code will still duplicate the elements, later versions should be able to get rid of that. On Tue, May 3, 2016 at 11:10 AM, Aljoscha Krettek wrote: > Hi, > even with the optimized operator for

Re: Measuring latency in a DataStream

2016-05-03 Thread Robert Schmidtke
After fixing the clock issue on the application level, the latency is as expected. Thanks again! Robert On Tue, May 3, 2016 at 9:54 AM, Robert Schmidtke wrote: > Hi Igor, thanks for your reply. > > As for your first point I'm not sure I understand correctly. I'm

Re: Creating a custom operator

2016-05-03 Thread Fabian Hueske
Hi Simone, you are right, the interfaces you extend are not considered to be public, user-facing API. Adding custom operators to the DataSet API touches many parts of the system and is not straightforward. The DataStream API has better support for custom operators. Can you explain what kind of

Re: Sink - writeAsText problem

2016-05-03 Thread Punit Naik
Yeah thanks for letting me know. On 03-May-2016 2:40 PM, "Fabian Hueske" wrote: > Yes, but be aware that your program runs with parallelism 1 if you do not > configure the parallelism. > > 2016-05-03 11:07 GMT+02:00 Punit Naik : > >> Hi Stephen, Fabian

Re: Creating a custom operator

2016-05-03 Thread Simone Robutti
Hello Fabian, we dug deeper starting from the input you gave us, but a question arose. We always assumed that runtime operators were open for extension without modifying anything inside Flink, but it looks like this is not the case and the documentation assumes that the developer is working to a

How to perform Broadcast and groupBy in DataStream like DataSet

2016-05-03 Thread subash basnet
Hello all, How could we perform *withBroadcastSet* and *groupBy* in DataStream like that of DataSet in the below KMeans code: DataSet newCentroids = points .map(new SelectNearestCenter()).*withBroadcastSet*(loop, "centroids") .map(new CountAppender()).*groupBy*(0).reduce(new

Re: Insufficient number of network buffers

2016-05-03 Thread Ufuk Celebi
Hey Tarandeep, I think the failures are unrelated. Regarding the number of network buffers: https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html#configuring-the-network-buffers The timeouts might occur, because the task managers are pretty loaded. I would suggest to

Re: TimeWindow overload?

2016-05-03 Thread Aljoscha Krettek
Hi, even with the optimized operator for aligned time windows I would advise against using long sliding windows with a small slide. The system will internally create a lot of "buckets", i.e. each sliding window is treated separately and the element is put into 1,440 buckets in your case. With a
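
[Editor's note] The 1,440 figure follows from each element belonging to windowSize/slide overlapping sliding windows. The original poster's exact parameters are not in the excerpt; a 24-hour window with a 1-minute slide is one combination that yields 1,440:

```java
public class SlidingWindowPanes {

    // Number of overlapping sliding windows each element belongs to,
    // assuming the window size is a multiple of the slide.
    static long panesPerElement(long windowSizeMs, long slideMs) {
        return windowSizeMs / slideMs;
    }

    public static void main(String[] args) {
        long day = 24 * 60 * 60 * 1000L;
        long minute = 60 * 1000L;
        System.out.println(panesPerElement(day, minute)); // 1440
    }
}
```

Shrinking the ratio of size to slide (a larger slide, or a shorter window) directly shrinks the per-element duplication.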

Re: Sink - writeAsText problem

2016-05-03 Thread Fabian Hueske
Yes, but be aware that your program runs with parallelism 1 if you do not configure the parallelism. 2016-05-03 11:07 GMT+02:00 Punit Naik : > Hi Stephen, Fabian > > setting "fs.output.always-create-directory" to true in flink-conf.yaml > worked! > > On Tue, May 3, 2016

Re: Sink - writeAsText problem

2016-05-03 Thread Punit Naik
Hi Stephen, Fabian setting "fs.output.always-create-directory" to true in flink-conf.yaml worked! On Tue, May 3, 2016 at 2:27 PM, Stephan Ewen wrote: > Hi! > > There is the option to always create a directory: > "fs.output.always-create-directory" > > See >

Re: Scala compilation error

2016-05-03 Thread Aljoscha Krettek
There is a Scaladoc but it does not cover all packages, unfortunately. In the Scala API you can call transform without specifying a TypeInformation; it works using implicits/context bounds. On Tue, 3 May 2016 at 01:48 Srikanth wrote: > Sorry for the previous incomplete

Re: Sink - writeAsText problem

2016-05-03 Thread Stephan Ewen
Hi! There is the option to always create a directory: "fs.output.always-create-directory" See https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html#file-systems Greetings, Stephan On Tue, May 3, 2016 at 9:26 AM, Punit Naik wrote: > Hello > > I

Re: How to perform this join operation?

2016-05-03 Thread Aljoscha Krettek
Hi Elias, thanks for the long write-up. It's interesting that it actually kinda works right now. You might be interested in a design doc that we're currently working on. I posted it on the dev list but here it is:

Re: Measuring latency in a DataStream

2016-05-03 Thread Robert Schmidtke
Hi Igor, thanks for your reply. As for your first point I'm not sure I understand correctly. I'm ingesting records at a rate of about 50k records per second, and those records are fairly small. If I add a time stamp to each of them, I will have a lot more data, which is not exactly what I want.
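
[Editor's note] One way to reconcile Robert's constraint (50k records/s, records too small to stamp individually) is to timestamp only every Nth record and estimate latency from the sample. This is an editor's sketch in plain Java, not what the thread settled on; it also assumes sender and receiver clocks are synchronized, the very issue Robert later fixed:

```java
import java.util.ArrayList;
import java.util.List;

public class SampledLatency {

    // Compute latency only for every 'sampleEvery'-th record, so the
    // per-record overhead of carrying a timestamp is amortized away.
    static List<Long> latencies(long[] sendTimes, long[] recvTimes, int sampleEvery) {
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < sendTimes.length; i += sampleEvery) {
            out.add(recvTimes[i] - sendTimes[i]); // requires synchronized clocks
        }
        return out;
    }

    public static void main(String[] args) {
        long[] sent = {1000, 1010, 1020, 1030};
        long[] recv = {1005, 1016, 1025, 1037};
        System.out.println(latencies(sent, recv, 2)); // [5, 5]
    }
}
```

At a 1-in-1000 sampling rate the added payload is negligible while still giving a usable latency distribution.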

Re: Sink - writeAsText problem

2016-05-03 Thread Fabian Hueske
Did you specify a parallelism? The default parallelism of a Flink instance is 1 [1]. You can set a different default parallelism in ./conf/flink-conf.yaml or pass a job specific parallelism with ./bin/flink using the -p flag [2]. More options to define parallelism are in the docs [3]. [1]
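
[Editor's note] The options Fabian lists can be sketched as follows (myJob.jar is a placeholder; the config key is the one used by the Flink docs he links):

```shell
# Option 1: set the instance-wide default in ./conf/flink-conf.yaml:
#   parallelism.default: 4

# Option 2: override per job at submission time with the -p flag:
./bin/flink run -p 4 myJob.jar
```

A parallelism set in the program itself (e.g. on the execution environment) takes precedence over both.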

Sink - writeAsText problem

2016-05-03 Thread Punit Naik
Hello I executed my Flink code in eclipse and it properly generated the output by creating a folder (as specified in the string) and placing output files in them. But when I exported the project as JAR and ran the same code using ./flink run, it generated the output, but instead of creating a

Re: TimeWindow overload?

2016-05-03 Thread Stephan Ewen
Hi Elias! There is a feature pending that uses an optimized version for aligned time windows. In that case, elements would go into a single window pane, and the full window would be composed of all panes it spans (in the case of sliding windows). That should help a lot in those cases. The