Thanks Piotrek for your response. Teena responsed for same. I am implementing changes to try it out.
Yes, Originally I did call keyBy for same reason so that I can parallelize the operation. On Thu, Mar 1, 2018 at 1:24 AM, Piotr Nowojski <pi...@data-artisans.com> wrote: > Hi, > > timeWindowAll is a non parallel operation, since it gathers all of the > elements and process them together: > > https://ci.apache.org/projects/flink/flink-docs-release-1.4/api/java/org/ > apache/flink/streaming/api/datastream/DataStream.html# > timeWindowAll-org.apache.flink.streaming.api.windowing. > time.Time-org.apache.flink.streaming.api.windowing.time.Time- > > Note that it’s defined in DataStream, not in the KeyedStream. > > In your keyBy example keyBy() is just a NoOp. Didn’t you mean to use > KeyedStream#timeWindows method? > > Piotrek > > On 1 Mar 2018, at 09:21, Ashish Attarde <ashish.atta...@gmail.com> wrote: > > Hi, > > I am new to Flink and in general data processing using stream processors. > > I am using flink to do real time correlation between multiple records > which are coming as part of same stream. I am doing is "apply" operation on > TimeWindowed stream. When I submit job with parallelism factor of 4, I am > still seeing apply operation is applied with parallelism factor of 1. > > Here is the peice of code : > > parsedInput.keyBy("mflowHash") > .timeWindowAll(Time.milliseconds(1000),Time.milliseconds(200)) > .allowedLateness(Time.seconds(10)) > .apply(new CRWindow()); > > > I am trying to correlate 2 streams, what is the right way to do it? I > tried the CEP library and experienced the worst performance. It is taking > ~4 minutes to do the correlation. The corelation logic is very simple and > not compute intensive. > > > -- > > Thanks > -Ashish Attarde > > > > -- > > Thanks > -Ashish Attarde > > > -- Thanks -Ashish Attarde