Re: Adding Context To Logs

2016-06-02 Thread Kostas Kloudas
Hello Paul, If I understand correctly, your issues seem to be related to: https://issues.apache.org/jira/browse/FLINK-1502?jql=assignee%20in%20(Zentol)%20AND%20text%20~%20%22metrics%22

Re: Checkpoint takes long with FlinkKafkaConsumer

2016-06-14 Thread Kostas Kloudas
Hello Hironori, Are you using the latest Flink version? There were some changes in the FlinkConsumer in the latest releases. Thanks, Kostas > On Jun 14, 2016, at 11:52 AM, Hironori Ogibayashi > wrote: > > Hello, > > I am running Flink job which reads topics from Kafka

Re: Checkpoint takes long with FlinkKafkaConsumer

2016-06-14 Thread Kostas Kloudas
ed on > ContinuousProcessingTimeTrigger but clean up windows when it received > specific log records. > > Thanks, > Hironori > > 2016-06-14 21:23 GMT+09:00 Kostas Kloudas <k.klou...@data-artisans.com>: >> Hi Hironori, >> >> Could you also provide the log

Re: Exception in thread main: No such exception error

2016-06-02 Thread Kostas Kloudas
Hello Debaditya, From the exception message you posted it seems that it is a linkage error. Could it be that you are combining different versions of Flink when running your application? E.g. you have version X running on your cluster and you create your jar against version Y on your local

Re: Exception in thread main: No such exception error

2016-06-02 Thread Kostas Kloudas
e, not on a distributed cluster. Any other input? > > Warm Regards, > Debaditya > > > > On Thu, Jun 2, 2016 at 1:27 PM, Kostas Kloudas <k.klou...@data-artisans.com > <mailto:k.klou...@data-artisans.com>> wrote: > Hello Debaditya, > > From the exception

Re: Exception in thread main: No such exception error

2016-06-02 Thread Kostas Kloudas
rg.apache.flink:flink-yarn-tests:1.0.3' not found. > > Warm Regards, > Debaditya > > On Thu, Jun 2, 2016 at 3:32 PM, Kostas Kloudas <k.klou...@data-artisans.com > <mailto:k.klou...@data-artisans.com>> wrote: > Could you replace 0.10-SNAPSHOT with ${flink.version

Re: Exception in thread main: No such exception error

2016-06-02 Thread Kostas Kloudas
Could you replace 0.10-SNAPSHOT with ${flink.version} in the pom? Thanks, Kostas > On Jun 2, 2016, at 3:13 PM, Debaditya Roy wrote: > > 0.10-SNAPSHOT

Re: Exception in thread main: No such exception error

2016-06-02 Thread Kostas Kloudas
-Xms768m > 2016-06-02 18:37:27,385 INFO org.apache.flink.runtime.jobmanager.JobManager > - -Xmx768m > 2016-06-02 18:37:27,385 INFO org.apache.flink.runtime.jobmanager.JobManager > - > -Dlog.file=/home/royd1990/Downloads/flink-1.0.3/log/flink-royd

Re: maximum size of window

2016-06-28 Thread Kostas Kloudas
e window state? > > Regards, > Vishnu > > > On Mon, Jun 27, 2016 at 6:19 PM, Kostas Kloudas <k.klou...@data-artisans.com > <mailto:k.klou...@data-artisans.com>> wrote: > Hi Vishnu, > > I hope the following will help answer your question: > > 1) Eleme

Re: maximum size of window

2016-06-27 Thread Kostas Kloudas
Hi Vishnu, I hope the following will help answer your question: 1) Elements are first split by key (apart from global windows) and then are put into windows. In other words, windows are keyed. 2) A window belonging to a certain key is handled by a single node. In other words, no matter how big

Re: Windows, watermarks, and late data

2016-03-02 Thread Kostas Kloudas
in memory and when a late element > arrives emit the whole window again. > > The code I have is here: > https://github.com/aljoscha/flink/blob/window-late/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/windowing/triggers/EventTimeTrigger.java > > Kostas Klou

Re: Custom Trigger Implementation

2016-04-25 Thread Kostas Kloudas
Hi Piyush, In the onElement function, you register a timer every time you receive an element. When the next watermark arrives, in the flag==false case, this will lead to every element adding a timer for its timestamp+6ms. The same for flag==true case, with 2ms interval. What you

Re: Custom Trigger Implementation

2016-04-25 Thread Kostas Kloudas
Hi, Let me also add that you should override the clear() method in order to clear your state and delete the pending timers. Kostas > On Apr 25, 2016, at 11:52 AM, Kostas Kloudas <k.klou...@data-artisans.com> > wrote: > > Hi Piyush, > > In the onElement functi
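
A minimal sketch of both points (register one timer per window rather than one per element, and clean up pending timers in clear()), written against the Flink 1.x Trigger API. The class name and the choice of window.maxTimestamp() as the firing time are illustrative assumptions, not the code from this thread.

    import org.apache.flink.streaming.api.windowing.triggers.Trigger;
    import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

    public class OneTimerPerWindowTrigger extends Trigger<Object, TimeWindow> {

        @Override
        public TriggerResult onElement(Object element, long timestamp, TimeWindow window, TriggerContext ctx) {
            // registering the same timestamp twice is a no-op, so this does not
            // accumulate one timer per element
            ctx.registerEventTimeTimer(window.maxTimestamp());
            return TriggerResult.CONTINUE;
        }

        @Override
        public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
            return time == window.maxTimestamp() ? TriggerResult.FIRE : TriggerResult.CONTINUE;
        }

        @Override
        public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
            return TriggerResult.CONTINUE;
        }

        @Override
        public void clear(TimeWindow window, TriggerContext ctx) {
            // delete the pending timer so it cannot fire after the window state is gone
            ctx.deleteEventTimeTimer(window.maxTimestamp());
        }
    }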

Re: Dynamic partitioning for stream output

2016-05-24 Thread Kostas Kloudas
Hi Juho, If I understand correctly, you want a custom RollingSink that caches some buckets, one for each topic/date key, and whenever the volume of data buffered exceeds a limit, then it flushes to disk, right? If this is the case, then you are right that this is not currently supported

Re: stream keyBy without repartition

2016-05-24 Thread Kostas Kloudas
Hi Bart, From what I understand, you want to do a partial (per node) aggregation before shipping the result for the final one at the end. In addition, the keys do not seem to change between aggregations, right? If this is the case, this is the functionality of the Combiner in batch. In

Re: Dynamic partitioning for stream output

2016-05-25 Thread Kostas Kloudas
r partitioning comes from an event field, but > needs to be formatted, too. The partitioning feature should be generic, > allowing to pass a function that formats the bucket path for each tuple. > > Does it seem like a valid plan to create a sink that internally caches > multiple roll

Re: Create window before the first event

2016-07-12 Thread Kostas Kloudas
Hi Xiang, I think this is a duplicate from the discussion you opened yesterday. I post the same answer here, in case somebody wants to contribute to the discussion. According to your code, you just put all your elements (no splitting by key) into a single infinite window, and you apply your

Re: ContinuousProcessingTimeTrigger on empty

2016-07-12 Thread Kostas Kloudas
Hi Xiang, According to your code, you just put all your elements (no splitting by key) into a single infinite window, and you apply your window function every 5min (after the first element had arrived). The combination of the two means that if you have elements arriving at steady pace of 1

Re: Question about Apache Flink Use Case

2016-07-26 Thread Kostas Kloudas
Hi Suma Cherukuri, I also replied to your question in the dev list, but I repeat the answer here just in case you missed it. From what I understand you have many small files and you want to aggregate them into bigger ones containing the logs of the last 24h. As Max said, RollingSinks will

Re: If I chain two windows, what event-time would the second window have?

2016-07-27 Thread Kostas Kloudas
Hi Yassine, When the WindowFunction is applied to the content of a window, the timestamp of the resulting record is the window.maxTimestamp, which is the endOfWindow-1. You can imagine that if you have a Tumbling window from 0 to 2000, the result will have a timestamp of 1999. Window boundaries are
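
A small runnable sketch of this, with made-up data: a 2-second window feeds a 10-second window, and the record emitted for the [0, 2000) window carries timestamp 1999, so it lands in the [0, 10000) window of the second aggregation.

    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.streaming.api.TimeCharacteristic;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class ChainedWindowsSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

            // (key, value, event time in ms) -- made-up data
            env.fromElements(
                    Tuple3.of("a", 1, 500L), Tuple3.of("a", 1, 1500L), Tuple3.of("a", 1, 2500L))
               .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Tuple3<String, Integer, Long>>() {
                   @Override
                   public long extractAscendingTimestamp(Tuple3<String, Integer, Long> t) {
                       return t.f2;
                   }
               })
               .keyBy(0)
               .window(TumblingEventTimeWindows.of(Time.seconds(2)))
               .sum(1)                                                 // result of window [0, 2000) gets timestamp 1999
               .keyBy(0)
               .window(TumblingEventTimeWindows.of(Time.seconds(10)))  // 1999 falls into [0, 10000)
               .sum(1)
               .print();

            env.execute("chained windows sketch");
        }
    }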

Re: No output when using event time with multiple Kafka partitions

2016-07-27 Thread Kostas Kloudas
Hi Yassine, Could you just remove the window and the apply, and just put a print() after the: > .assignTimestampsAndWatermarks(new AscendingTimestampExtractor() { > @Override > public long extractAscendingTimestamp(Request req) { > return req.ts; > } > }) This at least will

Re: Firing windows multiple times

2016-08-11 Thread Kostas Kloudas
Hi Shanon, From what I understand, you want to have your results windowed by different durations, e.g. by minute, by day, by month, and you use the evictor to decide which elements should go into each window. If I am correct, then I do not think that you need the evictor which bounds

Re: Firing windows multiple times

2016-08-11 Thread Kostas Kloudas
Just to add a drawback of solution 2): you may have some issues because window boundaries may not be aligned. For example, the elements of a day window may be split between the last day of one month and the first day of the next. Kostas > On Aug 11, 2016, at 2:21 PM, Kostas Kloudas <

Re: flink - Working with State example

2016-08-11 Thread Kostas Kloudas
n type java.io.Serializable is not compatible with > java.lang.Double > [ERROR] > /home/buvana/flink/flink-1.1.0/wiki-edits/src/main/java/wikiedits/stateful.java:[150,9] > method does not override or implement a method from a supertype > [ERROR] -> [Help 1] > [ERROR] > [ERROR] To s

Re: flink - Working with State example

2016-08-11 Thread Kostas Kloudas
p.f1 = value; >> >>prev_stored_tp.f0 = INPUT_KAFKA_TOPIC; >> >>prev_tuple.update(prev_stored_tp); >> >> >> >>Tuple2<String, Double> tp = new Tuple2<String, Double>(); >> >>

Re: Window not emitting output after upgrade to Flink 1.1.1

2016-08-12 Thread Kostas Kloudas
Hi Yassine, Are you reading from a file and use ingestion time? If yes, then the problem can be related to this: https://issues.apache.org/jira/browse/FLINK-4329 Is this the case? Best, Kostas > On Aug 12, 2016, at 10:30 AM, Yassine

Re: flink - Working with State example

2016-08-12 Thread Kostas Kloudas
No problem! Regards, Kostas > On Aug 12, 2016, at 3:00 AM, Ramanan, Buvana (Nokia - US) > <buvana.rama...@nokia-bell-labs.com> wrote: > > Kostas, > Good catch! That makes it working! Thank you so much for the help. > Regards, > Buvana > > -----Original Me

Re: Window function - iterator data

2016-08-10 Thread Kostas Kloudas
Hi Paul, Elements are returned in the order they were added in the window. No sorting on timestamp is performed. Hope this helps, Kostas > On Aug 9, 2016, at 10:22 PM, Paul Joireman wrote: > > When you are using a window function the docs: > >

Re: ValueState is missing

2016-08-11 Thread Kostas Kloudas
Hello, Could you share the code of the job you are running? With only this information I am afraid we cannot help much. Thanks, Kostas > On Aug 11, 2016, at 11:55 AM, Dong-iL, Kim wrote: > > Hi. > I’m using flink 1.0.3 on aws EMR. > sporadically value of ValueState is

Re: env.readFile with enumeratenestedFields

2016-07-20 Thread Kostas Kloudas
Hi Flavio, As Aljoscha pointed out the problem must be solved now. The changes are already in the master. If there is any issue let us know. Kostas > On Jul 20, 2016, at 6:29 PM, Aljoscha Krettek wrote: > > Hi, > the configuration has to be passed using >

Re: Record has Long.MIN_VALUE timestamp (= no timestamp marker). Is the time characteristic set to 'ProcessingTime', or did you forget to call 'DataStream.assignTimestampsAndWatermarks(...)'?

2016-07-08 Thread Kostas Kloudas
back to > processing time. > > On Fri, Jul 8, 2016 at 10:32 AM, Kostas Kloudas <k.klou...@data-artisans.com > <mailto:k.klou...@data-artisans.com>> wrote: > Can it be that when you define the ‘right’ steam, you do not specify a > timestamp extractor? > Thi

Re: Record has Long.MIN_VALUE timestamp (= no timestamp marker). Is the time characteristic set to 'ProcessingTime', or did you forget to call 'DataStream.assignTimestampsAndWatermarks(...)'?

2016-07-06 Thread Kostas Kloudas
Hi David, You are using Tumbling event time windows, but you set the timeCharacteristic to processing time. If you want processing time, then you should use TumblingProcessingTimeWindows and remove the timestampAssigner. If you want event time, then you need to set the timeCharacteristic to
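
For reference, a minimal processing-time setup consistent with that advice (made-up data; the event-time alternative would instead set TimeCharacteristic.EventTime, keep a timestamp/watermark assigner, and use TumblingEventTimeWindows):

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.TimeCharacteristic;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class ProcessingTimeWindowSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // processing time: no timestamp assigner is needed, and the window
            // assigner must also be a processing-time one
            env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);

            env.fromElements(Tuple2.of("a", 1), Tuple2.of("a", 2), Tuple2.of("b", 3))
               .keyBy(0)
               .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
               .sum(1)
               .print();

            env.execute("processing-time windows sketch");
        }
    }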

Re: The two inputs have different execution contexts.

2016-07-11 Thread Kostas Kloudas
Hi Alieh, Could you share you code so that we can have a look? From the information you provide we cannot help. Thanks, Kostas > On Jul 10, 2016, at 3:13 PM, Alieh Saeedi wrote: > > I can not join or coGroup two tuple2 datasets of the same tome. The error is >

Re: sampling function

2016-07-11 Thread Kostas Kloudas
Hi Do, In DataStream you can always implement your own sampling function, hopefully without too much effort. Adding such functionality to the API could be a good idea. But given that in sampling there is no “one-size-fits-all” solution (as not every use case needs random sampling and not
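
As one example of such a user-defined sampling function, here is a hypothetical Bernoulli sampler implemented as a plain filter; the class name and the fixed probability are assumptions, and it covers only the simple random-sampling case mentioned above:

    import java.util.Random;

    import org.apache.flink.api.common.functions.RichFilterFunction;
    import org.apache.flink.configuration.Configuration;

    // keeps each element independently with probability p
    public class BernoulliSampler<T> extends RichFilterFunction<T> {

        private final double p;
        private transient Random rnd;

        public BernoulliSampler(double p) {
            this.p = p;
        }

        @Override
        public void open(Configuration parameters) {
            rnd = new Random();
        }

        @Override
        public boolean filter(T value) {
            return rnd.nextDouble() < p;   // roughly p * 100 % of the stream passes through
        }
    }

    // usage (hypothetical): stream.filter(new BernoulliSampler<>(0.01))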

Re: The two inputs have different execution contexts.

2016-07-11 Thread Kostas Kloudas
No problem Alieh! Kostas > On Jul 11, 2016, at 11:46 AM, Alieh Saeedi <a1_sae...@yahoo.com> wrote: > > Hi > I was joining two datasets which were from two different > ExecutionEnviornment. It was my mistake. Thanks anyway. > > Best, > Alieh > > >

Re: Record has Long.MIN_VALUE timestamp (= no timestamp marker). Is the time characteristic set to 'ProcessingTime', or did you forget to call 'DataStream.assignTimestampsAndWatermarks(...)'?

2016-07-08 Thread Kostas Kloudas
aming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:65) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:225) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) > at java.lang.Thread.run(Thread.java:745) > > > On 06/07/2

Re: readFile - Continuous file processing

2017-01-31 Thread Kostas Kloudas
Hi Nancy, Currently there is no way to do so. Flink only provides the mode you described, i.e. a modified file is considered a new file. The reason is that many filesystems do not give you creation timestamps separate from modification timestamps. If you control the way files are created, a solution
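
A minimal sketch of the monitoring mode described above (Flink 1.2-era API), in which a modified file is picked up again as if it were new; the directory and scan interval are assumptions:

    import org.apache.flink.api.java.io.TextInputFormat;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

    public class ContinuousFileMonitoringSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            String dir = "file:///tmp/monitored";                  // assumed directory
            TextInputFormat format = new TextInputFormat(new Path(dir));

            env.readFile(format, dir, FileProcessingMode.PROCESS_CONTINUOUSLY, 10_000L)
               .print();    // a modified file shows up again, in its entirety

            env.execute("continuous file monitoring sketch");
        }
    }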

Re: Count window on partition

2017-01-23 Thread Kostas Kloudas
Hi Dmitry, In all cases, the result of the countWindow will be also grouped by key because of the keyBy() that you are using. If you want to have a non-keyed stream and then split it in count windows, remove the keyBy() and instead of countWindow(), use countWindowAll(). This will have
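
A compact sketch of that difference, with made-up data:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CountWindowSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            DataStream<Tuple2<String, Integer>> input = env.fromElements(
                    Tuple2.of("a", 1), Tuple2.of("b", 1), Tuple2.of("a", 1), Tuple2.of("b", 1));

            // keyed count window: fires once a single key has collected 2 elements
            input.keyBy(0).countWindow(2).sum(1).print();

            // non-keyed count window: fires every 2 elements overall, runs with parallelism 1
            input.countWindowAll(2).sum(1).print();

            env.execute("count window sketch");
        }
    }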

Re: CEP and KeyedStreams doubt

2017-01-26 Thread Kostas Kloudas
Hi Oriol, The number of keys is related to the number of data-structures (NFAs) Flink is going to create and keep. Given this, it may make sense to try to reduce your key-space (or your keyedStreams). Other than that, Flink has no issue handling large numbers of keys. Now, for the issue you

Re: Deployment Architecture for Flink Applications

2017-02-22 Thread Kostas Kloudas
Hi CVP, On how people use Flink, you can check this blogpost to see how Alibaba does it: http://data-artisans.com/blink-flink-alibaba-search/ In addition, you can also find some more information on the matter on the talks from the last

Re: how to get rid of duplicate rows group by in DataStream

2016-08-22 Thread Kostas Kloudas
Hi Subash, You should also split your elements in windows. If not, Flink emits an element for each incoming record. That is why you have: (1,1) (1,2) (1,3) … Kostas > On Aug 22, 2016, at 5:58 PM, subash basnet wrote: > > Hello all, > > I grouped by the input based on

Re: How to share text file across tasks at run time in flink.

2016-08-22 Thread Kostas Kloudas
Hello Baswaraj, Are you using the DataSet (batch) or the DataStream API? If you are using the former, you can use a broadcast variable for your task. If you are using the DataStream one, then there
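
A minimal DataSet sketch of the broadcast-variable approach; the variable name "sharedLines" and the inlined toy data stand in for the real file, which could instead be loaded with env.readTextFile(path):

    import java.util.List;

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.configuration.Configuration;

    public class BroadcastVariableSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // the data set to share with every parallel task
            DataSet<String> sharedLines = env.fromElements("line-1", "line-2");

            env.fromElements(1, 2, 3)
               .map(new RichMapFunction<Integer, String>() {
                   private List<String> shared;

                   @Override
                   public void open(Configuration parameters) {
                       // every parallel instance gets the complete broadcast set
                       shared = getRuntimeContext().getBroadcastVariable("sharedLines");
                   }

                   @Override
                   public String map(Integer value) {
                       return value + " / " + shared.size() + " shared lines";
                   }
               })
               .withBroadcastSet(sharedLines, "sharedLines")
               .print();
        }
    }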

Re: Compression for AvroOutputFormat

2016-10-07 Thread Kostas Kloudas
Hi Lars, As far as I know there are no plans to do so in the near future, but every contribution is welcome. Looking forward to your Pull Request. Regards, Kostas > On Oct 7, 2016, at 12:40 PM, lars.bachm...@posteo.de wrote: > > Hi, > > at the moment it is not possible to set a compression

Re: ContinuousFileMonitoringFunction - deleting file after processing

2016-10-18 Thread Kostas Kloudas
be processed faster than others. This assumption only holds if you have a parallelism of 1. Cheers, Kostas > On Oct 18, 2016, at 11:05 AM, Kostas Kloudas <k.klou...@data-artisans.com> > wrote: > > Hi Maciek, > > Currently this functionality is not supported but this seems

Re: ContinuousFileMonitoringFunction - deleting file after processing

2016-10-18 Thread Kostas Kloudas
parallel... > > As for using notifyCheckpointComplete - thanks for suggestion, it looks > pretty interesting, I'll try to try it out. Although I wonder a bit if > relying only on modification timestamp is enough - many things may happen in > one ms :) > > thanks, > > macie

Re: ContinuousFileMonitoringFunction - deleting file after processing

2016-11-27 Thread Kostas Kloudas
t; > If you & other committers find these ideas ok, I can prepare jiras and pull > requests. While the first point is pretty straightforward IMHO, I'd like to > get some feedback one the second one. > > thanks, > maciek > > On 18/10/2016 11:52, Kostas Kloudas wrote:

Re: Problems with RollingSink

2016-11-28 Thread Kostas Kloudas
Hi Diego, The message shows that two tasks are trying to access the same file concurrently. Is this message thrown upon recovery after a failure, or at the initialization of the job? Could you please check the logs for other exceptions before this? Can this be related to this issue?

Re: Data Loss in HDFS after Job failure

2016-11-15 Thread Kostas Kloudas
Hi Dominique, Just wanted to add that the RollingSink is deprecated and will eventually be replaced by the BucketingSink, so it is worth migrating to that. Cheers, Kostas > On Nov 15, 2016, at 3:51 PM, Kostas Kloudas <k.klou...@data-artisans.com> > wrote: > > Hello Domin

Re: Data Loss in HDFS after Job failure

2016-11-15 Thread Kostas Kloudas
Hello Dominique, I think the problem is that you set both pending prefix and suffix to “”. Doing this makes the “committed” or “finished” filepaths indistinguishable from the pending ones. Thus they are cleaned up upon restoring. Could you undo this, and put for example a suffix “pending” or
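
A short sketch of that suggestion; the base path and the exact prefix/suffix values are assumptions, and the same setters also exist on the newer BucketingSink:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.fs.RollingSink;

    public class RollingSinkPendingSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            RollingSink<String> sink = new RollingSink<>("hdfs:///tmp/flink-out");  // assumed base path
            sink.setPendingPrefix("_");          // anything but the empty string ...
            sink.setPendingSuffix(".pending");   // ... so pending files stay distinguishable from finished ones

            env.fromElements("a", "b", "c").addSink(sink);
            env.execute("rolling sink pending-suffix sketch");
        }
    }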

Re: Additional steps needed for the Java quickstart guide

2016-11-16 Thread Kostas Kloudas
Hi Theodore, Thanks a lot for reporting this. It is true that many people have encountered it also during the training sessions. Cheers, Kostas > On Nov 16, 2016, at 2:49 PM, Theodore Vasiloudis > wrote: > > Hello all, > > I was preparing an exercise for

Re: Is incremental checkpointing already supported?

2016-11-16 Thread Kostas Kloudas
Hello, No, incremental checkpointing is not yet supported. Best, Kostas > On Nov 16, 2016, at 12:05 PM, 魏偉哲 wrote: > > Hi, > > Reply for the question below said the incremental checkpoint was not > implemented yet. >

Re: Data Loss in HDFS after Job failure

2016-11-15 Thread Kostas Kloudas
add this message. > > Greets > Dominique > > > > Sent from my Samsung device. > > > ---- Original message > From: Kostas Kloudas <k.klou...@data-artisans.com> > Date: 15.11.16 15:51 (GMT+01:00) > To: user@flink.apache.or

Re: Flink on YARN - Fault Tolerance | use case supported or not

2016-10-31 Thread Kostas Kloudas
Hi Jatana, As you pointed out, the correct way to do the above is to use savepoints. If you kill your application, then this is not a crash but rather a voluntary action. I am also looping in Max, as he may have something more to say on this. Cheers, Kostas On Sat, Oct 29, 2016 at 12:13 AM,

Re: emit partial state in window (streaming)

2016-11-03 Thread Kostas Kloudas
Hi Luis, Can you try to comment the whole final windowing and see if this is works? This includes the following lines: .windowAll(TumblingEventTimeWindows.of(Time.of(windowTime, timeUnit))) .trigger(new PartialWindowTrigger<>(partialWindowTime, timeUnit, windowTime, timeUnit))

Re: emit partial state in window (streaming)

2016-11-03 Thread Kostas Kloudas
-fabric.com> > wrote: > > On Thu, Nov 3, 2016 at 2:06 PM, Kostas Kloudas <k.klou...@data-artisans.com> > wrote: > Hi Luis, > > Can you try to comment the whole final windowing and see if this is works? > This includes the following lines: > > .window

Re: About Sliding window

2016-10-11 Thread Kostas Kloudas
Hi Zhangrucong, Sliding windows only support time-based slide. So your use-case is not supported out-of-the-box. But, if you describe a bit more what you want to do, we may be able to find a way together to do your job using the currently offered functionality. Kostas > On Oct 11, 2016, at

Re: Allowed Lateness and Window State

2016-10-13 Thread Kostas Kloudas
Hi Seth, FIRE and FIRE_AND_PURGE still have the same meaning. So on eventTime, when your trigger says FIRE_AND_PURGE, your window will be evaluated (4) and its state will be purged. Now, when the red circle arrives, the state will be empty, so if the onElement says FIRE, the result will be 1.

Re: About Sliding window

2016-10-12 Thread Kostas Kloudas
e3 and e4, and > send the result. > > > I think I need a certain duration window. > > Thank you very much! > From: Kostas Kloudas [mailto:k.klou...@data-artisans.com] > Sent: 2016-10-12 21:11 > To: Zhangrucong > Cc: user@flink.apache.org > Subject: Re: About Slidin

Re: About Sliding window

2016-10-12 Thread Kostas Kloudas
ng as following: > > > We want the following result: > > > > By the way, in the StreamSQL API, FLIP-11 will realize row windows. It seems > that the function of Slide Event-time row-window suits my use-case. Does the data > stream API support row windows? > > Thanks

Re: About Sliding window

2016-10-13 Thread Kostas Kloudas
, e2 is coming at 9:02, e3 is coming at > 9:07, and the aging time is 5 mins. So when e3 comes, e2 is aged. E2 is not > in the result! > > In the mail, you say you have a discussion. Can you show me the link? I want > to take part in it. > > Best wishes! > > From: Kostas

Re: Continuous File monitoring not reading nested files

2017-01-10 Thread Kostas Kloudas
w ArrayList(); > splitsByModTime.put(modTime, splitsToForward); > } > > ((List)splitsToForward).add(new > TimestampedFileInputSplit(modTime.longValue(), split.getSplitNumber(), > split.getPath(), split.getStart(), split.getLen

Re: Making batches of small messages

2017-01-12 Thread Kostas Kloudas
Hi, Fabian is right. The only thing I have to add is that if you have parallelism > 1 then each task will know its local “count” of messages it has buffered. In other words, with a parallelism of 2 and a batching threshold of 1000 messages, each one of the parallel tasks will have to reach
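
A deliberately simple, non-fault-tolerant sketch of such per-subtask batching (nothing here is checkpointed, so buffered elements would be lost on failure); the threshold and the String element type are assumptions:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.util.Collector;

    // every parallel instance keeps its own buffer, so with parallelism 2 and a
    // threshold of 1000, each instance must itself see 1000 records before it
    // emits a batch
    public class LocalBatcher implements FlatMapFunction<String, List<String>> {

        private final int threshold;
        private final List<String> buffer = new ArrayList<>();

        public LocalBatcher(int threshold) {
            this.threshold = threshold;
        }

        @Override
        public void flatMap(String value, Collector<List<String>> out) {
            buffer.add(value);
            if (buffer.size() >= threshold) {
                out.collect(new ArrayList<>(buffer));   // emit a copy of the local batch
                buffer.clear();
            }
        }
    }

    // usage (hypothetical): stream.flatMap(new LocalBatcher(1000))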

Re: Query regarding tumbling event time windows with ingestion time

2016-11-30 Thread Kostas Kloudas
ddy <janardhan.re...@olacabs.com> > wrote: > > HI > i didn't get it , can you please clarify with an example in case each of > operation A and B emit multiple elements. > > On Wed, Nov 30, 2016 at 3:34 PM, Kostas Kloudas <k.klou...@data-artisans.com > <mailto:k.

Re: Regarding windows and custom trigger

2016-11-30 Thread Kostas Kloudas
Hi Abdul, Probably the new enhanced evictors can help you do what you want. You can have a look here: https://cwiki.apache.org/confluence/display/FLINK/FLIP-4+%3A+Enhance+Window+Evictor and also in the related

Re: Problems with RollingSink

2016-11-29 Thread Kostas Kloudas
. > > Maybe there is a way to achieve this in a different manner by joining the > streams somehow before sinking… maybe through Kafka? > > Kind Regards, > > Diego > > > From: Kostas Kloudas [mailto:k.klou...@data-artisans.com] > Sent: Monday, 2

Re: Query regarding tumbling event time windows with ingestion time

2016-11-30 Thread Kostas Kloudas
Hi Janardhan, After the first windowing operation, the timestamp of the emitted element for each window will be the (endOfWindow - 1). So in your case, in the second windowing operation (window by 5) there will be at most one element per window. I hope this answers your question. Kostas >

Re: microsecond resolution

2016-12-05 Thread Kostas Kloudas
Hi Jeff, Actually in Flink timestamps are simple longs. This means that you can assign anything you want as a timestamp, as long as it fits in a long. Hope this helps and if not, we can discuss to see if we can find a solution that fits your needs together. Cheers, Kostas > On Dec 4, 2016,
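
A tiny sketch of that idea, assuming the first tuple field already holds microseconds since the epoch; note that window sizes and timers would then also have to be interpreted in the same microsecond unit:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;

    // timestamps in Flink are plain longs, so the extractor can hand back
    // microseconds directly, without converting to milliseconds
    public class MicrosecondTimestampExtractor extends AscendingTimestampExtractor<Tuple2<Long, String>> {

        @Override
        public long extractAscendingTimestamp(Tuple2<Long, String> element) {
            return element.f0;   // microseconds since the epoch, by assumption
        }
    }

    // usage (hypothetical): stream.assignTimestampsAndWatermarks(new MicrosecondTimestampExtractor())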

Re: microsecond resolution

2016-12-05 Thread Kostas Kloudas
g/event_timestamps_watermarks.html#assigning-timestamps> > > "Both timestamps and watermarks are specified as milliseconds since the Java > epoch of 1970-01-01T00:00:00Z." > > > > On Mon, Dec 5, 2016 at 4:57 AM, Kostas Kloudas <k.klou...@data-artisans.com > <ma

Re: Query regarding tumbling event time windows with ingestion time

2016-11-30 Thread Kostas Kloudas
No problem! Kostas > On Nov 30, 2016, at 7:08 PM, Janardhan Reddy <janardhan.re...@olacabs.com> > wrote: > > That makes it clear. > > Thanks > > On Wed, Nov 30, 2016 at 10:15 PM, Kostas Kloudas <k.klou...@data-artisans.com > <mailto:k.klou...@data-art

Re: ContinuousFileMonitoringFunction - deleting file after processing

2016-12-01 Thread Kostas Kloudas
first point is pretty straightforward IMHO, I'd like to > get some feedback one the second one. > > thanks, > maciek > > On 18/10/2016 11:52, Kostas Kloudas wrote: >> Hi Maciek, >> >> I agree with you that 1ms is often too long :P >> >> This is the

Re: Regarding ordering of events

2017-01-05 Thread Kostas Kloudas
t;> wrote: > Flink is a distributed system and does not preserve order across partitions. > The number prefix (e.g., 1>, 2>, ...) tells you the parallel instance of the > printing operator. > > You can set the parallelism to 1 to have the stream in order. > > F

Re: Regarding ordering of events

2017-01-05 Thread Kostas Kloudas
Hi Abdul, Flink provides no ordering guarantees on the elements within a window. The only “order” it guarantees is that the results referring to window-1 are going to be emitted before those of window-2 (assuming that window-1 precedes window-2). Thanks, Kostas > On Jan 5, 2017, at 11:57 AM,

Re: Sequential/ordered map

2017-01-05 Thread Kostas Kloudas
Hi Sebastian, If T_1 must be processed before T_i, i>1, then you cannot parallelize the algorithm. If this is not a restriction, then you could: 1) split the text into words and also attach the id of the text they appear in, 2) do a groupBy that will send all the same words to the same node, 3)

Re: Continuous File monitoring not reading nested files

2017-01-09 Thread Kostas Kloudas
ly. I > don't mess with the input files there in any way. > > 3) The given example is run locally. In TextInputFormat.readRecord(String, > byte[], int, int) the nestedFileEnumeration parameter is true during > execution. Is this what you meant? > > Cheers, > Lukas > >&g

Re: Continuous File monitoring not reading nested files

2017-01-09 Thread Kostas Kloudas
Hi Lukas, Are you sure that tempFile.deleteOnExit() does not remove the files before the test completes? I am just asking to be sure. Also, from the code, I suppose that you run it locally. I suspect that the problem is in the way the input format scans nested files, but could you see if in

Re: Continuous File monitoring not reading nested files

2017-01-09 Thread Kostas Kloudas
Hi Yassine, I suspect that the problem is in the way the input format (and not the reader) scans nested files, but could you see if in the code that is executed by the tasks, the nestedFileEnumeration parameter is still true? I am asking in order to pin down if the problem is in the way we

Re: Flink CEP

2017-03-27 Thread Kostas Kloudas
Hi Daniel, The NOT operation is not yet supported in the CEP library but there is an open issue https://issues.apache.org/jira/browse/FLINK-3320 and we are working on integrating it in the next release of the CEP library. Please monitor the

Re: Cassandra Sink version

2017-03-22 Thread Kostas Kloudas
Hi Nancy, For both Flink 1.2 and Flink 1.3, our tests are written against Cassandra 2.2.5. We use the version 3.0 of this https://github.com/datastax/java-driver/tree/3.0.x driver. So please check there to see which Cassandra versions they

Re: AsyncFunction and Parallelism

2017-03-31 Thread Kostas Kloudas
Hi Nico, No, you can have as many parallel tasks doing async IO operations as you want. What the documentation says is that in each one of these tasks, there is one thread handling the requests. Hope this helps, Kostas > On Mar 31, 2017, at 12:17 PM, Nico wrote: >

Re: CEP timeout does not trigger under certain conditions

2017-04-18 Thread Kostas Kloudas
I just realized that the conversation was not sent to the Mailing List, so I am resending it. Kostas > On Apr 11, 2017, at 7:30 PM, vijayakumar palaniappan > <vijayakuma...@gmail.com> wrote: > > Sure Thanks > > On Tue, Apr 11, 2017 at 1:28 PM, Kostas Kloudas <

Re: Generate Timestamps and emit Watermarks - unordered events - Kafka source

2017-04-21 Thread Kostas Kloudas
Hi Luis and Aljoscha, In Flink-1.2 late events were not dropped, but they were processed as normal ones. This is fixed for Flink-1.3 with https://issues.apache.org/jira/browse/FLINK-6205 . I would recommend that you switch to the master branch

Re: ProcessFunction example

2017-03-09 Thread Kostas Kloudas
Hi Philippe, You are right! Thanks for reporting it! We will fix it asap. Kostas > On Mar 9, 2017, at 8:38 AM, Philippe Caparroy > wrote: > > I think there is an error in the code snippet describing the ProcessFunction > time out example : >

Re: Window Functions and Empty Panes

2017-04-18 Thread Kostas Kloudas
Hi Ryan, “A periodic window like this requires the ability to start a timer without an element and to restart a timer when fired.” For the second part, i.e. “to restart a timer when fired”, you can re-register the timer in the onTimer() method (set a new timer for “now + T”), so that the next

Re: Window Functions and Empty Panes

2017-04-18 Thread Kostas Kloudas
that Konstantin provided. There, the example uses a value state to hold the counter; you can do something similar to keep the flag. Keep in mind that the state will already be scoped by key, so you do not have to worry about that either. Kostas > On Apr 18, 2017, at 11:11 PM, Kostas Kloudas <k.klou..
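
Putting the two messages above together, a hypothetical ProcessFunction (Flink 1.2+, on a keyed stream) that starts a periodic processing-time timer when the first element of a key arrives, uses a keyed flag so the timer is registered only once, and re-registers the timer inside onTimer(); the interval, the Long types, and what gets emitted are all assumptions:

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;

    public class PeriodicTimerFunction extends ProcessFunction<Long, Long> {

        private static final long INTERVAL = 60_000L;    // fire every minute (assumption)

        private transient ValueState<Boolean> timerSet;  // per-key flag

        @Override
        public void open(Configuration parameters) {
            timerSet = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("timerSet", Boolean.class));
        }

        @Override
        public void processElement(Long value, Context ctx, Collector<Long> out) throws Exception {
            // the first element seen for this key starts the periodic timer
            if (timerSet.value() == null) {
                ctx.timerService().registerProcessingTimeTimer(
                        ctx.timerService().currentProcessingTime() + INTERVAL);
                timerSet.update(true);
            }
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<Long> out) throws Exception {
            out.collect(timestamp);   // emit something for this pane, even if no element arrived
            // re-register so the next firing happens INTERVAL later
            ctx.timerService().registerProcessingTimeTimer(timestamp + INTERVAL);
        }
    }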

Re: data loss after implementing checkpoint

2017-07-31 Thread Kostas Kloudas
Hi Sridhar, Stephan already covered the correct sequence of actions in order for your second program to know its correct starting point. As far as the active/inactive rules are concerned, as Nico pointed out you have to somehow store in the backend which rules are active and which are not

Re: [POLL] Dropping savepoint format compatibility for 1.1.x in the Flink 1.4.0 release

2017-08-02 Thread Kostas Kloudas
+1 > On Aug 2, 2017, at 3:16 PM, Till Rohrmann wrote: > > +1 > > On Wed, Aug 2, 2017 at 9:12 AM, Stefan Richter > wrote: > >> +1 >> >> Am 28.07.2017 um 16:03 schrieb Stephan Ewen : >> >> Seems like no one raised a concern

Re: Access to datastream from BucketSink

2017-08-16 Thread Kostas Kloudas
org/projects/flink/flink-docs-release-1.3/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.html > > <https://ci.apache.org/projects/flink/flink-docs-release-1.3/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.html> > > >>

Re: Access to datastream from BucketSink

2017-08-16 Thread Kostas Kloudas
Hi Ant, I think you can do it by implementing your own Bucketer. Cheers, Kostas . > On Aug 16, 2017, at 1:09 PM, ant burton wrote: > > Hello, > > Given > >// Set StreamExecutionEnvironment >final StreamExecutionEnvironment env = >
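
A hypothetical Bucketer for the Flink 1.3 BucketingSink, deriving the bucket directory from the record itself; the "text before the first colon" rule is purely an assumption about what the data might look like:

    import org.apache.flink.streaming.connectors.fs.Clock;
    import org.apache.flink.streaming.connectors.fs.bucketing.Bucketer;
    import org.apache.hadoop.fs.Path;

    public class PrefixBucketer implements Bucketer<String> {

        private static final long serialVersionUID = 1L;

        @Override
        public Path getBucketPath(Clock clock, Path basePath, String element) {
            // choose the bucket (sub-directory) based on the element itself
            int idx = element.indexOf(':');
            String bucket = idx > 0 ? element.substring(0, idx) : "other";
            return new Path(basePath, bucket);
        }
    }

    // usage (hypothetical):
    //   new BucketingSink<String>("s3://my-bucket/base").setBucketer(new PrefixBucketer())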

Re: Access to datastream from BucketSink

2017-08-16 Thread Kostas Kloudas
hat-I-need-create-from-the-stream"); > } > } > > my question now is how do I access the data stream from within the S3Bucketer > so that I can generate a filename based on the data with the data stream. > > Thanks, > >> On 16 Aug 2017, at 12:55, Kostas

Re: FsStateBackend with incremental backup enable does not work with Keyed CEP

2017-08-12 Thread Kostas Kloudas
Hi Daiqing, I think Stefan is right and this will be fixed in the upcoming release. Could you open a JIRA for it with the Exception that you posted here? Thanks, Kostas > On Aug 12, 2017, at 10:05 AM, Stefan Richter > wrote: > > Hi, > > from a quick look, I

Re: CEP join across events

2017-04-26 Thread Kostas Kloudas
Hi Elias, If I understand correctly your use case, you want for an input: event_1 = (type=1, value_a=K, value_b=X) event_2 = (type=2, value_a=K, value_b=X) event_3 = (type=1, value_a=K, value_b=Y) to get a match: event_1, event_2 and discard event_3, right? In this case, Dawid is correct and

Re: CEP join across events

2017-04-27 Thread Kostas Kloudas
a enumerator of one, which > is the default. > > > On Wed, Apr 26, 2017 at 2:15 AM, Kostas Kloudas <k.klou...@data-artisans.com > <mailto:k.klou...@data-artisans.com>> wrote: > Hi Elias, > > If I understand correctly your use case, you want for an input: >

Re: Iterating over keys in state backend

2017-04-27 Thread Kostas Kloudas
Hi Ken, Unfortunately, iterating over all keys is not currently supported. Do you have your own custom operator (because you mention “from within the operator…”) or you have a process function (because you mention the “onTimer” method)? Also, could you describe your use case a bit more? You

Re: Generate Timestamps and emit Watermarks - unordered events - Kafka source

2017-04-25 Thread Kostas Kloudas
Perfect! Thanks a lot for testing it Luis! And keep us posted if you find anything else. As you may have seen the CEP library is undergoing heavy refactoring for the upcoming release. Kostas > On Apr 25, 2017, at 12:30 PM, Luis Lázaro wrote: > > Hi Aljoscha and Kostas,

Re: Multiple CEP Patterns

2017-04-28 Thread Kostas Kloudas
Yes this is the master branch. We have not yet forked the 1.3 branch. And I do not think there is a better way and I am not sure if there can be. Apart from the memory leak that is described in the JIRA, the different NFA’s cannot share any state, so for each one the associated memory

Re: Multiple CEP Patterns

2017-04-28 Thread Kostas Kloudas
Perfect! And let us know how it goes! Kostas > On Apr 28, 2017, at 5:04 PM, mclendenin wrote: > > Ok, I will try using Flink 1.3 > > > > -- > View this message in context: >

Re: Iterating over keys in state backend

2017-04-28 Thread Kostas Kloudas
with timestamps smaller than the watermark are processed. Hope this helps, Kostas > On Apr 28, 2017, at 4:08 AM, Ken Krugler <kkrugler_li...@transpac.com> wrote: > > Hi Kostas, > > Thanks for responding. Details in-line below. > >> On Apr 27, 2017, at 1:19am, Kostas Klouda

Re: Multiple CEP Patterns

2017-04-28 Thread Kostas Kloudas
Hi! I suppose that by memory errors you mean you run out of memory, right? Are you using Flink 1.2 or the current master (upcoming Flink 1.3). The reason I am asking is because Flink 1.2 suffered from this https://issues.apache.org/jira/browse/FLINK-5174

Re: Multiple CEP Patterns

2017-04-28 Thread Kostas Kloudas
28, 2017, at 9:44 AM, Kostas Kloudas <k.klou...@data-artisans.com> > wrote: > > Hi! > > I suppose that by memory errors you mean you run out of memory, right? > > Are you using Flink 1.2 or the current master (upcoming Flink 1.3). > The reason I am asking i

Re: CEP join across events

2017-04-28 Thread Kostas Kloudas
you could do something like: > > Pattern. > .begin[Foo]("first") > .where( first => first.baz == 1 ) > .followedBy("next") > .relatedTo("first", { (first, next) => first.bar == next.bar }) > .
