Re: NotSerializableException

2016-06-13 Thread Aljoscha Krettek
Nope, I think there is neither a fix nor an open issue for this right now. On Mon, 13 Jun 2016 at 11:31 Maximilian Michels wrote: > Is there an issue or a fix for proper use of the ClojureCleaner in > CoGroup.where()? > > On Fri, Jun 10, 2016 at 8:07 AM, Aljoscha Krettek >

Re: Accessing StateBackend snapshots outside of Flink

2016-06-13 Thread Aljoscha Krettek
Serializer and read the "last_updated" > property > > - delete the key from RocksDB if the state's "last_updated" property is > over > > a month ago > > > > Is there any reason this approach wouldn't work, or anything to be > careful &g

Re: Custom Barrier?

2016-06-14 Thread Aljoscha Krettek
Hi, would these super-structure events occur per key? If yes, then I think you can process this using the currently available windowing mechanism by writing a custom WindowAssigner and Trigger. This, of course, assumes that the events arrive in-order, i.e. if A-End arrives before A-Start or if elem

Re: Custom Barrier?

2016-06-15 Thread Aljoscha Krettek
order > On 14 Jun 2016 14:04, "Paul Wilson" wrote: > >> Hi, >> >> No these super-structure events only serve the purpose of defining the >> boundaries of a join, and do not relate to the keys of the sub-events. >> >> Thanks, >> Paul >&g

Re: Migrating from one state backend to another

2016-06-15 Thread Aljoscha Krettek
Hi, right now migrating from one state backend to another is not possible. I have it in the back of my head, however, that we should introduce a common serialized representation of state to make this possible in the future. (Both for checkpoints and savepoints, which use the same mechanism undernea

Re: Documentation for translation of Job graph to Execution graph

2016-06-17 Thread Aljoscha Krettek
Hi, I'm afraid there is no documentation besides the link that you posted and this one: https://ci.apache.org/projects/flink/flink-docs-release-1.0/concepts/concepts.html . With what parts are you having trouble? Maybe I can help. Cheers, Aljoscha On Thu, 16 Jun 2016 at 19:31 Bajaj, Abhinav wro

Re: Kinesis connector classpath issue when running Flink 1.1-SNAPSHOT on YARN

2016-06-17 Thread Aljoscha Krettek
Hi, I think the problem with the missing Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem is not specific to RocksDB. The exception is thrown in the FsStateBackend, which is internally used by the RocksDB backend to do snapshotting of non-partitioned state. The problem is that the FsStateBackend tri

Re: Lazy Evaluation

2016-06-21 Thread Aljoscha Krettek
Great to hear that it works now! :-) On Sun, 19 Jun 2016 at 16:33 Paschek, Robert wrote: > Hi Mailing List, > > after "upgrading" the flink version in my pom.xml to 1.0.3, i get two > error messages for these output variants, which don't work: > > org.apache.flink.api.common.functions.InvalidTyp

Re: TaskManager information during runtime

2016-06-22 Thread Aljoscha Krettek
Hi, you can implement a RichSourceFunction. With this you can also implement open() and close() methods that get called when your source is started on the worker node and closed respectively. In there, you could determine the hostname and send it to some centralized service so that it knows the hos

Re: Code related to spilling data to disk

2016-06-22 Thread Aljoscha Krettek
Hi, Chiwan is correct. The reason why we're (not yet) using managed memory in the streaming API (DataStream) is that it was easier to get things up and running by just using JVM heap. We're hoping to change this at some point in the future, though. Cheers, Aljoscha On Wed, 22 Jun 2016 at 14:05 Ch

Re: Kinesis connector classpath issue when running Flink 1.1-SNAPSHOT on YARN

2016-06-23 Thread Aljoscha Krettek
structor, since it seems to get > initialised later anyway) > > Josh > > > > On Fri, Jun 17, 2016 at 2:53 PM, Aljoscha Krettek > wrote: > >> Hi, >> I think the problem with the missing Class >> com.amazon.ws.emr.hadoop.fs.EmrFileSystem is not specific to

Re: Way to hold execution of one of the map operator in Co-FlatMaps

2016-06-27 Thread Aljoscha Krettek
Hi, the two map functions are called by the same thread, so waiting in one function would block all processing. What you could do is buffer elements from one input and only process them when an element arrives on the other input. Cheers, Aljoscha On Sun, 26 Jun 2016 at 13:36 Biplob Biswas wrote:

Re: Way to hold execution of one of the map operator in Co-FlatMaps

2016-06-27 Thread Aljoscha Krettek
Maybe. But how do you mean, exactly? On Mon, 27 Jun 2016 at 11:14 Janardhan Reddy wrote: > Hi, > > Instead of buffering can we use event creation time and watermarks ? > > On Mon, Jun 27, 2016 at 2:32 PM, Aljoscha Krettek > wrote: > >> Hi, >> the two map functi

Re: Scala/ReactiveMongo: type classes, macros and java.util.Serializable

2016-06-27 Thread Aljoscha Krettek
Hi, could you maybe write TypeInformation/TypeSerializer wrappers that lazily instantiate a type class-based serializer. It might even work using a "lazy val". Something like this: class ScalaTypeSerializer[T] extends TypeSerializer[T] { lazy val serializer = "create the scala serializer" ...

Re: Way to hold execution of one of the map operator in Co-FlatMaps

2016-06-28 Thread Aljoscha Krettek
Hi, I might lead to flooding, yes. But I'm afraid it's the only way to go right now. Cheers, Aljoscha On Mon, 27 Jun 2016 at 17:57 Biplob Biswas wrote: > Hi, > > I was afraid of buffering because I am not sure when the second map > function > would get data, wouldn't the first map be flooded wi

Re: maximum size of window

2016-06-28 Thread Aljoscha Krettek
Hi, one thing to add: if you use a ReduceFunction or a FoldFunction for your window the state will not grow with bigger window sizes or larger numbers of elements because the result is eagerly computed. In that case, state size is only dependent on the number of individual keys. Cheers, Aljoscha

Re: Optimizations not performed - please confirm

2016-06-29 Thread Aljoscha Krettek
Hi, I think this document is still up-to-date since not much was done in these parts of the code for the 1.0 release and after that. Maybe Timo can give some insights into what optimizations are done in the Table API/SQL that will be be released in an updated version in 1.1. Cheers, Aljoscha +Ti

Re: maximum size of window

2016-06-29 Thread Aljoscha Krettek
> windows) ? > > Thanks and Regards, > Vishnu Viswanath, > > On Tue, Jun 28, 2016 at 5:04 AM, Aljoscha Krettek > wrote: > >> Hi, >> one thing to add: if you use a ReduceFunction or a FoldFunction for your >> window the state will not grow with bigger window

Re: Best way to read property file in flink

2016-06-29 Thread Aljoscha Krettek
Hi, could you load the properties file when starting the application and add it to the user functions so that it would be serialized along with them? This way, you wouldn't have to ship the file to each node. Cheers, Aljoscha On Wed, 29 Jun 2016 at 12:09 Janardhan Reddy wrote: > We are running

Re: How to count number of records received per second in processing time while using event time characteristic

2016-06-29 Thread Aljoscha Krettek
Hi, you can explicitly specify that you want processing-time windows like this: stream.keyBy(...).window(TumblingProcessingTimeWindows.of(Time.seconds(1))).sum(...) Also note that the timestamp you append in "writeAsCsv("records-per-second-" + System.currentTimeMillis())" will only take the times

Re: Question regarding logging capabilities in flink

2016-06-30 Thread Aljoscha Krettek
Hi, I think there is no way to get the output from these log statements into the Yarn logs. The reason is that this code is only executed on the client and not in any Yarn context/container. This code is setting up everything for Yarn and then control is handed over so it is executed before the Job

Re: How to avoid breaking states when upgrading Flink job?

2016-06-30 Thread Aljoscha Krettek
Hi Josh, I think in your case the problem is that Scala might choose different names for synthetic/generated classes. This will trip up the code that is trying to restore from a snapshot that was done with an earlier version of the code where classes where named differently. I'm afraid I don't kno

Re: Checkpointing very large state in RocksDB?

2016-06-30 Thread Aljoscha Krettek
Hi, are you taking about *enableFullyAsyncSnapshots()* in the RocksDB backend. If not, there is this switch that is described in the JavaDoc: /** * Enables fully asynchronous snapshotting of the partitioned state held in RocksDB. * * By default, this is disabled. This means that RocksDB state is c

Re: How to avoid breaking states when upgrading Flink job?

2016-06-30 Thread Aljoscha Krettek
hen the class loader should be able to > deserialize your serialized data. > > Cheers, > Till > > On Thu, Jun 30, 2016 at 1:55 PM, Aljoscha Krettek > wrote: > >> Hi Josh, >> I think in your case the problem is that Scala might choose different >> names for s

Re: Flink streaming connect and split streams

2016-07-01 Thread Aljoscha Krettek
Hi, I'm afraid the only way to do it right now is using the wrapper that can contain both, as you suggested. Cheers, Aljoscha On Thu, 30 Jun 2016 at 16:50 Martin Neumann wrote: > Hej, > > I'm currently playing around with some machine learning algorithms in > Flink streaming. > > I have an inpu

Re: How to avoid breaking states when upgrading Flink job?

2016-07-01 Thread Aljoscha Krettek
ger.Task.run(Task.java:546) > > > On Fri, Jul 1, 2016 at 10:21 AM, Josh wrote: > >> Thanks guys, that's very helpful info! >> >> @Aljoscha I thought I saw this exception on a job that was using the >> RocksDB state backend, but I'm not sure. I will

Re: Different results on local and on cluster

2016-07-01 Thread Aljoscha Krettek
Hi, do you have any data in the coGroup/groupBy operators that you use, besides the input data? Cheers, Aljoscha On Fri, 1 Jul 2016 at 14:17 Flavio Pompermaier wrote: > Hi to all, > I have a Flink job that computes data correctly when launched locally from > my IDE while it doesn't when launche

Re: Tumbling time window cannot group events properly

2016-07-04 Thread Aljoscha Krettek
Hi, I think it should be as simple as setting event time as the stream time characteristic: env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime) The problem is that .timeWindow(Time.seconds(10)) will use processing time if you don't specify a time characteristic. You can enforce using an

Re: Tumbling time window cannot group events properly

2016-07-04 Thread Aljoscha Krettek
0:50 CST > 100 events in this window > Mon, 04 Jul 2016 19:10:50 CST > Mon, 04 Jul 2016 19:10:50 CST > Mon, 04 Jul 2016 19:10:50 CST > Mon, 04 Jul 2016 19:10:50 CST > Mon, 04 Jul 2016 19:10:51 CST > Mon, 04 Jul 2016 19:10:51 CST > > > On 4 July 2016 at 16:15, Aljoscha Kre

Re: Tumbling time window cannot group events properly

2016-07-05 Thread Aljoscha Krettek
indow" indicating the end of the window. > > The 10s windows should be [19:10:40, 19:10:49], [19:10:50, 19:10:59], ..., > but in the example above, the events at 19:10:50, which belong to > [19:10:50, 19:10:59] window were mistakenly put in the [19:10:40, 19:10:49] > one. > > On

Re: Checkpointing very large state in RocksDB?

2016-07-06 Thread Aljoscha Krettek
s really matter since it is async > anyways? > > Thanks and Regards, > Vishnu Viswanath, > > On Thu, Jun 30, 2016 at 8:07 AM, Aljoscha Krettek > wrote: > >> Hi, >> are you taking about *enableFullyAsyncSnapshots()* in the RocksDB >> backend. If not, there is

Re: [DISCUSS] Allowed Lateness in Flink

2016-07-06 Thread Aljoscha Krettek
don't give enough space for expressing more complex opinions. Cheers, Aljoscha On Mon, 30 May 2016 at 11:23 Aljoscha Krettek wrote: > Thanks for the feedback! :-) I already read the comments on the file. > > On Mon, 30 May 2016 at 11:10 Gyula Fóra wrote: > >> Tha

Re: Adding and removing operations after execute

2016-07-08 Thread Aljoscha Krettek
This Blog post goes into the direction of what Jamie suggested: https://techblog.king.com/rbea-scalable-real-time-analytics-king/ The folks at King developed a system where users can dynamically inject scripts written in Groovy into a running general-purpose Flink job. On Thu, 7 Jul 2016 at 20:34

Re: custom scheduler in Flink?

2016-07-13 Thread Aljoscha Krettek
Hi, I'm afraid there is no documentation about schedulers, especially at this low level. Maybe this new design proposal would of interest for you, though: https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures In there is a link to the mailing list dis

Re: HDFS to Kafka

2016-07-13 Thread Aljoscha Krettek
Hi, this does not work right now because FileInputFormat does not allow setting the "enumerateNestedFiles" field directly and the Configuration is completely ignored in Flink streaming jobs. Cheers, Aljoscha On Wed, 13 Jul 2016 at 11:06 Robert Metzger wrote: > Hi Dominique, > > In Flink 1.1 we'

Re: In AbstractRocksDBState, why write a byte 42 between key and namespace?

2016-07-15 Thread Aljoscha Krettek
I left that in on purpose to protect against cases where the combination of key and namespace can be ambiguous. For example, these two combinations of key and namespace have the same written representation: key [0 1 2] namespace [3 4 5] (values in brackets are byte arrays) key [0 1] namespace [2 3

Re: Unable to get the value of datatype in datastream

2016-07-19 Thread Aljoscha Krettek
Hi, you have to ensure to filter the data that you send back on the feedback edge, i.e. the loop.closeWith(newCentroids.broadcast()); statement needs to take a stream that only has the centroids that you want to send back. And you need to make sure to emit centroids with a good timestamp if you wan

Re: Aggregate events in time window

2016-07-20 Thread Aljoscha Krettek
Which is of course only available in 1.1-SNAPSHOT or the upcoming 1.1 release. :-) On Tue, 19 Jul 2016 at 22:32 Till Rohrmann wrote: > Hi Dominique, > > your problem sounds like a good use case for session windows [1, 2]. If > you know that there is only a maximum gap between your request and re

Re: Flink Dashboard stopped showing list of uploaded jars

2016-07-20 Thread Aljoscha Krettek
Hi, in the JobManager log there should be a line like this: 2016-07-20 17:19:00,552 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Using directory /some/dir for web frontend JAR file uploads if you manually delete the offending jar file from that directory it could solve your problem

Re: env.readFile with enumeratenestedFields

2016-07-20 Thread Aljoscha Krettek
Hi, the configuration has to be passed using env.readFile(...).withParameters(ifConf). The InputFormat will then be properly configured at runtime. However, Kostas just enhanced the FileInputFormats to allow setting the parameters directly on the input format. In 1.1-SNAPSHOT and the upcoming 1.1

Re: Processing windows in event time order

2016-07-21 Thread Aljoscha Krettek
Hi David, windows are being processed in order of their end timestamp. So if you specify an allowed lateness of zero (which will only be possible on Flink 1.1 or by using a custom trigger) you should be able to sort the elements. The ordering is only valid within one key, though, since windows for

Re: Processing windows in event time order

2016-07-21 Thread Aljoscha Krettek
ag > the watermarks by 20 seconds, then only one instance of the Window (1-5) > fires with elements A,B,C,D. > > Sameer > > On Thu, Jul 21, 2016 at 5:17 AM, Aljoscha Krettek > wrote: > >> Hi David, >> windows are being processed in order of their end timestamp.

Re: Processing windows in event time order

2016-07-22 Thread Aljoscha Krettek
t; other sources (except one) would keep sending data and their watermarks. > Isn't this a risk for a possible Out of Memory Error. Should one always use > a RocksDB alternative to mitigate such risks. > > Sameer > > > > On Thu, Jul 21, 2016 at 7:52 AM, Aljoscha Krettek

Re: add FLINK_LIB_DIR to classpath on yarn -> add properties file to class path on yarn

2016-07-25 Thread Aljoscha Krettek
Looping in Max directly because he probably knows the Yarn stuff best. @Max: Do you have any idea how to do this? On Fri, 22 Jul 2016 at 05:46 김동일 wrote: > I’saw the source code > of > flink-yarn/src/main/java/org/apache/flink/yarn/AbstractYarnClusterDescriptor.java > Flink ships the FLINK_LIB

Re: tumbling time window, date boundary and timezone

2016-07-29 Thread Aljoscha Krettek
Hi, yes, I'm afraid you would have to use a custom version of the TumblingProcessingTimeWindows right now. I've opened a Jira issue for adding an offset setting to the built-in window assigners: https://issues.apache.org/jira/browse/FLINK-4282 Cheers, Aljoscha On Tue, 26 Jul 2016 at 12:51 Hirono

Re: stop then start the job, how to load latest checkpoints automatically?

2016-07-29 Thread Aljoscha Krettek
Hi, you can perform a savepoint without changing your jar, yes. Automatically taking the latest checkpoint as a savepoint is not possible right now. We are working on adding support for that, however. Cheers, Aljoscha On Tue, 26 Jul 2016 at 20:28 Shaosu Liu wrote: > I want to load previous sta

Re: No output when using event time with multiple Kafka partitions

2016-07-29 Thread Aljoscha Krettek
Hi, when running in local mode the default parallelism is always the number of (possibly virtual) CPU cores. The parallelism of the sink is set before it gets a chance to find out how many Kafka partitions there are. I think the reason for the behavior you're observing is that only one of your two

Re: Reprocessing data in Flink / rebuilding Flink state

2016-07-29 Thread Aljoscha Krettek
Hi, I think the exact thing you're trying do do is not possible right now but I know of a workaround that some people have used. For "warming up" the state from the historical data, you would run your regular Flink job but replace the normal Kafka source by a source that reads from the historical

Re: [VOTE] Release Apache Flink 1.1.0 (RC1)

2016-07-29 Thread Aljoscha Krettek
When running "mvn clean verify" with Hadoop version 2.6.1 the Zookeeper/Leader Election tests fail with this: java.lang.NoSuchMethodError: org.apache.curator.utils.PathUtils.validatePath(Ljava/lang/String;)Ljava/lang/String; at org.apache.curator.framework.imps.NamespaceImpl.(NamespaceImpl.java:37

Re: Reprocessing data in Flink / rebuilding Flink state

2016-07-29 Thread Aljoscha Krettek
Hi, I have to try this to verify but I think the approach works if you give the two sources different UIDs. The reason is that Flink will ignore state for which it doesn't have an operator to assign it to. Therefore, the state of the "historical Kafka source" should be silently discarded. Cheers,

Re: TimeWindowAll doeesn't assign properly

2016-07-29 Thread Aljoscha Krettek
Hi, the single-element-windows to me indicate that these originate from elements that arrived at the window operator after the watermark. In the current version of Flink these elements will be emitted as a single-element window. You can avoid this by writing a custom EventTimeTrigger that does not

Re: Ability to partition logs per pipeline

2016-07-31 Thread Aljoscha Krettek
Hi, I'm afraid that's not possible right now. The preferred way of running would be to have a Yarn cluster per job, that way you can isolate the logs. Cheers, Aljoscha On Thu, 14 Jul 2016 at 09:49 Chawla,Sumit wrote: > Hi Robert > > I actually mean both. Scenarios where multiple jobs are runni

Re: TimeWindowAll doeesn't assign properly

2016-08-01 Thread Aljoscha Krettek
Hi, yes, if you set the delay to high you will have to wait a long time until your windows are emitted. Cheers, Aljoscha On Mon, 1 Aug 2016 at 04:52 Sendoh wrote: > Probably `processAt` is not used adequately because after increasing > maxDelay > in watermark to 10 minutes it works as expected.

Re: Reprocessing data in Flink / rebuilding Flink state

2016-08-01 Thread Aljoscha Krettek
recover from removed > tasks/operators without needing to add dummy operators like this. > > Josh > > On Fri, Jul 29, 2016 at 5:46 PM, Aljoscha Krettek > wrote: > >> Hi, >> I have to try this to verify but I think the approach works if you give >> the two so

Re: CEP and Within Clause

2016-08-01 Thread Aljoscha Krettek
+Till, looping him in directly, he probably missed this because he was away for a while. On Tue, 26 Jul 2016 at 18:21 Sameer W wrote: > Hi, > > It looks like the WithIn clause of CEP uses Tumbling Windows. I could get > it to use Sliding windows by using an upstream pipeline which uses Sliding

Re: Parallel execution on AllWindows

2016-08-03 Thread Aljoscha Krettek
Hi, if you manually force a parallelism different from 1 after a *windowAll() then you will get parallel execution of your window. For example, if you do this: input.countWindowAll(100).setParallelism(5) then you will get five parallel instances of the window operator that each wait for 100 eleme

Re: Parallel execution on AllWindows

2016-08-03 Thread Aljoscha Krettek
indow? > > Thanks > > Andrew > From mobile > > From: Aljoscha Krettek > Sent: Wednesday, August 3, 17:11 > Subject: Re: Parallel execution on AllWindows > To: user@flink.apache.org > > Hi, > > if you manually force a parallelism different from 1 after a *windowAl

Re: Generate timestamps in front of event for event time windows

2016-08-03 Thread Aljoscha Krettek
Hi, a watermark cannot be sent before the element that makes you send that watermark. A watermark of time T tells the system that no element will arrive in the future with timestamp T or less, thus you cannot send it before. It seems that what you are trying to achieve can be solved by using sessio

Re: Flink timestamps

2016-08-08 Thread Aljoscha Krettek
Hi Davood, right now, you can only inspect the timestamps by writing a custom operator that you would use with DataStream.transform(). Measuring latency this way has some pitfalls, though. The timestamp might be assigned on a different machine than the machine that will process the tuple at the sin

Re: Output of KeyBy->TimeWindow->Reduce?

2016-08-08 Thread Aljoscha Krettek
Hi, what does the reduce function do exactly? Something like this? (a: String, b: String) -> b.toUppercase If yes, then I would expect a) to be the output you get. if it is this: (a: String, b: String) -> a + b.toUppercase then I would expect this: a,b,cC,d,eE,f,gG,h Cheers, Aljoscha On Sun,

Re: Flink 1.1 event-time windowing changes from 1.0.3

2016-08-08 Thread Aljoscha Krettek
Hi Adam, sorry for the inconvenience. This is caused by a new file read operator, specifically how it treats watermarks/timestamps. I opened an issue here that describes the situation: https://issues.apache.org/jira/browse/FLINK-4329. I think this should be fixed for an upcoming 1.1.1 bug fixing r

Re: Release notes 1.1.0?

2016-08-09 Thread Aljoscha Krettek
Hi, could you maybe post how exactly you specify the window? Also, did you set a "stream time characteristic", for example EventTime? That could help us pinpoint the problem. Cheers, Aljoscha On Tue, 9 Aug 2016 at 12:42 Andrew Ge Wu wrote: > I rolled back to 1.0.3 > If I understand this correc

Re: Release notes 1.1.0?

2016-08-09 Thread Aljoscha Krettek
g something out of ordinary here. > > > > Thanks! > > > Andrew > > On 09 Aug 2016, at 14:18, Aljoscha Krettek wrote: > > Hi, > could you maybe post how exactly you specify the window? Also, did you set > a "stream time characteristic", for example Even

Re: Window function - iterator data

2016-08-10 Thread Aljoscha Krettek
Hi, Kostas is right in that the elements are never explicitly sorted by timestamp. In some cases they might not even be iterated in the order that they were added so I would normally assume the order of the elements to be completely arbitrary. Cheers, Aljoscha On Wed, 10 Aug 2016 at 09:44 Kostas

Re: Release notes 1.1.0?

2016-08-10 Thread Aljoscha Krettek
that part, no, Im not explicitly set that. > > > On 09 Aug 2016, at 15:29, Aljoscha Krettek wrote: > > Hi, > are you setting a StreamTimeCharacteristic, i.e. > env.setStreamTimeCharacteristic? > > Cheers, > Aljoscha > > On Tue, 9 Aug 2016 at 14:52 Andrew Ge W

Re: Connected Streams - Controlling Order of arrival on the two streams

2016-08-10 Thread Aljoscha Krettek
Hi, I'm afraid you guessed correctly that it is not possible to ensure that rules arrive before events. I think the way you solved it (with buffering) is the correct way to go about this. Cheers, Aljoscha On Wed, 10 Aug 2016 at 01:31 Sameer W wrote: > Hi, > > I am using connected streams to sen

Re: Release notes 1.1.0?

2016-08-10 Thread Aljoscha Krettek
Oh, are you by any chance specifying a custom state backend for your job? For example, RocksDBStateBackend. Cheers, Aljoscha On Wed, 10 Aug 2016 at 11:17 Aljoscha Krettek wrote: > Hi, > could you maybe send us the output of "env.getExecutionPlan()". This would > help us bett

Re: Firing windows multiple times

2016-08-10 Thread Aljoscha Krettek
Hi, from your mail I'm gathering that you are in fact using an Evictor, is that correct? If not, then the window operator should not keep all the elements ever received for a window but only the aggregated result. Side note, there seems to be a bug in EvictingWindowOperator that causes evicted ele

Re: Flink : CEP processing

2016-08-11 Thread Aljoscha Krettek
Hi, Sameet is right about the snapshotting. The CEP operator behaves more or less like a FlatMap operator that keeps some more complex state internally. Snapshotting works the same as with any other operator. Cheers, Aljoscha On Thu, 11 Aug 2016 at 00:54 Sameer W wrote: > Mans, > > I think at t

Re: Firing windows multiple times

2016-08-11 Thread Aljoscha Krettek
ermark, trigger's event timer is reached, >fires and purges and emits current state as event z(time=1, count=2) >9. Window B receives event, trigger waits for processing time delay, >then executes fold() and emits event(time=1 => count=2), but internal >Window stat

Re: Does Flink DataStreams using combiners?

2016-08-12 Thread Aljoscha Krettek
Hi, Sameer is right that Flink currently does not combine for any combination of assigner, trigger and window function. Technically, it would be possible to use a combiner for Triggers that don't observe individual elements but only fire on time. With triggers that observe elements, such as CountT

Re: Firing windows multiple times

2016-08-12 Thread Aljoscha Krettek
Hi, there is already this FLIP: https://cwiki.apache.org/confluence/display/FLINK/FLIP-4+%3A+Enhance+Window+Evictor which also links to a mailing list discussion. And this FLIP: https://cwiki.apache.org/confluence/display/FLINK/FLIP-2+Extending+Window+Function+Metadata. The former proposes to enhan

Re: flink shaded jar in yarn

2016-08-12 Thread Aljoscha Krettek
Thanks for letting us know! On Thu, 11 Aug 2016 at 07:33 Janardhan Reddy wrote: > sorry my bad, i was using some other version. > > > > On Thu, Aug 11, 2016 at 4:47 AM, Janardhan Reddy < > janardhan.re...@olacabs.com> wrote: > >> Hi, >> >> the flink-dist_2.11-1.0.0.jar jar present in lib folder

Re: Updating stored window data

2016-08-16 Thread Aljoscha Krettek
Hi, the input elements to a window function should not be modified. Could you maybe achieve something using a Fold? Maybe if you went into a bit more details we could figure something out together. Cheers, Aljoscha On Tue, 16 Aug 2016 at 10:38 Ufuk Celebi wrote: > Hey Paul! I think the window c

Re: flink-shaded-hadoop

2016-08-23 Thread Aljoscha Krettek
Hi, this might be due to a bug in the Flink 1.1.0 maven dependencies. Can you try updating to Flink 1.1.1? Cheers, Aljoscha On Mon, 22 Aug 2016 at 07:48 wrote: > Hi, > every one , when i use scala version 2.10,and set the sbt project(add > those:flink-core,flink-scala,flink-streaming-scala,

Re: flink - Working with State example

2016-08-24 Thread Aljoscha Krettek
Hi, you mean the directory is completely empty? Can you check in the JobManager dashboard whether it reports any successful checkpoints for the job? One possible explanation is an optimization that the FsStateBackend performs: when the state is very small it will not actually be written to files bu

Re: Firing windows multiple times

2016-08-29 Thread Aljoscha Krettek
ark to the window > function metadata in FLIP-2? > > From: Shannon Carey > Date: Friday, August 12, 2016 at 6:24 PM > To: Aljoscha Krettek , "user@flink.apache.org" < > user@flink.apache.org> > > Subject: Re: Firing windows multiple times > > Thanks Al

Re: Accessing state in connected streams

2016-08-30 Thread Aljoscha Krettek
Hi Aris, I think you're on the right track with using a CoFlatMap for this. Could you maybe post the code of your CoFlatMapFunction (or you could send it to me privately if you have concerns with publicly posting it) then I could have a look. Cheers, Aljoscha On Mon, 29 Aug 2016 at 15:48 aris kol

Re: Firing windows multiple times

2016-08-30 Thread Aljoscha Krettek
rent watermark) on an ongoing basis. > The windowing function would be responsible for evicting old data based on > the current watermark. > > Does that make sense? Does it seem logical, or am I misunderstanding > something about how Flink works? > > -Shannon > > > Fr

Re: Accessing state in connected streams

2016-08-30 Thread Aljoscha Krettek
>} > >flatMap(in, out) > > } > } > } > > > applyWithState throws the exception and my intuition says I am doing > seriously wrong in the instantiation. I tried to make something work using > the mapWithState implementation as a guide and I ended up here. > &g

Re: Gracefully Stopping Streaming Job Programmatically in LocalStreamEnvironment

2016-08-31 Thread Aljoscha Krettek
Hi Konstantin, I think this is not possible with the current API but I've been thinking about similar stuff this week. Let me quickly outline what I was thinking and then you can tell me whether that would also be helpful for you. The basic problem is this: I want to be able to write ITCases that

Re: Setting EventTime window width using stream data

2016-08-31 Thread Aljoscha Krettek
Just checking, all the elements that would fall into a window of length X also have X as a property? In that case you should be able to do something like this: public Collection assignWindows(PojoType element, long timestamp, WindowAssignerContext context) { long size = element.windowSize;

Re: Windows and Watermarks Clarification

2016-09-01 Thread Aljoscha Krettek
Just one clarification: even with a specified allowed lateness the window will still be evaluated once the watermark passes the end of the window. It's just that with allowed lateness the window contents and state will be kept around a bit longer to allow eventual late elements to update the result

Re: Firing windows multiple times

2016-09-02 Thread Aljoscha Krettek
ll > be on average 0.5 months stale. A year-long window is even worse. > > -Shannon > > From: Aljoscha Krettek > Date: Tuesday, August 30, 2016 at 9:08 AM > To: Shannon Carey , "user@flink.apache.org" < > user@flink.apache.org> > > Subject: Re: Firing

Re: emit a single Map per window

2016-09-02 Thread Aljoscha Krettek
Hi, from this I would expect to get as many HashMaps as you have keys. The winFunction is also executed per-key so it cannot combine the HashMaps of all keys. Does this describe the behavior that you're seeing? Cheers, Aljoscha On Fri, 2 Sep 2016 at 17:37 Luis Mariano Guerra wrote: > hi! > > I

Re: flink dataStream operate dataSet

2016-09-05 Thread Aljoscha Krettek
Hi, right now it is not possible to mix the DataSet and the DataStream API. The reason for the "task not serializable" error is that putting the DataSet into the map function tries to serialize the DataSet, which is not possible. Cheers, Aljoscha On Tue, 30 Aug 2016 at 16:31 wrote: > Hi, >

Re: Apache Flink: How does it handle the backpressure?

2016-09-05 Thread Aljoscha Krettek
That's true. The reason why it works in Flink is that a slow downstream operator will back pressure an upstream operator which will then slow down. The technical implementation of this relies on the fact that Flink uses a bounded pool of network buffers. A sending operator writes data to network bu

Re: checkpoints not removed on hdfs.

2016-09-05 Thread Aljoscha Krettek
Hi, which version of Flink are you using? Are the checkpoints being reported as successful in the Web Frontend, i.e. in the "checkpoints" tab of the running job? Cheers, Aljoscha On Fri, 2 Sep 2016 at 12:17 Dong-iL, Kim wrote: > Hi, > > I’m using HDFS as state backend. > The checkpoints folder

Re: Remote upload and execute

2016-09-05 Thread Aljoscha Krettek
+Max Michels Directly looping in Max. You recently worked on the clients code, do you have any Idea if and how this is possible? On Fri, 2 Sep 2016 at 14:38 Paul Wilson wrote: > Hi, > > I'd like to write a client that can execute an already 'uploaded' JAR > (i.e. the JAR is deployed and availab

Re: emit a single Map per window

2016-09-05 Thread Aljoscha Krettek
, Aljoscha On Fri, 2 Sep 2016 at 18:26 Luis Mariano Guerra wrote: > On Fri, Sep 2, 2016 at 5:24 PM, Aljoscha Krettek > wrote: > >> Hi, >> from this I would expect to get as many HashMaps as you have keys. The >> winFunction is also executed per-key so it cannot combine

Re: Firing windows multiple times

2016-09-05 Thread Aljoscha Krettek
h can have an impact on the processing guarantees > when a failure/recovery occurs if we don't do it carefully. Also, we're not > particularly sophisticated yet with regard to avoiding unnecessary queries > to the time series data. > > -Shannon > > > From: Aljoscha K

Re: emit a single Map per window

2016-09-06 Thread Aljoscha Krettek
ljoscha On Mon, 5 Sep 2016 at 17:35 Luis Mariano Guerra wrote: > On Mon, Sep 5, 2016 at 12:30 PM, Aljoscha Krettek > wrote: > >> Hi, >> > for this you would have to use a non-parallel window, i.e. something like >> stream.windowAll().apply(...). This does not

Re: Firing windows multiple times

2016-09-09 Thread Aljoscha Krettek
Hi, I'd be very happy to give you pointers for FLIP-2 and FLIP-4. Why don't you start a separate thread on the dev list so that we don't hijack this thread. For FLIP-4 we also have to coordinate with Vishnu, he was driving FLIP-4 but lately everyone has been a bit inactive on that. Let's see if he

Re: Sharing Java Collections within Flink Cluster

2016-09-12 Thread Aljoscha Krettek
Hi, you don't need the BlockedEventState class, you should be able to just do this: private transient ValueState blockedRoads; @Override public void open(final org.apache.flink.configuration.Configuration parameters) throws Exception { final ValueStateDescri

Re: Firing windows multiple times

2016-09-14 Thread Aljoscha Krettek
Hi, yes AJ that observation is correct. Let's see what Shannon has to say about this but it might be that all "higher-level" aggregates will have to be based on the first level and can then update at the speed of that aggregate. Cheers, Aljoscha On Mon, 12 Sep 2016 at 05:03 aj.h wrote: > In the

Re: Fw: Flink Cluster Load Distribution Question

2016-09-14 Thread Aljoscha Krettek
Hi, this is a different job from the Kafka Job that you have running, right? Could you maybe post the code for that as well? Cheers, Aljoscha On Tue, 13 Sep 2016 at 20:14 amir bahmanyari wrote: > Hi Robert, > Sure, I am forwarding it to user. Sorry about that. I followed the > "robot's" instru

Re: Why tuples are not ignored after watermark?

2016-09-14 Thread Aljoscha Krettek
Hi, the problem might be that your timestamp/watermark assigner is run in parallel and that only one parallel instance of those operators emits the watermark because only one of those parallel instances sees the element with _3 == 9000. For the watermark to advance at an operator it needs to advanc

Re: Tumbling window rich functionality

2016-09-14 Thread Aljoscha Krettek
Hi, WindowFunction.apply() will be called once for each window so you should be able to do the setup/teardown in there. open() and close() are called at the start of processing, end of processing, respectively. Cheers, Aljoscha On Wed, 14 Sep 2016 at 09:04 Swapnil Chougule wrote: > Hi Team, > >

Re: Fw: Flink Cluster Load Distribution Question

2016-09-17 Thread Aljoscha Krettek
hile #slots=64 is the same. > > Its still slow for a relatively large file though. > Pls advice if something I can try to improve the cluster performance. > Thanks+regards > > -- > *From:* Aljoscha Krettek > *To:* user@flink.apache.org; amir bahm

Re: Simple batch job hangs if run twice

2016-09-17 Thread Aljoscha Krettek
Hi, when is the "first time". It seems you have tried this repeatedly so what differentiates a "first time" from the other times? Are you closing your IDE in-between or do you mean running the job a second time within the same program? Cheers, Aljoscha On Fri, 9 Sep 2016 at 16:40 Yassine MARZOUGU

<    1   2   3   4   5   6   7   8   9   10   >