Custom state values in CEP

2016-04-21 Thread Sowmya Vallabhajosyula
Is there a way to store a custom state value along with triggering an event? This would not only help in maintaining an audit of the values that triggered the state, but also in evaluating more complex sets of conditions - e.g., if the earlier state was triggered by the temperature of a patient, I do not w

Replaying messages in Kafka topics with FlinkKafkaConsumer09

2016-04-21 Thread Jack Huang
Hi all, I am trying to force my job to reprocess old messages in my Kafka topics but couldn't get it to work. Here is my FlinkKafkaConsumer09 setup: val kafkaProp = new Properties() kafkaProp.setProperty("bootstrap.servers", "localhost:6667") kafkaProp.setProperty("auto.offset.reset", "earliest")
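A detail worth knowing here (a sketch, not a confirmed fix for this thread): with the Kafka 0.9 consumer, `auto.offset.reset=earliest` only takes effect when the consumer group has no committed offsets, so switching to a fresh `group.id` is a common way to force a full replay. The broker address matches the original post; the group-id scheme is a made-up example.

```java
import java.util.Properties;

// Consumer properties for replaying a topic from the beginning.
// "auto.offset.reset=earliest" only applies when the group has no committed
// offsets, so a fresh group.id forces the consumer to start from the earliest
// available offset.
public class KafkaReplayProps {
    public static Properties props(String groupId) {
        Properties kafkaProp = new Properties();
        kafkaProp.setProperty("bootstrap.servers", "localhost:6667");
        kafkaProp.setProperty("group.id", groupId);
        kafkaProp.setProperty("auto.offset.reset", "earliest");
        return kafkaProp;
    }
}
```

These properties would then be handed to the FlinkKafkaConsumer09 constructor as in the original post.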

Re: Checkpoint and restore states

2016-04-21 Thread Jack Huang
@Stefano, Aljoscha: Thank you for pointing that out. With the following steps I verified that the state of the job gets restored 1. Use HDFS as state backend with env.setStateBackend(new FsStateBackend("hdfs:///home/user/flink/KafkaWordCount")) 2. Start the job. In my case the job ID is

Re: Programmatic way to get version

2016-04-21 Thread Trevor Grant
Dug through the codebase; in case anyone else wants to know: import org.apache.flink.runtime.util.EnvironmentInformation; EnvironmentInformation.getVersion() Trevor Grant Data Scientist https://github.com/rawkintrevo http://stackexchange.com/users/3002022/rawkintrevo http://trevorgrant.org *"Fo

Programmatic way to get version

2016-04-21 Thread Trevor Grant
Is there a programmatic way to get the Flink version from the Scala shell? I.e., something akin to sc.version. I thought I used env.version or something like that once but I couldn't find anything in the Scala docs. Trevor Grant Data Scientist https://github.com/rawkintrevo http://stackexchange.com

How to fetch Kafka messages that have a [KEY,VALUE] pair

2016-04-21 Thread prateek arora
Hi, I am new to Apache Flink and started using Flink version 1.0.1. In my scenario, Kafka messages have a key-value pair of [String, Array[Byte]]. I tried to use FlinkKafkaConsumer08 to fetch data but I don't know how to write a DeserializationSchema for that. val stream : DataStream[(String,Array[Byte
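The heart of what this asks for is decoding the raw key/value byte arrays into a (String, byte[]) pair; in Flink that logic would live inside the deserialize() method of a KeyedDeserializationSchema, which (unlike the plain DeserializationSchema) also receives the message key. A minimal sketch of the decoding core, as plain Java:

```java
import java.nio.charset.StandardCharsets;

// Decode a raw Kafka message key into the String half of the pair. The null
// check matters because Kafka message keys may be absent; an empty string is
// used as an illustrative fallback.
public class KeyValueDecode {
    public static String decodeKey(byte[] key) {
        return key == null ? "" : new String(key, StandardCharsets.UTF_8);
    }
}
```

The value side (Array[Byte]) can be passed through unchanged.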

Re: implementing a continuous time window

2016-04-21 Thread Jonathan Yom-Tov
Thanks. Any pointers on how to do that? Or code examples which do similar things? On Thu, Apr 21, 2016 at 10:30 PM, Fabian Hueske wrote: > Yes, sliding windows are different. > You want to evaluate the window whenever a new element arrives or an > element leaves because 5 secs passed since it en

Flink YARN job manager web port

2016-04-21 Thread Shannon Carey
The documentation states: "The ports Flink is using for its services are the standard ports configured by the user + the application id as an offset" When I launch Flink via YARN in an AWS EMR cluster, stdout says: JobManager Web Interface: http://ip-xxx.us-west-2.compute.internal:20888/proxy/ap

Re: Control triggering on empty window

2016-04-21 Thread Maxim
I think the best way to support such a feature is to extend WindowAssigner with ability to be called on timer and checkpoint its state the same way it is done by the Trigger. Such WindowAssigner would be able to create Windows based on time even if no event is received. On Thu, Apr 21, 2016 at 1:5

Access to a shared resource within a mapper

2016-04-21 Thread Timur Fayruzov
Hello, I'm writing a Scala Flink application. I have a standalone process that exists on every Flink node that I need to call to transform my data. To access this process I need to initialize a non-thread-safe client first. I would like to avoid initializing a client for each element being transform
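A common pattern for this (a sketch under assumptions, not a definitive answer to the thread: `MyClient` is a hypothetical stand-in for the poster's client) is to create the client in a RichMapFunction's open() method. open()/close() run once per parallel subtask, so each task instance gets its own client and nothing non-thread-safe is shared across threads:

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

// One client per parallel instance: created in open(), released in close().
// MyClient (and its transform/shutdown methods) is illustrative only.
public class TransformMap extends RichMapFunction<String, String> {
    private transient MyClient client;

    @Override
    public void open(Configuration parameters) throws Exception {
        client = new MyClient();   // runs once per parallel subtask
    }

    @Override
    public String map(String value) throws Exception {
        return client.transform(value);
    }

    @Override
    public void close() throws Exception {
        if (client != null) client.shutdown();
    }
}
```

The `transient` marker keeps the client out of the serialized function object that Flink ships to the cluster.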

Re: implementing a continuous time window

2016-04-21 Thread Fabian Hueske
Yes, sliding windows are different. You want to evaluate the window whenever a new element arrives or an element leaves because 5 secs passed since it entered the window, right? I think that should be possible with a GlobalWindow, a custom Trigger which holds state about the time when each element

Re: implementing a continuous time window

2016-04-21 Thread Jonathan Yom-Tov
I think sliding windows are different. In the example in the blog post a window is computed every 30 seconds (so at fixed time intervals). What I want is for a window to be computed every time an event comes in and then once again when the event leaves the window. On Thu, Apr 21, 2016 at 10:14 PM,

Re: implementing a continuous time window

2016-04-21 Thread John Sherwood
You are looking for sliding windows: https://flink.apache.org/news/2015/12/04/Introducing-windows.html Here you would do .timeWindow(Time.seconds(5), Time.seconds(1)) On Thu, Apr 21, 2016 at 12:06 PM, Jonathan Yom-Tov wrote: > hi, > > Is it possible to implement a continuous time window with f

implementing a continuous time window

2016-04-21 Thread Jonathan Yom-Tov
hi, Is it possible to implement a continuous time window with flink? Here's an example. Say I want to count events within a window. The window length is 5 seconds and I get events at t = 1, 2, 7, 8 seconds. I would then expect to get events with a count at t = 1 (count = 1), t = 2 (count = 2), t =
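A plain-Java sanity check of the intended semantics (not Flink code): with a 5-second window, an event arriving at time t is "inside" during [t, t + 5), so the count at any evaluation time x is the number of events with t <= x < t + 5, and evaluation happens at each arrival t and each departure t + 5.

```java
import java.util.Arrays;

// Count of events inside a continuous 5-second window at evaluation time x,
// assuming an event at time t occupies the interval [t, t + 5).
public class ContinuousWindowCount {
    static final long LEN = 5;

    public static long countAt(long[] events, long x) {
        return Arrays.stream(events).filter(t -> t <= x && x < t + LEN).count();
    }
}
```

For events at t = 1, 2, 7, 8 this gives a count of 1 at t = 1, 2 at t = 2, 1 at t = 6 and t = 7 (the departures of the first two events overlap the arrival of the third), 2 at t = 8, 1 at t = 12, and 0 at t = 13.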

Re: AvroWriter for Rolling sink

2016-04-21 Thread Igor Berman
OK, I have a working prototype already; if somebody is interested (attached), I might add it as a PR later (with tests etc.). Tested locally & with S3. On 21 April 2016 at 12:01, Aljoscha Krettek wrote: > Hi, > as far as I know there is no one working on this. I'm only aware of > someone working

Re: Count windows missing last elements?

2016-04-21 Thread Kostya Kulagin
Thanks for the reply. Maybe I could use some advice in this case. My situation: we have a stream of data, generally speaking tuples where long is a unique key (i.e. there are no tuples with the same key). I need to filter out all tuples that do not match a certain Lucene query. Creating a Lucene index on

Re: Count windows missing last elements?

2016-04-21 Thread Aljoscha Krettek
Hi, if you are doing the windows not for their actual semantics I would suggest not using count based windows and also not using the *All windows. The *All windows are all non-parallel, i.e. you always only get one parallel instance of your window operator even if you have a huge cluster. Also, in

Re: Values are missing, probably due parallelism?

2016-04-21 Thread Kostya Kulagin
Thanks, so you were right and it is really connected to not-finishing windows problem I've mentioned in the other post. I don't really need parallelism of 1 for windows - I expect operation on windows be pretty expensive and I like an idea that I can "parallelize" it. Thanks for the explanation!

Re: Threads waiting on LocalBufferPool

2016-04-21 Thread Aljoscha Krettek
Hi, I would be very happy about improvements to our RocksDB performance. What are the RocksDB Java benchmarks that you are running? In Flink, we also have to serialize/deserialize every time that we access RocksDB using our TypeSerializer. Maybe this is causing the slow down. By the way, what is t

Re: Count windows missing last elements?

2016-04-21 Thread Kostya Kulagin
Maybe if it is not the first time, it's worth considering adding this as an option? ;-) My use case - I have a pretty big amount of data, basically for ETL. It is finite but it is big. I see it more as a stream, not as a dataset. Also I would re-use the same code for an infinite stream later... And I

Re: Master (1.1-SNAPSHOT) Can't run on YARN

2016-04-21 Thread Maximilian Michels
Hi Stefano, Thanks for reporting. I wasn't able to reproduce the problem. I ran ./bin/yarn-session.sh -n 1 -s 2 -jm 2048 -tm 2048 on a Yarn cluster and it created a Flink cluster with a JobManager and a TaskManager with two task slots. By the way, if you omit the "-s 2" flag, then the default is r

Re: Threads waiting on LocalBufferPool

2016-04-21 Thread Maciek Próchniak
Well... I found some time to look at rocksDB performance. It takes around 0.4ms to lookup value state and 0.12ms to update - these are means, 95th percentile was > 1ms for get... When I set additional options: .setIncreaseParallelism(8) .setMaxOpenFiles(-1) .setCo

RE: lost connection

2016-04-21 Thread Radu Tudoran
Yes - it suddenly occurred on something that used to work. I am restarting the deployment to see if this solves the problem. Dr. Radu Tudoran Research Engineer - Big Data Expert IT R&D Division HUAWEI TECHNOLOGIES Duesseldorf GmbH European Research Center Ries

Re: lost connection

2016-04-21 Thread Chesnay Schepler
That is an excerpt from the client log; can you check the JobManager log? It could have crashed, and if so the cause is hopefully in there. Did this issue occur suddenly, as in, have you run a job successfully on the system before? (To exclude network configuration issues.) Regards, Chesnay On

RE: lost connection

2016-04-21 Thread Radu Tudoran
- Could not submit job Operator2 execution (170aef70d31f3fee62f8a483930be213), because there is no connection to a JobManager. 15:59:48,456 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.204.62.71:6123

Re: lost connection

2016-04-21 Thread Chesnay Schepler
Hello, the first step is always to check the logs under /log. The JobManager log in particular may contain clues as why no connection could be established. Regards, Chesnay On 21.04.2016 15:44, Radu Tudoran wrote: Hi, I am trying to submit a jar via the console (flink run my.jar). The re

Re: Explanation on limitations of the Flink Table API

2016-04-21 Thread Simone Robutti
Thanks for all your input. The design document covers the use cases we have in mind and querying external sources may be interesting to us for other uses not mentioned in the first mail. I will wait for developments in this direction, because the expected result seems promising. :) Thank you aga

lost connection

2016-04-21 Thread Radu Tudoran
Hi, I am trying to submit a jar via the console (flink run my.jar). The result is that I get an error saying that the communication with the jobmanager failed: Lost connection to the jobmanager. Can you give me some hints/recommendations about approaching this issue? Thanks Dr. Radu Tudoran R

Re: Processing millions of messages in milliseconds real time -- Architecture guide required

2016-04-21 Thread Ken Krugler
This seems pretty similar to what you’re asking about: http://data-artisans.com/extending-the-yahoo-streaming-benchmark/ Especially the part where they “...directly exposed the in-flight windows to be queried”, as that sounds like what you meant by “The data cache should have the capability to

Re: logback.xml and logback-yarn.xml rollingpolicy configuration

2016-04-21 Thread Balaji Rajagopalan
Thanks Till setting in log4j.properties worked. On Tue, Apr 19, 2016 at 8:04 PM, Till Rohrmann wrote: > Have you made sure that Flink is using logback [1]? > > [1] > https://ci.apache.org/projects/flink/flink-docs-master/apis/best_practices.html#using-logback-instead-of-log4j > > Cheers, > Till
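For reference, a size-based rolling setup in log4j.properties of the kind that worked here might look like the following sketch; the appender name, sizes, and pattern are illustrative, and `${log.file}` is the property Flink's scripts normally set:

```properties
log4j.rootLogger=INFO, file

# RollingFileAppender rotates by size; values below are illustrative.
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=${log.file}
log4j.appender.file.MaxFileSize=100MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
```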

Re: Help with generics

2016-04-21 Thread Aljoscha Krettek
Hi, I'm sorry, I meant TypeInformation.of(initModel.getClass()). On Thu, 21 Apr 2016 at 15:17 Martin Neumann wrote: > Hej, > > I already tried TypeInformation.of(initModel.class) and it complained > that initModel class is unknown. (Since it's of type M) > I added a function to the model.class t

Re: Help with generics

2016-04-21 Thread Martin Neumann
Hej, I already tried TypeInformation.of(initModel.class) and it complained that the initModel class is unknown (since it's of type M). I added a function to the model class that returns the TypeInformation; it's working now, though I still don't understand what happened behind the scenes and what I change

Re: Explanation on limitations of the Flink Table API

2016-04-21 Thread Flavio Pompermaier
We're also trying to work around the current limitations of the Table API and we're reading DataSets with on-purpose input formats that return a POJO Row containing the list of values (but we're reading all values as String...). Actually we would also need a way to abstract the composition of Flink op

Re: Count windows missing last elements?

2016-04-21 Thread Aljoscha Krettek
People have wondered about that a few times, yes. My opinion is that a stream is potentially infinite and processing only stops for anomalous reasons: when the job crashes, when stopping a job to later redeploy it. In those cases you would not want to flush out your data but keep them and restart f

Re: Explanation on limitations of the Flink Table API

2016-04-21 Thread Fabian Hueske
Hi Simone, in Flink 1.0.x, the Table API does not support reading external data, i.e., it is not possible to read a CSV file directly from the Table API. Tables can only be created from DataSet or DataStream which means that the data is already converted into "Flink types". However, the Table API

Re: Count windows missing last elements?

2016-04-21 Thread Kostya Kulagin
Thanks. I wonder whether it wouldn't be good to have such functionality built in - at least, when the incoming stream is finished, flush the remaining elements. On Thu, Apr 21, 2016 at 4:47 AM, Aljoscha Krettek wrote: > Hi, > yes, you can achieve this by writing a custom Trigger that can trigger > both on the

Re: Help with generics

2016-04-21 Thread Aljoscha Krettek
Hi, you're right there is not much (very little) in the documentation about TypeInformation. There is only the description in the JavaDoc: TypeInformation Essentially, how it

Explanation on limitations of the Flink Table API

2016-04-21 Thread Simone Robutti
Hello, I would like to know if it's possible to create a Flink Table from an arbitrary CSV (or any other form of tabular data) without doing type-safe parsing with explicitly typed classes/POJOs. To my knowledge this is not possible, but I would like to know if I'm missing something. My requiremen

Re: Help with generics

2016-04-21 Thread Martin Neumann
Hej, I pass an instance of M in the constructor of the class; can I use that instead? Maybe give the class a function that returns the right TypeInformation? I'm trying to figure out how TypeInformation works to better understand the issue. Is there any documentation about this? At the moment I don't

Re: Help with generics

2016-04-21 Thread Aljoscha Krettek
Hi, I think it doesn't work because the concrete type of M is not available to create a TypeInformation for M. What you can do is manually pass a TypeInformation or a TypeSerializer to the AnomalyFlatMap and use that when creating the state descriptor. Cheers, Aljoscha On Thu, 21 Apr 2016 at 13:4
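A sketch of the suggestion (not the poster's actual class: the Double in/out types and descriptor arguments are illustrative, and ValueStateDescriptor constructors differ between Flink versions): the caller, who knows the concrete type M, passes a TypeInformation<M> into the function so no runtime type extraction is needed.

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// The TypeInformation for M is supplied by the caller at construction time,
// sidestepping the InvalidTypesException from erased generics.
public class AnomalyFlatMap<M> extends RichFlatMapFunction<Double, Double> {
    private final M initModel;
    private final TypeInformation<M> modelType;
    private transient ValueState<M> modelState;

    public AnomalyFlatMap(M initModel, TypeInformation<M> modelType) {
        this.initModel = initModel;
        this.modelType = modelType;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        modelState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("model", modelType, initModel));
    }

    @Override
    public void flatMap(Double value, Collector<Double> out) throws Exception {
        // ... read and update modelState here ...
        out.collect(value);
    }
}
```

At the call site the concrete type is known, so `new AnomalyFlatMap<>(model, TypeInformation.of(MyModel.class))` works where the generic `M` alone did not.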

Re: Values are missing, probably due parallelism?

2016-04-21 Thread Aljoscha Krettek
Hi, no worries, I also had to read the doc to figure it out. :-) I now see what the problem is. The .countWindowAll().apply() pattern creates a WindowOperator with parallelism of 1 because the "count all" only works if one instance of the window operator sees all elements. When manually changing t

Re: Values are missing, probably due parallelism?

2016-04-21 Thread Kostya Kulagin
First of all, you are right about the number of elements - my bad, and sorry for the confusion; I need to be better at calculations :) However: if I change the parallelism to, let's say, 2 in windowing, i.e. instead of (of course I changed 29 to 30 as well :) ) }).print(); I put }).setParallelism(2).print();

Help with generics

2016-04-21 Thread Martin Neumann
Hey, I have a FlatMap that uses some generics (appended at the end of the mail). I have some trouble with the type inference running into InvalidTypesException on the first line in the open function. How can I fix it? Cheers Martin public class AnomalyFlatMap extends RichFlatMapFunction, Tup

Re[2]: Trying to detecting changes

2016-04-21 Thread toletum
Thanks Till. In the end, I'm going to use a countWindowAll(2,1) and a RichAllWindowFunction. Regards, On Wed., Apr. 20, 2016 at 16:46, Till Rohrmann wrote: You could use CEP for that. First you would create a pattern of two states which matches everything. In the select function you could then c
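A plain-Java illustration (not Flink code) of what countWindowAll(2, 1) yields: for every new element, a window holding that element and its predecessor, which the window function can compare to detect changes.

```java
import java.util.ArrayList;
import java.util.List;

// Emulates the sliding pairs produced by countWindowAll(2, 1) and keeps only
// the pairs whose two elements differ, i.e. the detected changes.
public class ChangeDetect {
    public static List<int[]> changedPairs(int[] values) {
        List<int[]> changes = new ArrayList<>();
        for (int i = 1; i < values.length; i++) {
            if (values[i - 1] != values[i]) {
                changes.add(new int[]{values[i - 1], values[i]});
            }
        }
        return changes;
    }
}
```

For the input 1, 1, 2, 2, 3 this reports the two transitions (1 -> 2) and (2 -> 3).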

Re: Values are missing, probably due parallelism?

2016-04-21 Thread Aljoscha Krettek
Hi, which version of Flink are you using? Maybe there is a bug. I've tested it on the git master (1.1-SNAPSHOT) and it works fine with varying degrees of parallelism if I change the source to emit 30 elements: LongStream.range(0, 30).forEach(ctx::collect); (The second argument of LongStream.range(

Re: Operation of Windows and Triggers

2016-04-21 Thread Piyush Shrivastava
Hi all, Thanks a lot for your valuable suggestions. I am trying to implement the logic of creating a custom trigger with a GlobalWindow which fires the window after one minute the first time and every five seconds after that; if this logic works I will change it to one hour and five minutes resp

Re: Values are missing, probably due parallelism?

2016-04-21 Thread Kostya Kulagin
Actually this is not true - the source emits 30 values since it is started with 0. If I change 29 to 33 the result will be the same. I can get all values if I play with parallelism, i.e. putting parallelism 1 before print. Or if I change 29 to 39 (I have 4 cores). I can guess that there is something wrong with t

Re: save state in windows operation

2016-04-21 Thread Rubén Casado
Thanks for your help!! That is exactly what we need :-) __ Dr. Rubén Casado Head of Big Data Treelogic ruben.casado.treelogic +34 902 286 386 - +34 607 18 28 06 Parque Tecnológico de Asturias · Parcela 30 E33428 Llanera · Asturias

Re: save state in windows operation

2016-04-21 Thread Aljoscha Krettek
Hi, you should be able to do this using Flink's state abstraction in a RichWindowFunction like this: public static class MyApplyFunction extends RichWindowFunction, Tuple2, Tuple, GlobalWindow> { ValueStateDescriptor> stateDescriptor = new ValueStateDescriptor<>("last-result",

save state in windows operation

2016-04-21 Thread Rubén Casado
Hello, We have problems working with states in Flink and I am sure you can help us :-) Let's say we have a workflow something like: DataStream myData = env.from... myData.map(new MyMap (..)) .keyBy(0) .countWindow(n) .apply(new MyApplyFunction()) .writeAsCSV(...) To implement the

Re: AvroWriter for Rolling sink

2016-04-21 Thread Aljoscha Krettek
Hi, as far as I know there is no one working on this. I'm only aware of someone working on an ORC (from Hive) Writer. This would be a welcome addition! I think you are already on the right track, the only thing required will probably be an AvroFileWriter and you already started looking at Sequence

Re: Control triggering on empty window

2016-04-21 Thread Aljoscha Krettek
Hi, I'm afraid this is not possible with our windowing model (except with hacks using GlobalWindow, as you mentioned). The reason is that windows only come into existence once there is an element that has a window. Before that, the system has no reference point about what windows there should exis

Re: Values are missing, probably due parallelism?

2016-04-21 Thread Aljoscha Krettek
Hi, this is related to your other question about count windows. The source emits 29 values so we only have two count-windows with 10 elements each. The last window is never triggered. Cheers, Aljoscha On Wed, 20 Apr 2016 at 23:52 Kostya Kulagin wrote: > I think it has smth to do with parallelis

Re: Count windows missing last elements?

2016-04-21 Thread Aljoscha Krettek
Hi, yes, you can achieve this by writing a custom Trigger that can trigger both on the count or after a long-enough timeout. It would be a combination of CountTrigger and EventTimeTrigger (or ProcessingTimeTrigger) so you could look to those to get started. Cheers, Aljoscha On Wed, 20 Apr 2016 at

Re: Processing millions of messages in milliseconds real time -- Architecture guide required

2016-04-21 Thread Sendoh
Maybe you can refer to this- Kafka + Flink http://data-artisans.com/kafka-flink-a-practical-how-to/ -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Processing-millions-of-messages-in-milliseconds-real-time-Architecture-guide-required-tp6191p6

Re: Checkpoint and restore states

2016-04-21 Thread Aljoscha Krettek
Hi, yes Stefano is spot on! The state is only restored if a job is restarted because of abnormal failure. For state that survives stopping/canceling a job you can look at savepoints: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/savepoints.html This essentially uses the same
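The savepoint workflow the linked docs describe looks roughly like this (an illustrative sketch; the job ID and paths are placeholders, and the savepoint path is printed when the savepoint is triggered):

```shell
bin/flink savepoint <jobID>                      # trigger a savepoint, prints its path
bin/flink cancel <jobID>                         # stop the job
bin/flink run -s <savepointPath> my-job.jar      # resume later from the savepoint
```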

AvroWriter for Rolling sink

2016-04-21 Thread Igor Berman
Hi All, Is there such an implementation somewhere? (Before I start to implement it myself; it seems not too difficult based on the SequenceFileWriter example.) Anyway, any ideas/pointers will be highly appreciated. Thanks in advance