Re: Any good ideas for online/offline detection of devices that send events?

2017-03-06 Thread Tzu-Li (Gordon) Tai
Some more input: Right now, you can also use the `ProcessFunction` [1] available in Flink 1.2 to simulate state TTL. The `ProcessFunction` should allow you to keep device state and simulate the online / offline detection by registering processing timers. In the `onTimer` callback, you can emit

Re: How to use 'dynamic' state

2017-03-06 Thread Tzu-Li (Gordon) Tai
Hi Steve, I’ll try to provide some input for the approaches you’re currently looking into (I’ll comment on your email below): * API based stop and restart of job … ugly.  Yes, indeed ;) I think this is absolutely not necessary. * Use a co-map function with the rules alone stream and the events

Re: AWS exception serialization problem

2017-03-06 Thread Shannon Carey
This happened when running Flink with bin/run-local.sh I notice that there only appears to be one Java process. The job manager and the task manager run in the same JVM, right? I notice, however, that there are two blob store folders on disk. Could the problem be caused by two different

AWS exception serialization problem

2017-03-06 Thread Shannon Carey
Has anyone encountered this or know what might be causing it? java.lang.RuntimeException: Could not forward element to next operator at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:394) at

How to use 'dynamic' state

2017-03-06 Thread Steve Jerman
I’ve been reading the code/user goup/SO and haven’t really found a great answer to this… so I thought I’d ask. I have a UI that allows the user to edit rules which include specific criteria for example trigger event if X many people present for over a minute. I would like to have a flink job

Issues with Event Time and Kafka

2017-03-06 Thread ext.mwalker
Hi Folks, We are working on a Flink job to proccess a large amount of data coming in from a Kafka stream. We selected Flink because the data is sometimes out of order or late, and we need to roll up the data into 30-minutes event time windows, after which we are writing it back out to an s3

FlinkKafkaConsumer010 - creating a data stream of type DataStream<ConsumerRecord<K,V>>

2017-03-06 Thread Dominik Safaric
Hi, Unfortunately I cannot find the option of using raw ConsumerRecord instances when creating a Kafka data stream. In general, I would like to use an instance of the mentioned type because our use case requires certain metadata such as record offset and partition. So far I’ve examined

Re: OutOfMemory error (Direct buffer memory) while allocating the TaskManager off-heap memory

2017-03-06 Thread Nico Kruber
Hi Yassine, Thanks for reporting this. The problem you run into is due to start-local.sh which we discourage in favour of start-cluster.sh that resembles real use case better. In your case, start-local.sh starts a job manager with an embedded task manager but does not parse the task manager

Re: Amazon EMR Releases

2017-03-06 Thread Meghashyam Sandeep V
I spoke to one of the representatives in AWS EMR team last week. They mentioned that they usually practice a 4 week cool down period. Hopefully we will get Flink 1.2 in the next week. Thanks, Sandeep On Mar 6, 2017 9:27 AM, "Chen Qin" wrote: > EMR is a team within Amazon

Amazon EMR Releases

2017-03-06 Thread Torok, David
Does anyone have any insight as to how closely Amazon EMR releases will track the Flink releases? For example EMR 5.3.1 on Feb 7 included Flink 1.1.4, from Dec 21 ... about 1.5 month lag. With Flink accelerating to a timed release schedule, I wonder how far behind EMR will track the official

TTL for State Entries / FLINK-3089

2017-03-06 Thread Johannes Schulte
Hi, I am trying to achieve a stream-to-stream join with big windows and are searching for a way to clean up state of old keys. I am already using a RichCoProcessFunction I found there is already an existing ticket https://issues.apache.org/jira/browse/FLINK-3089 but I have doubts that a

Re: AsyncIO/QueryableStateClient hanging with high parallelism

2017-03-06 Thread Yassine MARZOUGUI
I think I found the reason for what happened. The way I used the QueryableStateClient is that I wrapped scala.concurrent.Future in a FlinkFuture and then called FlinkFuture.thenAccept. It turns out thenAccept doesn't throw exceptions and when an exception happens (which likely happened once I

Re: Integrate Flink with S3 on EMR cluster

2017-03-06 Thread vinay patil
Hi Guys, I am getting the same exception: EMRFileSystem not Found I am trying to read encrypted S3 file using Hadoop File System class. (using Flink 1.2.0) When I copy all the libs from /usr/share/aws/emrfs/lib and /usr/lib/hadoop to Flink lib folder , it works. However I see that all these

AsyncIO/QueryableStateClient hanging with high parallelism

2017-03-06 Thread Yassine MARZOUGUI
Hi all, I set up a job with simple queryable state sink and tried to query it from another job using the new Async I/O API. Everything worked as expected, except when I tried to increase the parallelism of the querying job it hanged. As you can see in the attached image, when the parallism is 5

Re: Flink using notebooks in EMR

2017-03-06 Thread Tzu-Li (Gordon) Tai
Hi, Are you running Zeppelin on a local machine? I haven’t tried this before, but you could first try and check if port ‘6123’ is publicly accessible in the security group settings of the AWS EMR instances. - Gordon On March 3, 2017 at 10:21:41 AM, Meghashyam Sandeep V

Re: Memory Limits: MiniCluster vs. Local Mode

2017-03-06 Thread Tzu-Li (Gordon) Tai
Hi Dominik, AFAIK, the local mode executions create a mini cluster within the JVM to run the job. Also, `MiniCluster` seems to be something FLIP-6 related, and since FLIP-6 is still work in progress, I’m not entirely sure if it is viable at the moment. Right now, you should look into using

Re: Any good ideas for online/offline detection of devices that send events?

2017-03-06 Thread Tzu-Li (Gordon) Tai
Hi Bruno! The Flink CEP library also seems like an option you can look into to see if it can easily realize what you have in mind. Basically, the pattern you are detecting is a timeout of 5 minutes after the last event. Once that pattern is detected, you emit a “device offline” event

Re: Flink 1.2 and Cassandra Connector

2017-03-06 Thread Chesnay Schepler
Hello, i believe the cassandra connector is not shading it's dependencies properly. This didn't cause issues in the past since flink used to have a dependency on codahale metrics as well. Please open a JIRA for this issue. Regards, Chesnay On 06.03.2017 11:32, Tarandeep Singh wrote: Hi

Re: Flink 1.2 and Cassandra Connector

2017-03-06 Thread Tarandeep Singh
Hi Robert & Nico, I am facing the same problem (java.lang.NoClassDefFoundError: com/codahale/metrics/Metric) Can you help me identify shading issue in pom.xml file. My pom.xml content- - http://maven.apache.org/POM/4.0.0;