.keyBy() on ConnectedStream

2017-01-26 Thread Matt
Hi all, What's the purpose of .keyBy() on ConnectedStream? How does it affect .map() and .flatMap()? I'm not finding a way to group stream elements based on a key, something like a Window on a normal Stream, but for a ConnectedStream. Regards, Matt

Re: Flink snapshotting to S3 - Timeout waiting for connection from pool

2017-01-26 Thread Shannon Carey
Haha, I see. Thanks. On 1/26/17, 1:48 PM, "Chen Qin" wrote: >We worked around S3 and had a beer with our Hadoop engineers... > > > >-- >View this message in context:

Proper ways to write iterative DataSets with dependencies

2017-01-26 Thread Li Peng
Hi there, I just started investigating Flink and I'm curious if I'm approaching my issue in the right way. My current usecase is modeling a series of transformations, where I start with some transformations, which when done can yield another transformation, or a result to output to some sink, or

Re: Issues while restarting a job on HA cluster

2017-01-26 Thread ani.desh1512
Hi Robert, Thanks for the answer. My code does actually contain both mapr streams and maprdb jars. here are the steps I followed based on your suggestion: 1. I copied only the mapr-streams-*.jar and maprdb*.jar. 2. Then I tried to run my jar, but i got java.lang.noclassdeffounderror for some

Dummy DataStream

2017-01-26 Thread Duck
I have a project where i am reading in on a single DataStream from Kafka, then sending to a variable number of handlers based on content of the recieved data, after that i want to join them all. Since i do not know how many different streams this will create, i cannot have a single "base" to

Re: Flink snapshotting to S3 - Timeout waiting for connection from pool

2017-01-26 Thread Chen Qin
We worked around S3 and had a beer with our Hadoop engineers... -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-snapshotting-to-S3-Timeout-waiting-for-connection-from-pool-tp10994p11330.html Sent from the Apache Flink User Mailing List

Re: Many operations cause StackOverflowError with AWS EMR YARN cluster

2017-01-26 Thread Geoffrey Mon
Hello Chesnay, Thanks for the advice. I've begun adding multiple jobs per Python plan file here: https://issues.apache.org/jira/browse/FLINK-5183 and https://github.com/GEOFBOT/flink/tree/FLINK-5183 The functionality of the patch works. I am able to run multiple jobs per file successfully, but

Re: Events are assigned to wrong window

2017-01-26 Thread Nico
Hi, can anyone help me with this problem? I don't get it. Forget the examples below, I've created a copy / paste example to reproduce the problem of incorrect results when using key-value state und windowOperator. public class StreamingJob { public static void main(String[] args) throws

start-cluster.sh issue

2017-01-26 Thread Lior Amar
Hi, I am new here. My name is Lior and I am working at Parallel Machines. Was assigned recently to work on Flink run/use/improve :-) I am using the FLINK_CONF_DIR environment variable to pass the config location to the start-cluster.sh (e.g. env

Re: User configuration

2017-01-26 Thread Dmitry Golubets
Yes, thank you Robert! Best regards, Dmitry On Thu, Jan 26, 2017 at 4:55 PM, Robert Metzger wrote: > Hi, > Is this what you are looking for? https://ci.apache.org/ > projects/flink/flink-docs-release-1.2/monitoring/best_ > practices.html#parsing-command-line-arguments-and-

Re: User configuration

2017-01-26 Thread Robert Metzger
Hi, Is this what you are looking for? https://ci.apache.org/projects/flink/flink-docs-release-1.2/monitoring/best_practices.html#parsing-command-line-arguments-and-passing-them-around-in-your-flink-application On Thu, Jan 26, 2017 at 5:38 PM, Dmitry Golubets wrote: > Hi, >

User configuration

2017-01-26 Thread Dmitry Golubets
Hi, Is there a place for user defined configuration settings? How to read them? Best regards, Dmitry

Re: Flink dependencies shading

2017-01-26 Thread Dmitry Golubets
Hi Robert, I ended up overriding Flink httpclient version number in main pom file and recompiling it. Thanks Best regards, Dmitry On Thu, Jan 26, 2017 at 4:12 PM, Robert Metzger wrote: > Hi Dmitry, > > I think this issue is new. > Where is the AWS SDK dependency coming

Re: Flink dependencies shading

2017-01-26 Thread Robert Metzger
Hi Dmitry, I think this issue is new. Where is the AWS SDK dependency coming from? Maybe you can resolve the issue on your side for now. I've filed a JIRA for this issue: https://issues.apache.org/jira/browse/FLINK-5661 On Wed, Jan 25, 2017 at 8:24 PM, Dmitry Golubets

Re: Debugging, logging and measuring operator subtask performance

2017-01-26 Thread Robert Metzger
Hi Dominik, You could measure the throughput at each task in your job to see if one operator is causing the slowdown (for example using Flink's metrics system) Maybe the backpressure view already helps finding the task that causes the issue. Did you check if there are enough resources available

Re: Issues while restarting a job on HA cluster

2017-01-26 Thread Robert Metzger
Hi Ani, This error is independent of cancel vs stop. Its an issue of loading the MapR classes from the classloaders. Do you user jars contain any MapR code (either mapr streams or maprdb)? If so, I would recommend you to put these MapR libraries into the "lib/" folder of Flink. They'll then be

Re: Kafka data not read in FlinkKafkaConsumer09 second time from command line

2017-01-26 Thread Robert Metzger
Hi, I would guess that the watermark generation does not work as expected. I would recommend to log the extracted timestamps + the watermarks to understand how time is progressing, and when watermarks are generated to trigger a window computation. On Tue, Jan 24, 2017 at 6:53 PM, Sujit Sakre

Re: State Descriptors / Queryable State Question

2017-01-26 Thread Fabian Hueske
Hi Joe, working on a KeyedStream means that the records are partitioned by that key, i.e., all records with the same key are processed by the same thread. Therefore, only on thread accesses the state for a particular key. Other tasks do not have read or write access to the state of other tasks.

Re: CEP and KeyedStreams doubt

2017-01-26 Thread Kostas Kloudas
Hi Oriol, The number of keys is related to the number of data-structures (NFAs) Flink is going to create and keep. Given this, it may make sense to try to reduce your key-space (or your keyedStreams). Other than that, Flink has not issue handling large numbers of keys. Now, for the issue you

Re: Rate-limit processing

2017-01-26 Thread Robert Metzger
Hi Florian, you can rate-limit the Kafka consumer by implementing a custom DeserializationSchema that sleeps a bit from time to time (or at each deserialization step) On Tue, Jan 24, 2017 at 1:16 PM, Florian König wrote: > Hi Till, > > thank you for the very helpful

Re: Improving Flink Performance

2017-01-26 Thread Stephan Ewen
@jonas Flink's Fork-Join Pool drives only the actors, which are doing coordination. Unless your job is permanently failing/recovering, they don't do much. On Thu, Jan 26, 2017 at 2:56 PM, Robert Metzger wrote: > Hi Jonas, > > The good news is that your job is completely

Re: Improving Flink Performance

2017-01-26 Thread Robert Metzger
Hi Jonas, The good news is that your job is completely parallelizable. So if you are running it on a cluster, you can scale it at least to the number of Kafka partitions you have (actually even further, because the Kafka consumers are not the issue). I don't think that the scala (=akka) worker

State Descriptors / Queryable State Question

2017-01-26 Thread Joe Olson
If I have a keyed stream going in to a N node Flink stream processor, and I had a job that was keeping a count using a ValueStateDescriptor (per key), would that descriptor be synchronized among all the nodes? i.e. Are the state descriptors interfaces (ValueStateDescriptor,

Re: Improving Flink Performance

2017-01-26 Thread Jonas
JProfiler -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Improving-Flink-Performance-tp11248p11311.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

CEP and KeyedStreams doubt

2017-01-26 Thread Oriol
Hello everyone, I'm using the CEP library for event stream processing. I'm splitting the dataStream into different KeyedStreams using keyBy(). In the KeyBy, I'm using a tuple of two elements, which means I may have several millions of KeyedStreams, as I need to monitor all our customer's users.

Re: Improving Flink Performance

2017-01-26 Thread dromitlabs
Offtopic: What profiler is it that you're using? > On Jan 25, 2017, at 18:11, Jonas wrote: > > Images: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n11305/Tv6KnR6.png > and >