Arrays values in keyBy

2016-06-10 Thread Elias Levy
I would be useful if the documentation warned what type of equality it expected of values used as keys in keyBy. I just got bit in the ass by converting a field from a string to a byte array. All of the sudden the windows were no longer aggregating. So it seems Flink is not doing a deep compare

Re: Application log on Yarn FlinkCluster

2016-06-10 Thread Robert Metzger
Hi Theofilos, how exactly are you writing the application output? Are you using a logging framework? Are you writing the log statements from the open(), map(), invoke() methods or from some constructors? (I'm asking since different parts are executed on the cluster and locally). On Fri, Jun 10,

Re: Join two streams using a count-based window

2016-06-10 Thread Nikos R. Katsipoulakis
Thank you very much Matthias! Also, the link you provided is very helpful. Cheers, Nikos On Fri, Jun 10, 2016 at 3:16 AM, Matthias J. Sax wrote: > I just put an answer to SO. > > About the other questions: Flink processes tuple-by-tuple and does some > internal buffering. You

Re: Kafka exception "Unable to find a leader for partitions"

2016-06-10 Thread Robert Metzger
Hi Shannon, Some questions: which Flink version are you using? Can you provide me with some more logs, in particular the log entries before this event from the Kafka connector. Also, it is possible that the Kafka broker was in an erroneous state? Did the error happen after weeks of data

Re: Result comparison from 2 DataStream Sources

2016-06-10 Thread Konstantin Knauf
Hi again, and again sorry for the late response. Regarding your first question: You can use a Key Selector Function [1]. Regarding your second question: If I understand your requirement correctly, this is already happening in my gist. By taking the union of both streams the local and away max

Application log on Yarn FlinkCluster

2016-06-10 Thread Theofilos Kakantousis
Hi all, Flink 1.0.3 Hadoop 2.4.0 When running a job on a Flink Cluster on Yarn, the application output is not included in the Yarn log. Instead, it is only printed in the stdout from where I run my program. For the jobmanager, I'm using the log4j.properties file from the flink/conf

Re: Reading whole files (from S3)

2016-06-10 Thread Robert Metzger
Hi, setting the unsplittable attribute in the constructor is fine. The field's value will be send to the cluster. So what happens is that you initialize the input format in your client program. Then, its serialized, send over the network to the machines and deserilaized again. So the value you've

Re: Join two streams using a count-based window

2016-06-10 Thread Matthias J. Sax
I just put an answer to SO. About the other questions: Flink processes tuple-by-tuple and does some internal buffering. You might be interested in https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks -Matthias On 06/09/2016 08:13 PM, Nikos R. Katsipoulakis wrote: >

Re: Reading whole files (from S3)

2016-06-10 Thread Andrea Cisternino
Hi, I am replying to myself for the records and to provide an update on what I am trying to do. I have looked into Mahout's XmlInputFormat class but unfortunately it doesn't solve my problem. My exploratory work with Flink tries to reproduce the key steps that we already perform in a quite

Re: Does Flink allows for encapsulation of transformations?

2016-06-10 Thread Chesnay Schepler
Q1: Whether one of your classes requires the *e**nv* parameter depends on whether you want to create a new Source or set a ExecutionEnvironment parameter inside the class. If you don't you can of course not pass it :) I can't see anything that would prevent it form running on a cluster. Q2:

Re: How to maintain the state of a variable in a map transformation.

2016-06-10 Thread Ravikumar Hawaldar
Hi Fabian, Thank you for your help. I want my Flink application to be distributed as well as I want the facility to support the update of the variable [Coefficients of LinearRegression]. How you would do in that case? The problem with iteration is that it expects Dataset with same type to be

Re: NotSerializableException

2016-06-10 Thread Aljoscha Krettek
Hi, yes, I was talking about a Flink bug. I forgot to mention the work-around that Stephan mentioned. On Thu, 9 Jun 2016 at 20:38 Stephan Ewen wrote: > You can also make the KeySelector a static inner class. That should work > as well. > > On Thu, Jun 9, 2016 at 7:00 PM,