Dedicated ExecutionContext inside Flink

2016-06-07 Thread Soumya Simanta
What is the recommended practice for using a dedicated ExecutionContext inside Flink code? We are making some external network calls using Futures. Currently all of them are running on the global execution context (import scala.concurrent.ExecutionContext.Implicits.global). Thanks -Soumya
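
The thread does not contain an answer, but one common approach (an assumption on my part, not from the list) is to route the external calls through a dedicated, bounded pool instead of the global/common pool, so slow network I/O cannot starve shared threads. A minimal plain-Java sketch of that idea, using CompletableFuture (whose default, the common fork-join pool, is the analog of Implicits.global); the pool size and the fetch method are illustrative assumptions:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class DedicatedPool {
    // Dedicated, bounded pool for blocking external calls. Size (8) is an
    // assumption; size it for the expected number of in-flight calls.
    // Daemon threads so this sketch exits cleanly without an explicit shutdown.
    static final ExecutorService NETWORK_POOL = Executors.newFixedThreadPool(8, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    // Stand-in for an external network call; runs on the dedicated pool,
    // not on the global/common pool.
    static CompletableFuture<Integer> fetch(int x) {
        return CompletableFuture.supplyAsync(() -> x * 2, NETWORK_POOL);
    }

    public static void main(String[] args) {
        System.out.println(fetch(21).join()); // prints 42
    }
}
```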

Uploaded jar disappears when web monitor restarts

2016-06-07 Thread Emanuele Cesena
Hi, When the web monitor restarts the uploaded jars disappear — in fact, every time it restarts the upload directory is different. This seems intentional: https://github.com/apache/flink/blob/master/flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/WebRuntimeMonitor.java#L162

Re: Reading whole files (from S3)

2016-06-07 Thread Suneel Marthi
You can use Mahout's XMLInputFormat with Flink's HadoopInputFormat definitions. See http://stackoverflow.com/questions/29429428/xmlinputformat-for-apache-flink http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Read-XML-from-HDFS-td7023.html On Tue, Jun 7, 2016 at 10:11 PM, Jamie

Re: Reading whole files (from S3)

2016-06-07 Thread Jamie Grier
Hi Andrea, How large are these data files? The implementation you've mentioned here is only usable if they are very small. If so, you're fine. If not, read on... Processing XML input files in parallel is tricky. It's not a great format for this type of processing, as you've seen. They are

Re: Does Flink allows for encapsulation of transformations?

2016-06-07 Thread Ser Kho
Chesnay: Just want to thank you. I might have one or two related questions later on, but for now, just thanks. On Tuesday, June 7, 2016 8:18 AM, Greg Hogan wrote: "The question is how to encapsulate numerous transformations into one object or may be a function in

Re: Multi-field "sum" function just like "keyBy"

2016-06-07 Thread Jamie Grier
I'm assuming what you're trying to do is essentially sum over two different fields of your data. I would do this with my own ReduceFunction: stream.keyBy("someKey").reduce(new CustomReduceFunction()) // sum whatever fields you want and return the result. I think it does make sense that Flink
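
The body of the reduce Jamie describes can be sketched in plain Java (Flink's ReduceFunction interface omitted so the sketch is self-contained; the Event type and its field names are assumptions for illustration):

```java
class MultiFieldSum {
    // Hypothetical record with a key and two numeric fields to sum.
    static final class Event {
        final String key; final long clicks; final long views;
        Event(String key, long clicks, long views) {
            this.key = key; this.clicks = clicks; this.views = views;
        }
    }

    // Equivalent of the CustomReduceFunction's reduce(a, b): keep the key
    // and sum whatever fields you want in one pass.
    static Event reduce(Event a, Event b) {
        return new Event(a.key, a.clicks + b.clicks, a.views + b.views);
    }

    public static void main(String[] args) {
        Event total = reduce(new Event("k", 1, 2), new Event("k", 3, 4));
        System.out.println(total.clicks + " " + total.views); // prints "4 6"
    }
}
```

In Flink this logic would sit inside a ReduceFunction passed to `keyBy(...).reduce(...)`, so both sums are maintained per key in a single operator.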

Re: Submit Flink Jobs to YARN running on AWS

2016-06-07 Thread Ashutosh Kumar
If you use OpenVPN for accessing AWS, then you can use the private IP of the EC2 machine from your laptop. Thanks Ashutosh On Tue, Jun 7, 2016 at 11:00 PM, Shannon Carey wrote: > We're also starting to look at automating job deployment/start to Flink > running on EMR. There are a

Re: Window start and end issue with TumblingProcessingTimeWindows

2016-06-07 Thread Soumya Simanta
Thanks for the clarification. On Tue, Jun 7, 2016 at 9:15 PM, Aljoscha Krettek wrote: > Hi, > I'm afraid you're running into a bug in the special processing-time > window operator. A suggested workaround would be to switch to > characteristic IngestionTime and use

Re: Window start and end issue with TumblingProcessingTimeWindows

2016-06-07 Thread Aljoscha Krettek
Hi, I'm afraid you're running into a bug in the special processing-time window operator. A suggested workaround would be to switch to the IngestionTime time characteristic and use TumblingEventTimeWindows. I'll also open a Jira issue for the bug so that we can keep track of it:

Re: Kafka producer sink message loss?

2016-06-07 Thread Elias Levy
On Tue, Jun 7, 2016 at 4:52 AM, Stephan Ewen wrote: > The concern you raised about the sink being synchronous is exactly what my > last suggestion should address: > > The internal state backends can return a handle that can do the sync in a > background thread. The sink would

Re: Multi-field "sum" function just like "keyBy"

2016-06-07 Thread Gábor Gévay
Ah, sorry, you are right. You could also call keyBy again before the second sum, but maybe someone else has a better idea. Best, Gábor 2016-06-07 16:18 GMT+02:00 Al-Isawi Rami : > Thanks Gábor, but the first sum call will return > > SingleOutputStreamOperator > > I

Reading whole files (from S3)

2016-06-07 Thread Andrea Cisternino
Hi all, I am evaluating Apache Flink for processing large sets of Geospatial data. The use case I am working on will involve reading a certain number of GPX files stored on Amazon S3. GPX files are actually XML files and therefore cannot be read on a line by line basis. One GPX file will produce

Re: Multi-field "sum" function just like "keyBy"

2016-06-07 Thread Al-Isawi Rami
Thanks Gábor, but the first sum call will return SingleOutputStreamOperator. I could not do another sum call on that. Would you tell me how you managed to do stream.sum().sum()? Regards, -Rami On 7 Jun 2016, at 16:13, Gábor Gévay > wrote: Hello, In

Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)

2016-06-07 Thread Flavio Pompermaier
After a second look at KryoSerializer I fear that Input and Output are never closed... am I right? On Tue, Jun 7, 2016 at 3:06 PM, Flavio Pompermaier wrote: > Hi Aljoscha, > of course I can :) > Thanks for helping me... do you think it is the right thing to do calling >

Re: Multi-field "sum" function just like "keyBy"

2016-06-07 Thread Gábor Gévay
Hello, In the case of "sum", you can just specify them one after the other, like: stream.sum(1).sum(2) This works because the sums over the two fields are independent. However, in the case of "keyBy", the information from both fields is needed at the same time to produce the key. Best, Gábor
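
The independence Gábor describes can be illustrated in plain Java (no Flink API; the row layout with two int fields is an assumption): each field's sum needs no information from the other, so the per-field aggregations compose, whereas a composite key needs all key fields at once.

```java
import java.util.List;

class IndependentSums {
    // Sum a single field (column index) across all rows.
    static int sumField(List<int[]> rows, int field) {
        int s = 0;
        for (int[] r : rows) s += r[field];
        return s;
    }

    // Summing both fields in one pass gives the same totals as summing them
    // one after the other, because neither sum depends on the other field --
    // unlike keyBy, which must see all key fields together to build the key.
    static int[] sumBoth(List<int[]> rows) {
        return new int[] { sumField(rows, 0), sumField(rows, 1) };
    }
}
```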

Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)

2016-06-07 Thread Flavio Pompermaier
Hi Aljoscha, of course I can :) Thanks for helping me... do you think it is the right thing to do, calling reset()? Actually, I don't know whether this is meaningful or not, but I already ran the job successfully once on the cluster (a second attempt is currently running) after my accidental

Re: Window start and end issue with TumblingProcessingTimeWindows

2016-06-07 Thread Soumya Simanta
The problem is: why is the window end time in the future? For example, if my window size is 60 seconds and my window is being evaluated at 3.00 pm, then why is the window end time 3.01 pm and not 3.00 pm, even when the data being evaluated falls in the window 2.59 - 3.00?

Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)

2016-06-07 Thread Aljoscha Krettek
That's nice. Can you try it on your cluster with an added "reset" call on the buffer? On Tue, 7 Jun 2016 at 14:35 Flavio Pompermaier wrote: > After "some" digging into this problem I'm quite convinced that the > problem is caused by a missing reset of the buffer during the

Multi-field "sum" function just like "keyBy"

2016-06-07 Thread Al-Isawi Rami
Hi, Is there any reason why "keyBy" accepts multi-field, while for example "sum" does not? -Rami

Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)

2016-06-07 Thread Flavio Pompermaier
After "some" digging into this problem I'm quite convinced that the problem is caused by a missing reset of the buffer during the Kryo deserialization, similar to what was fixed by FLINK-2800 ( https://github.com/apache/flink/pull/1308/files). That fix added an output.clear() in
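
The general bug class under discussion can be shown without Kryo or Flink: reusing a serialization buffer without clearing it lets stale bytes from the previous record leak into the next one, which is exactly the kind of corruption a missing reset/clear produces. A self-contained sketch (this is not Flink's KryoSerializer, just the pattern):

```java
import java.io.ByteArrayOutputStream;

class BufferReuse {
    final ByteArrayOutputStream buf = new ByteArrayOutputStream();

    // Buggy pattern: stale bytes from earlier records accumulate in the
    // reused buffer, so the next record comes out corrupted.
    byte[] writeWithoutReset(byte[] record) {
        buf.write(record, 0, record.length);
        return buf.toByteArray();
    }

    // The fix (cf. the clear() added by FLINK-2800): empty the buffer
    // before every reuse.
    byte[] writeWithReset(byte[] record) {
        buf.reset();
        buf.write(record, 0, record.length);
        return buf.toByteArray();
    }
}
```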

Re: Does Flink allows for encapsulation of transformations?

2016-06-07 Thread Greg Hogan
"The question is how to encapsulate numerous transformations into one object or may be a function in Apache Flink Java setting." Implement CustomUnaryOperation. This can then be applied to a DataSet by calling `DataSet<T> result = input.runOperation(new MyOperation<>(...));`. On Mon, Jun 6, 2016
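
The shape of the pattern Greg describes can be sketched in plain Java (Flink's actual CustomUnaryOperation works on DataSet; this self-contained analog works on List, and the example operation and its names are illustrative assumptions): bundle a pipeline of transformations into one named operation that callers apply as a unit.

```java
import java.util.ArrayList;
import java.util.List;

// Analog of a custom unary operation: one input, one output, any number
// of transformations encapsulated inside.
interface UnaryOperation<I, O> {
    List<O> run(List<I> input);
}

// Hypothetical example operation: take absolute values, then drop values
// above a threshold -- two transformations packaged as one reusable object.
class CleanNumbers implements UnaryOperation<Integer, Integer> {
    private final int max;
    CleanNumbers(int max) { this.max = max; }

    @Override
    public List<Integer> run(List<Integer> input) {
        List<Integer> out = new ArrayList<>();
        for (int x : input) {
            int v = Math.abs(x);
            if (v <= max) out.add(v);
        }
        return out;
    }
}
```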

Re: Does Flink allows for encapsulation of transformations?

2016-06-07 Thread Chesnay Schepler
1a. ah. yeah i see how it could work, but i wouldn't count on it in a cluster. you would (most likely) run the sub-job (calculating pi) only on a single node. 1b. different execution environments generally imply different flink programs. 2. sure it does, since it's a normal flink job.

Re: Kafka producer sink message loss?

2016-06-07 Thread Stephan Ewen
Hi Elias! The concern you raised about the sink being synchronous is exactly what my last suggestion should address: The internal state backends can return a handle that can do the sync in a background thread. The sink would continue processing messages, and the checkpoint would only be
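
The handle Stephan describes can be illustrated in plain Java (this is not Flink's actual state-backend API; the class and method names are assumptions): taking a snapshot returns immediately with a cheap copy, while the expensive sync to durable storage runs on a background thread, so the sink keeps processing records in the meantime.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncSnapshot {
    // Background thread for the slow sync; daemon so the sketch exits cleanly.
    static final ExecutorService SYNC_POOL = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    // Synchronous part: a cheap copy of the state. The returned future (the
    // "handle") completes only once the background sync is done, which is
    // when the checkpoint could be acknowledged.
    static CompletableFuture<List<String>> snapshot(List<String> state) {
        List<String> copy = new ArrayList<>(state);
        return CompletableFuture.supplyAsync(() -> {
            // stand-in for flushing/syncing the copy to durable storage
            return copy;
        }, SYNC_POOL);
    }
}
```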

Re: Does Flink allows for encapsulation of transformations?

2016-06-07 Thread Ser Kho
Chesnay: 1a. The code actually works, that is the point. 1b. What restricts a Flink program from having several execution environments? 2. I am not sure that your modification allows for parallelism. Does it? 3. This code is a simple example of writing/organizing large and complicated programs,

Re: Window start and end issue with TumblingProcessingTimeWindows

2016-06-07 Thread Chesnay Schepler
could you state a specific problem? On 07.06.2016 06:40, Soumya Simanta wrote: I've a simple program which takes some inputs from a command line (Socket stream) and then aggregates based on the key. When running this program on my local machine I see some output that is counter intuitive to

Re: Does Flink allows for encapsulation of transformations?

2016-06-07 Thread Chesnay Schepler
from what i can tell from your code you are trying to execute a job within a job. This just doesn't work. your main method should look like this: `public static void main(String[] args) throws Exception { double pi = new classPI().compute(); System.out.println("We estimate Pi to be: " + pi); }` On

Re: Logs show `Marking the coordinator 2147483637 dead` in Flink-Kafka conn

2016-06-07 Thread Sendoh
Hi Robert, Thank you for checking the issue. That INFO is the only information the Flink workers give. I agree with your point of view. Looks like it closes the connections to all other topics which are not used (idle), although it's a bit misleading. Ref: https://github.com/edenhill/librdkafka/issues/437

Re: Periodically evicting internal states when using mapWithState()

2016-06-07 Thread Aljoscha Krettek
Hi Jack, right now this is not possible except when writing a custom operator. We are working on support for a time-to-live setting on states, this should solve your problem. For writing a custom operator, check out DataStream.transform() and StreamMap, which is the operator implementation for

Re: Custom keyBy(), look for similaties

2016-06-07 Thread iñaki williams
Thanks for your answer, Ufuk. However, I have been reading about KeySelector and I don't completely understand how it works with my idea. I am using an algorithm that gives me a score between some different strings. My idea is: if the score is higher than 0.80, for example, then those two strings