Re: Flink job on secure Yarn fails after many hours

2015-12-02 Thread Maximilian Michels
I mentioned that the exception gets thrown when requesting container status information. We need this to send a heartbeat to YARN, but it is not critical if it occasionally fails for a running job. Possibly, we could work around this problem by retrying N times in case of an exception. Would it be
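The retry-N-times idea can be sketched generically. This helper is illustrative only (the function name and parameters are assumptions, not Flink or YARN APIs):

```python
import time

def retry_on_exception(action, attempts=3, delay_seconds=0.0):
    """Run `action`, retrying up to `attempts` times on any exception.

    Re-raises the last exception if all attempts fail.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return action()
        except Exception as error:  # real code would catch the specific YARN exception
            last_error = error
            time.sleep(delay_seconds)
    raise last_error
```

The heartbeat call would then be wrapped as something like `retry_on_exception(request_container_status, attempts=5)`, so a single transient failure no longer kills the job.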

Re: Flink job on secure Yarn fails after many hours

2015-12-02 Thread Maximilian Michels
Hi Niels, You mentioned you have the option to update Hadoop and redeploy the job. Would be great if you could do that and let us know how it turns out. Cheers, Max On Wed, Dec 2, 2015 at 3:45 PM, Niels Basjes wrote: > Hi, > > I posted the entire log from the first log line at

Re: Iterative queries on Flink

2015-12-02 Thread Flavio Pompermaier
Do you think it is possible to push this forward? I need to implement this interactive feature for Datasets. Do you think it is possible to implement the persist() method in Flink (similar to Spark)? If you want, I can work on it with some instructions.. On Wed, Dec 2, 2015 at 3:05 PM,

Way to get accumulators values *during* job execution ?

2015-12-02 Thread LINZ, Arnaud
Hello, I use Grafana/Graphite to monitor my applications. The Flink GUI is really nice, but it disappears after the job completes and consequently is not suitable to long-term monitoring. For batch applications, I simply send the accumulator’s values at the end of the job to my Graphite base.
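Sending accumulator values to a Graphite base at job end boils down to Graphite's plaintext protocol: one `metric value timestamp` line per data point, written to the Carbon receiver over TCP (port 2003 by default). A minimal sketch, with illustrative metric names:

```python
import socket
import time

def graphite_line(metric_path, value, timestamp=None):
    """Format one data point in Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (metric_path, value, timestamp)

def send_to_graphite(lines, host="localhost", port=2003):
    """Send pre-formatted plaintext lines to a Graphite/Carbon endpoint."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall("".join(lines).encode("ascii"))
```

At the end of a batch job one would format each accumulator as a line, e.g. `graphite_line("flink.myjob.records_processed", 12345)`, and push the batch with `send_to_graphite`.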

Re: Flink job on secure Yarn fails after many hours

2015-12-02 Thread Niels Basjes
Hi, I posted the entire log from the first log line at the moment of failure to the very end of the logfile. This is all I have. As far as I understand it, the Kerberos and keytab mechanism in Hadoop YARN catches the "Invalid Token" and then (if a keytab is present) gets a new Kerberos ticket (or

Re: Flink job on secure Yarn fails after many hours

2015-12-02 Thread Maximilian Michels
Great. Here is the commit to try out: https://github.com/mxm/flink/commit/f49b9635bec703541f19cb8c615f302a07ea88b3 If you already have the Flink repository, check it out using git fetch https://github.com/mxm/flink/ f49b9635bec703541f19cb8c615f302a07ea88b3 && git checkout FETCH_HEAD

Re: Flink job on secure Yarn fails after many hours

2015-12-02 Thread Niels Basjes
Sure, just give me the git repo url to build and I'll give it a try. Niels On Wed, Dec 2, 2015 at 4:28 PM, Maximilian Michels wrote: > I mentioned that the exception gets thrown when requesting container > status information. We need this to send a heartbeat to YARN but it is

Re: Flink job on secure Yarn fails after many hours

2015-12-02 Thread Maximilian Michels
I forgot you're using Flink 0.10.1. The above was for the master. So here's the commit for Flink 0.10.1: https://github.com/mxm/flink/commit/a41f3866f4097586a7b2262093088861b62930cd git fetch https://github.com/mxm/flink/ \ a41f3866f4097586a7b2262093088861b62930cd && git checkout FETCH_HEAD

Re: NPE with Flink Streaming from Kafka

2015-12-02 Thread Stephan Ewen
Hi Mihail! Do I understand you correctly that the use case is to raise an alarm if an order has not been processed within a certain time period (certain number of days) ? If that is the case, the use case is actually perfect for a special form of session windows that monitor such timeouts. I

Re: Way to get accumulators values *during* job execution ?

2015-12-02 Thread Stephan Ewen
Hi Arnaud! One thing you can do is to periodically retrieve them by querying the monitoring API: https://ci.apache.org/projects/flink/flink-docs-release-0.10/internals/monitoring_rest_api.html A nice approach would be to let the JobManager eagerly publish the metrics. I think that Christian
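Periodic polling of the monitoring API could look like the sketch below. The endpoint path and JSON field names are assumptions based on the 0.10 monitoring REST API linked above; adjust them to whatever your JobManager actually serves.

```python
import json
from urllib.request import urlopen

def parse_accumulators(payload):
    """Turn the API's accumulator list into a {name: value} dict.

    The "user-task-accumulators" key and per-entry fields are assumed
    from the monitoring REST API's JSON shape.
    """
    return {acc["name"]: acc["value"]
            for acc in payload.get("user-task-accumulators", [])}

def fetch_accumulators(jobmanager_url, job_id):
    """Fetch a running job's accumulators from the monitoring REST API."""
    url = "%s/jobs/%s/accumulators" % (jobmanager_url, job_id)
    with urlopen(url) as response:
        return parse_accumulators(json.load(response))
```

Calling `fetch_accumulators("http://jobmanager:8081", job_id)` in a loop with a sleep between iterations, and forwarding the resulting dict to Graphite, gives the long-term monitoring the Flink GUI does not provide.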

unsubscribe

2015-12-02 Thread 范昂
Sent from my iPhone > On Dec 3, 2015, at 1:41 AM, Maximilian Michels wrote: > > Great. Here is the commit to try out: > https://github.com/mxm/flink/commit/f49b9635bec703541f19cb8c615f302a07ea88b3 > > If you already have the Flink repository, check it out using > > git fetch

Re: NPE with Flink Streaming from Kafka

2015-12-02 Thread Vieru, Mihail
Hi Gyula, Hi Stephan, thank you for your replies. We need a state which grows indefinitely for the following use case. An event is created when a customer places an order. Another event is created when the order is sent. These events typically occur within days. We need to catch the cases when
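The core of this use case, independent of any Flink API, is keeping per-order state only until the follow-up event arrives, and flagging orders whose follow-up never comes within the allowed window. A minimal sketch (class and method names are illustrative, not Flink's windowing API):

```python
class OrderTimeoutTracker:
    """Tracks 'placed' orders and reports those not 'sent' within max_age_seconds."""

    def __init__(self, max_age_seconds):
        self.max_age = max_age_seconds
        self.pending = {}  # order_id -> placement timestamp

    def on_placed(self, order_id, timestamp):
        self.pending[order_id] = timestamp

    def on_sent(self, order_id):
        # Order completed: drop its state, so state does not grow indefinitely.
        self.pending.pop(order_id, None)

    def expired(self, now):
        """Return order ids whose follow-up event is overdue, and evict them."""
        overdue = [oid for oid, placed in self.pending.items()
                   if now - placed > self.max_age]
        for oid in overdue:
            del self.pending[oid]
        return overdue
```

This is essentially the session-window-with-timeout pattern Stephan describes: completed sessions are purged immediately, so only genuinely open orders occupy memory.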

Re: Including option for starting job and task managers in the foreground

2015-12-02 Thread Brian Chhun
Yep, I think this makes sense. I'm currently patching the flink-daemon.sh script to remove the `&`, but I don't think it's a very robust solution, particularly when this script changes across versions of Flink. I'm very new to Docker, but the resources I've found indicate that the process must

Re: Including option for starting job and task managers in the foreground

2015-12-02 Thread Till Rohrmann
Hi Brian, as far as I know this is currently not possible with our scripts. However, it should be relatively easy to add by simply executing the Java command in flink-daemon.sh in the foreground. Do you want to add this? Cheers, Till On Dec 1, 2015 9:40 PM, "Brian Chhun"

Re: Question about flink message processing guarantee

2015-12-02 Thread Till Rohrmann
Just a small addition. Your sources have to be replayable to some extent. By replayable I mean that they can continue from some kind of offset. Otherwise the checkpointing won't help you. The Kafka source supports that, for example. Cheers, Till On Dec 1, 2015 11:55 PM, "Márton Balassi"
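"Replayable from an offset" can be made concrete with a tiny sketch: the source exposes its current position for checkpointing, and on recovery the checkpointed position is handed back via a seek. This is purely illustrative, not Flink's actual SourceFunction interface:

```python
class ReplayableSource:
    """A source readable from an arbitrary offset, as checkpointing requires."""

    def __init__(self, records):
        self.records = list(records)
        self.offset = 0  # current read position; this is what gets checkpointed

    def read(self):
        record = self.records[self.offset]
        self.offset += 1
        return record

    def checkpoint(self):
        return self.offset

    def seek(self, offset):
        # On recovery, resume from the checkpointed offset and replay from there.
        self.offset = offset
```

A Kafka partition behaves like this (consumers commit and seek offsets); a plain socket stream does not, which is why it cannot give the same guarantees after a failure.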

Re: Data Stream union error after upgrading from 0.9 to 0.10.1

2015-12-02 Thread Maximilian Michels
Hi Welly, We still have to decide on the next release date, but I would expect Flink 0.10.2 within the next few weeks. If you can't work around the union limitation, you may build your own Flink either from the master or the release-0.10 branch, which will eventually become Flink 0.10.2. Cheers, Max On

Re: NPE with Flink Streaming from Kafka

2015-12-02 Thread Vieru, Mihail
Thank you, Robert! The issue with Kafka is now solved with the 0.10-SNAPSHOT dependency. We have run into an OutOfMemory exception though, which appears to be related to the state. As my colleague, Javier Lopez, mentioned in a previous thread, state handling is crucial for our use case. And as

Re: Including option for starting job and task managers in the foreground

2015-12-02 Thread Maximilian Michels
Hi Brian, I don't recall that Docker requires commands to run in the foreground. Still, if that is your requirement, simply remove the "&" at the end of this line in flink-daemon.sh: $JAVA_RUN $JVM_ARGS ${FLINK_ENV_JAVA_OPTS} "${log_setting[@]}" -classpath "`manglePathList

Re: Working with protobuf wrappers

2015-12-02 Thread Krzysztof Zarzycki
Thanks guys for your answers, that is exactly the information I was looking for. Krzysztof 2015-12-01 19:22 GMT+01:00 Robert Metzger : > Hi Flavio, > > 1. you don't have to register serializers if it's working for you. I would > add a custom serializer if it's not working or if

Re: NPE with Flink Streaming from Kafka

2015-12-02 Thread Aljoscha Krettek
Hi Mihail, could you please give some information about the number of keys that you are expecting in the data and how big the elements are that you are processing in the window? Also, are there any other operations that could be taxing on memory? I think the different exception you see for

Re: NPE with Flink Streaming from Kafka

2015-12-02 Thread Robert Metzger
It's good news that the issue has been resolved. Regarding the OOM, did you start Flink in streaming mode? On Wed, Dec 2, 2015 at 10:18 AM, Vieru, Mihail wrote: > Thank you, Robert! The issue with Kafka is now solved with the > 0.10-SNAPSHOT dependency. > > We have

Re: NPE with Flink Streaming from Kafka

2015-12-02 Thread Vieru, Mihail
Yes, with the "start-cluster-streaming.sh" script. If the TaskManager gets 5GB of heap, it manages to process ~100 million messages and then throws the above OOM. If it gets only 500MB, it manages to process ~8 million and a somewhat misleading exception is thrown: 12/01/2015 19:14:07 Source:

Re: Question about flink message processing guarantee

2015-12-02 Thread Stephan Ewen
There is an overview of what guarantees what sources can give you: https://ci.apache.org/projects/flink/flink-docs-master/apis/fault_tolerance.html#fault-tolerance-guarantees-of-data-sources-and-sinks On Wed, Dec 2, 2015 at 9:19 AM, Till Rohrmann wrote: > Just a small

Re: Running WebClient from Windows

2015-12-02 Thread Welly Tambunan
Hi Fabian, I have already created JIRA for this one. https://issues.apache.org/jira/browse/FLINK-3099 Thanks a lot for this. Cheers On Wed, Dec 2, 2015 at 6:02 PM, Fabian Hueske wrote: > Hi Welly, > > at the moment we only provide native Windows .bat scripts for

Re: NPE with Flink Streaming from Kafka

2015-12-02 Thread Stephan Ewen
Mihail! The Flink windows are currently in-memory only. There are plans to relax that, but for the time being, having enough memory in the cluster is important. @Gyula: I think window state is currently also limited when using the SqlStateBackend, by the size of a row in the database (because

Re: Flink job on secure Yarn fails after many hours

2015-12-02 Thread Maximilian Michels
Hi Niels, Sorry to hear you experienced this exception. At first glance, it looks like a bug in Hadoop to me. > "Not retrying because the invoked method is not idempotent, and unable to > determine whether it was invoked" That is nothing to worry about. This is Hadoop's internal retry

Re: Iterative queries on Flink

2015-12-02 Thread Maximilian Michels
Hi Flavio, I was working on this some time ago but it didn't make it in yet and priorities shifted a bit. The pull request is here: https://github.com/apache/flink/pull/640 The basic idea is to remove Flink's ResultPartition buffers in memory lazily, i.e. keep them as long as enough memory is