Re: Read from and Write to Kafka through flink

2017-04-19 Thread Tzu-Li (Gordon) Tai
Hi Pradeep, There is no single API or connector that takes a file as input and writes it to Kafka. In Flink, this operation consists of two parts: 1) a source reading the input, and 2) a sink producing to Kafka. So, all you need is a job that consists of that source and sink. You’ve already
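
A minimal sketch of such a two-part job, assuming the Flink 1.2-era DataStream API and the Kafka 0.10 connector; the input path, broker address, and topic name are placeholders:

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer010;
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

    public class FileToKafka {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // 1) source: read the input file line by line
            DataStream<String> lines = env.readTextFile("/path/to/input");
            // 2) sink: produce each line to a Kafka topic
            lines.addSink(new FlinkKafkaProducer010<>(
                    "localhost:9092", "output-topic", new SimpleStringSchema()));
            env.execute("file-to-kafka");
        }
    }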

Re: Yarn terminating TM for pmem limit cascades causing all jobs to fail

2017-04-19 Thread Zhijiang(wangzhijiang999)
Hi Shannon, have you tried increasing the total memory size for the task manager container? Maybe the maximum memory requirement is beyond your current setting. You should also check that your UDF does not steadily consume memory that is never recycled. If your
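
For reference, the container memory can be raised in flink-conf.yaml; a sketch with placeholder values, using the Flink 1.2-era keys:

    # total memory handed to each TaskManager container
    taskmanager.heap.mb: 4096
    # fraction of the container memory cut off from the JVM heap to leave
    # room for native/off-heap usage (helps stay under YARN's pmem limit)
    containerized.heap-cutoff-ratio: 0.25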

Re: UnilateralSortMerger error (again)

2017-04-19 Thread Flavio Pompermaier
I could, but only if there's a good probability that it fixes the problem... how confident are you about it? On Wed, Apr 19, 2017 at 8:27 PM, Ted Yu wrote: > Looking at the git log of DataInputDeserializer.java, there have been some > recent changes. > > If you have time, maybe try

Re: Flink memory usage

2017-04-19 Thread Fabian Hueske
Hi Billy, Flink's internal operators are implemented to not allocate heap space proportional to the size of the input data. Whenever Flink needs to hold data in memory (e.g., for sorting or building a hash table) the data is serialized into managed memory. If all memory is in use, Flink starts
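
The share of memory that Flink manages this way is configurable in flink-conf.yaml; a sketch using the Flink 1.2-era keys (the values shown are the defaults):

    # fraction of the free heap reserved as managed memory
    taskmanager.memory.fraction: 0.7
    # allocate managed memory lazily on demand, or eagerly at startup
    taskmanager.memory.preallocate: false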

Re: Flink groupBy

2017-04-19 Thread Fabian Hueske
Hi Alieh, Flink uses hash partitioning to assign grouping keys to parallel tasks by default. You can implement a custom partitioner or use range partitioning (which has some overhead) to control the skew. There is no automatic load balancing happening. Best, Fabian 2017-04-19 14:42 GMT+02:00
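
A sketch of grouping with a custom Partitioner in the DataSet API; the sample data, key logic, and reduce function are illustrative assumptions:

    import org.apache.flink.api.common.functions.GroupReduceFunction;
    import org.apache.flink.api.common.functions.Partitioner;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.util.Collector;

    public class SkewedGrouping {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            DataSet<Tuple2<String, Long>> data = env.fromElements(
                    Tuple2.of("a", 1L), Tuple2.of("a", 2L), Tuple2.of("b", 3L));
            data.groupBy(0)
                // decide explicitly which parallel task receives each key
                .withPartitioner(new Partitioner<String>() {
                    @Override
                    public int partition(String key, int numPartitions) {
                        return Math.abs(key.hashCode() % numPartitions);
                    }
                })
                .reduceGroup(new GroupReduceFunction<Tuple2<String, Long>, Tuple2<String, Long>>() {
                    @Override
                    public void reduce(Iterable<Tuple2<String, Long>> values,
                                       Collector<Tuple2<String, Long>> out) {
                        String key = null;
                        long sum = 0;
                        for (Tuple2<String, Long> t : values) {
                            key = t.f0;
                            sum += t.f1;
                        }
                        out.collect(Tuple2.of(key, sum));
                    }
                })
                .print();
        }
    }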

Generate Timestamps and emit Watermarks - unordered events - Kafka source

2017-04-19 Thread Luis Lázaro
Hi everyone, I am working on a use case with CEP and Flink 1.2. The source is Kafka configured with a single partition. The data are standard syslog messages parsed as LogEntry (an object with attributes like timestamp, service, severity, etc.). An event is a LogEntry. If two consecutive LogEntry
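
For the watermark part, a sketch with a bounded-out-of-orderness extractor; the consumer version, LogEntrySchema, the getTimestamp() accessor, and the 10-second bound are all assumptions:

    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.time.Time;

    DataStream<LogEntry> entries = env
        // LogEntrySchema is a hypothetical DeserializationSchema for syslog lines
        .addSource(new FlinkKafkaConsumer09<>("syslog", new LogEntrySchema(), props))
        .assignTimestampsAndWatermarks(
            // allow events to arrive up to 10 seconds out of order
            new BoundedOutOfOrdernessTimestampExtractor<LogEntry>(Time.seconds(10)) {
                @Override
                public long extractTimestamp(LogEntry entry) {
                    return entry.getTimestamp(); // assumed epoch-millis accessor
                }
            });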

Re: Yarn terminating TM for pmem limit cascades causing all jobs to fail

2017-04-19 Thread Stephan Ewen
Hi Shannon! Increasing the number of retries is definitely a good idea. The fact that you see increasing pmem use after failures/retries is something we should dig into. There are various possible leaks depending on what you use: (1) There may be a leak in class-loading (or specifically class
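
For the retries part, a sketch of the relevant flink-conf.yaml keys in the Flink 1.2 era; the values are placeholders:

    # restart a failed job a fixed number of times before giving up
    restart-strategy: fixed-delay
    restart-strategy.fixed-delay.attempts: 10
    restart-strategy.fixed-delay.delay: 10 s
    # how often YARN may restart the ApplicationMaster itself
    yarn.application-attempts: 4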

Flink groupBy

2017-04-19 Thread Alieh
Hi all, is there any way in Flink to send a process to a reducer? If I do "test.groupBy(1).reduceGroup", is each group processed on one reducer? And if the number of groups is larger than the number of task slots we have, does Flink distribute the work evenly? I mean, if we have for example

Re: Kafka offset commits

2017-04-19 Thread Tzu-Li (Gordon) Tai
Thanks for the clarification Aljoscha! Yes, you cannot restore from a 1.0 savepoint in Flink 1.2 (sorry, I missed the “1.0” part in my first reply). @Gwenhael, I’ll try to clarify some of the questions you asked: Does that mean that Flink does not rely on the offsets written to ZooKeeper
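
For context, a sketch of the 0.8-era consumer setup this thread is about; the broker, ZooKeeper address, group id, and topic are placeholders:

    import java.util.Properties;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "localhost:9092");
    // the 0.8 consumer commits group offsets to ZooKeeper
    props.setProperty("zookeeper.connect", "localhost:2181");
    props.setProperty("group.id", "my-flink-job");

    // With checkpointing enabled, Flink restores from its own checkpointed
    // offsets; the committed group offsets only serve as the start position
    // when no checkpoint/savepoint is available, and for external monitoring.
    FlinkKafkaConsumer08<String> consumer =
        new FlinkKafkaConsumer08<>("my-topic", new SimpleStringSchema(), props);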

Re: Kafka offset commits

2017-04-19 Thread Aljoscha Krettek
Hi, AFAIK, restoring a Flink 1.0 savepoint should not be possible on Flink 1.2. Only restoring from Flink 1.1 savepoints is supported. @Gordon If the consumer group stays the same, the new Flink job should pick up where the old one stopped, right? Best, Aljoscha > On 18. Apr 2017, at 16:19,
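
For reference, the savepoint round trip from the Flink 1.2-era CLI; the job id, savepoint path, and jar name are placeholders:

    # trigger a savepoint for a running job
    bin/flink savepoint <jobId>
    # resume a compatible job from that savepoint
    bin/flink run -s <savepointPath> my-job.jar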

Re: Flink slots, threads, task, etc

2017-04-19 Thread Flavio Pompermaier
Hi Aljoscha, thanks for the reply. It was not urgent and I was aware of the FF... btw, congratulations on it, I saw many interesting talks! The Flink community has grown a lot since it was Stratosphere ;) Just one last question: in many of my use cases it would be helpful to see how many of the