Consuming a Kafka topic with multiple partitions from Flink

2017-08-28 Thread Isuru Suriarachchi
Hi all, I'm trying to implement a Flink consumer which consumes a Kafka topic with 3 partitions. I've set the parallelism of the execution environment to 3 as I want to make sure that each Kafka partition is consumed by a separate parallel task in Flink. My first question is whether it's always
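The setup described above can be sketched roughly as follows, assuming Flink 1.3 with the flink-connector-kafka-0.10 dependency on the classpath; the topic name, bootstrap servers, and group id are placeholders:

```java
import java.util.Properties;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

class ThreePartitionConsumer {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // One parallel source subtask per Kafka partition.
        env.setParallelism(3);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "my-group");                // placeholder

        env.addSource(new FlinkKafkaConsumer010<>(
                        "my-topic", new SimpleStringSchema(), props))
           .print();

        env.execute("three-partition-consumer");
    }
}
```

With source parallelism equal to the partition count, the Kafka connector assigns exactly one partition to each source subtask.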

Re: Flink Elastic Sink AWS ES

2017-08-28 Thread arpit srivastava
It seems the AWS ES setup is hiding the nodes' IPs. In that case, I think you can try @vinay patil's solution. Thanks, Arpit On Tue, Aug 29, 2017 at 3:56 AM, ant burton wrote: > Hey Arpit, > > _cat/nodes?v=ip,port > > > returns the following which I have not added the x’s they were

Re: Null Pointer Exception on Trying to read a message from Kafka

2017-08-28 Thread Sridhar Chellappa
OK. I got past the problem. Basically, I had to change public class MyKafkaMessageSerDeSchema implements DeserializationSchema, SerializationSchema { @Override public MyKafkaMessage deserialize(byte[] message) throws IOException { MyKafkaMessage MyKafkaMessage = null; try
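The fix being described, catching the parse failure inside deserialize() instead of letting it propagate, can be illustrated framework-free; the class below and its comma-separated "wire format" are made-up stand-ins for the protobuf-generated MyKafkaMessage in the thread:

```java
import java.nio.charset.StandardCharsets;

// Framework-free sketch of a defensive deserializer: corrupt records
// yield null rather than an exception, mirroring the try/catch fix
// described above. The CSV wire format is a stand-in for protobuf.
class DefensiveDeserializer {

    static final class MyKafkaMessage {
        final String key;
        final long value;
        MyKafkaMessage(String key, long value) { this.key = key; this.value = value; }
    }

    /** Returns the parsed message, or null if the bytes are corrupt. */
    static MyKafkaMessage deserialize(byte[] message) {
        if (message == null) {
            return null;
        }
        try {
            String[] parts = new String(message, StandardCharsets.UTF_8).split(",", 2);
            return new MyKafkaMessage(parts[0], Long.parseLong(parts[1].trim()));
        } catch (RuntimeException e) {
            // Corrupt record: swallow and skip, as the poster's catch block does.
            return null;
        }
    }
}
```

Note that a Flink DeserializationSchema that returns null effectively drops the record, which is usually preferable to failing the whole job on one bad message.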

Re: Null Pointer Exception on Trying to read a message from Kafka

2017-08-28 Thread Ted Yu
The NPE came from this line: StreamRecord copy = castRecord.copy(serializer.copy(castRecord.getValue())); Either serializer or castRecord was null. I wonder if this has been fixed in 1.3.2 release. On Mon, Aug 28, 2017 at 7:24 PM, Sridhar Chellappa wrote: >

Re: Question about watermark and window

2017-08-28 Thread Tony Wei
Hi Aljoscha, It is very helpful for me to understand the behavior in such a scenario. Thank you very much! Best Regards, Tony Wei 2017-08-28 20:00 GMT+08:00 Aljoscha Krettek : > Hi Tony, > > I think your analyses are correct. Especially, yes, if you re-read the > data the

Re: Null Pointer Exception on Trying to read a message from Kafka

2017-08-28 Thread Sridhar Chellappa
Kafka Version is 0.10.0 On Tue, Aug 29, 2017 at 6:43 AM, Sridhar Chellappa wrote: > 1.3.0 > > On Mon, Aug 28, 2017 at 10:09 PM, Ted Yu wrote: > >> Which Flink version are you using (so that line numbers can be matched >> with source code) ? >> >> On

Re: Null Pointer Exception on Trying to read a message from Kafka

2017-08-28 Thread Sridhar Chellappa
1.3.0 On Mon, Aug 28, 2017 at 10:09 PM, Ted Yu wrote: > Which Flink version are you using (so that line numbers can be matched > with source code) ? > > On Mon, Aug 28, 2017 at 9:16 AM, Sridhar Chellappa > wrote: > >> DataStream

Re: Example build error

2017-08-28 Thread Ted Yu
Looking at: https://github.com/kl0u/flink-examples/blob/master/src/main/java/com/dataartisans/flinksolo/simple/StreamingWordCount.java there is no line 56. Which repo do you get StreamingWordCount from ? On Mon, Aug 28, 2017 at 3:58 PM, Jakes John wrote: > When I am

Example build error

2017-08-28 Thread Jakes John
When I am trying to build and run the streaming word count example (the example in the Flink GitHub repo), I am getting the following error: StreamingWordCount.java:[56,59] incompatible types: org.apache.flink.api.java.operators.DataSource cannot be converted to

Re: Flink Elastic Sink AWS ES

2017-08-28 Thread ant burton
Hey Arpit, > _cat/nodes?v=ip,port returns the following (note: I have added the x’s to mask what was returned in the response): ip port x.x.x.x 9300 Thank you for your help, Anthony > On 28 Aug 2017, at 10:34, arpit srivastava wrote: > > Hi Ant, > > Can you try

Union limit

2017-08-28 Thread boci
Hi guys! I have one input (from Mongo) and I split the incoming data into multiple datasets (each created dynamically from configuration), and before I write back the result I want to merge them into one dataset (there is some common transformation). So the flow is: DataSet from Mongo => Create Mappers

Default chaining & uid

2017-08-28 Thread Emily McMahon
Does setting a uid affect the default chaining (i.e. if I have two maps in a row and set a uid on both)? This makes me think there's no effect: "All operators that are part of a chain should be assigned an ID" as > described in
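For what it's worth, a uid() call does not by itself break chaining; in a sketch like the following (assuming Flink's DataStream API, with made-up operator names), the two maps still chain into one task:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Sketch: uid() only attaches a stable name used for savepoint state
// matching; it does not change chaining behavior.
class UidChainingSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "b")
           .map(new MapFunction<String, String>() {
               public String map(String s) { return s.toUpperCase(); }
           }).uid("to-upper")
           .map(new MapFunction<String, String>() {
               public String map(String s) { return s + "!"; }
           }).uid("exclaim")
           .print();

        env.execute("uid-chaining-sketch");
    }
}
```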

Re: Even out the number of generated windows

2017-08-28 Thread Bowen Li
That's exactly what I found yesterday! Thank you Aljoscha for confirming it! On Mon, Aug 28, 2017 at 2:57 AM, Aljoscha Krettek wrote: > Hi Bowen, > > There is not built-in TTL but you can use a ProcessFunction to set a timer > that clears state. > > ProcessFunction docs:

Re: Issues in recovering state from last crash using custom sink

2017-08-28 Thread vipul singh
Hi Aljoscha, Yes. I am running the application till a few checkpoints are complete. I am stopping the application between two checkpoints, so there will be messages in the list state, which should be checkpointed when *snapshot* is called. I am able to see a checkpoint file on S3 (I am saving the

Flink Yarn Session failures

2017-08-28 Thread Chan, Regina
Hi, I was trying to understand why it takes about 9 minutes between the last attempt to start a container and when it finally gets the SIGTERM to kill the YarnApplicationMasterRunner. Client: Calc Engine: 2017-08-28 12:39:23,596 INFO org.apache.flink.yarn.YarnClusterClient

metrics for Flink sinks

2017-08-28 Thread Martin Eden
Hi all, Just 3 quick questions, all related to Flink metrics, especially around sinks: 1. In the Flink UI, Sources always have 0 input records/bytes and Sinks always have 0 output records/bytes. Why is it like that? 2. What is the best practice for instrumenting off-the-shelf Flink sinks?

Re: Null Pointer Exception on Trying to read a message from Kafka

2017-08-28 Thread Ted Yu
Which Flink version are you using (so that line numbers can be matched with source code) ? On Mon, Aug 28, 2017 at 9:16 AM, Sridhar Chellappa wrote: > DataStream MyKafkaMessageDataStream = env.addSource( > getStreamSource(env, parameterTool); > ); >

Re: Null Pointer Exception on Trying to read a message from Kafka

2017-08-28 Thread Sridhar Chellappa
DataStream MyKafkaMessageDataStream = env.addSource( getStreamSource(env, parameterTool); ); public RichParallelSourceFunction getStreamSource(StreamExecutionEnvironment env, ParameterTool parameterTool) { // MyKAfkaMessage is a ProtoBuf message

Re: Sink - Cassandra

2017-08-28 Thread nragon
Nick, Can you send some of your examples using Phoenix? Thanks -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Sink-Cassandra-tp4107p15197.html Sent from the Apache Flink User Mailing List archive at Nabble.com.

CoGroupedStreams.WithWindow sideOutputLateData and allowedLateness

2017-08-28 Thread Yunus Olgun
Hi, WindowedStream has the sideOutputLateData and allowedLateness methods to handle late data. Similar functionality on CoGroupedStreams would be nice. As it is, it silently ignores late data, which is error-prone. - Is there a reason it does not exist? - Any suggested workaround?

Off heap memory issue

2017-08-28 Thread Javier Lopez
Hi all, we are starting a lot of Flink jobs (streaming), and after we have started 200 or more jobs we see that the non-heap memory in the taskmanagers increases a lot, to the point of killing the instances. We found out that every time we start a new job, the committed non-heap memory increases

Re: Null Pointer Exception on Trying to read a message from Kafka

2017-08-28 Thread Ted Yu
Which version of Flink / Kafka are you using ? Can you show the snippet of code where you create the DataStream ? Cheers On Mon, Aug 28, 2017 at 7:38 AM, Sridhar Chellappa wrote: > Folks, > > I have a KafkaConsumer that I am trying to read messages from. When I try > to

Null Pointer Exception on Trying to read a message from Kafka

2017-08-28 Thread Sridhar Chellappa
Folks, I have a KafkaConsumer that I am trying to read messages from. When I try to create a DataStream from the KafkaConsumer (env.addSource()) I get the following exception. Any idea how this can happen? java.lang.NullPointerException at

Classloader issue with UDF's in DataStreamSource

2017-08-28 Thread Edward
I need help debugging a problem with using user-defined functions in my DataStreamSource code. Here's the behavior: The first time I upload my jar to the Flink cluster and submit the job, it runs fine. For any subsequent runs of the same job, it's giving me a NoClassDefFoundError on one of my

Re: Question about windowing

2017-08-28 Thread Aljoscha Krettek
Yes, this is a very good explanation, Tony! I'd like to add that "Evictor" is not really a good name for what it does. It should be more like "Keeper" or "Retainer" because what a "CountEvictor.of(1000)" really does is to evict everything but the last 1000 elements, so it should be called
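That "keep the last N" semantics can be illustrated in plain Java, independent of Flink's Evictor interface; a CountEvictor.of(1000) behaves like a bounded buffer that evicts from the front:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Plain-Java illustration of CountEvictor semantics: evict everything
// *except* the last `capacity` elements, i.e. retain the most recent N.
class KeepLastN<T> {
    private final int capacity;
    private final Deque<T> buffer = new ArrayDeque<>();

    KeepLastN(int capacity) { this.capacity = capacity; }

    void add(T element) {
        buffer.addLast(element);
        if (buffer.size() > capacity) {
            buffer.removeFirst(); // evict the oldest, keep the newest N
        }
    }

    int size() { return buffer.size(); }
    T oldest() { return buffer.peekFirst(); }
}
```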

Re: Database connection from job

2017-08-28 Thread Aljoscha Krettek
Hi Bart, I think you might be interested in the (admittedly short) section of the doc about RichFunctions: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/api_concepts.html#rich-functions
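The pattern the linked docs describe looks roughly like this; a sketch assuming Flink's DataStream API and a JDBC driver on the classpath, with a placeholder connection URL:

```java
import java.sql.Connection;
import java.sql.DriverManager;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

// Sketch: open the connection once per parallel instance in open(),
// not inside map(), and release it in close().
class EnrichFromDb extends RichMapFunction<String, String> {
    private transient Connection connection;

    @Override
    public void open(Configuration parameters) throws Exception {
        connection = DriverManager.getConnection("jdbc:postgresql://host/db"); // placeholder
    }

    @Override
    public String map(String value) throws Exception {
        // ... query `connection` to enrich `value` ...
        return value;
    }

    @Override
    public void close() throws Exception {
        if (connection != null) {
            connection.close();
        }
    }
}
```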

Re: Question about watermark and window

2017-08-28 Thread Aljoscha Krettek
Hi Tony, I think your analyses are correct. Especially, yes, if you re-read the data the (ts=3) data should still be considered late if both consumers read with the same speed. If, however, (ts=3) is read before the other consumer reads (ts=8) then it should not be considered late, as you

Re: Thoughts - Monitoring & Alerting if a Running Flink job ever kills

2017-08-28 Thread Aljoscha Krettek
Hi, There is no built-in feature for this but you would use your metrics system for that, in my opinion. Best, Aljoscha > On 26. Aug 2017, at 00:49, Raja.Aravapalli wrote: > > Hi, > > Is there a way to set alerting when a running Flink job kills, due to any >

Re: [Error]TaskManager -RECEIVED SIGNAL 1: SIGHUP. Shutting down as requested

2017-08-28 Thread Ted Yu
See http://docs.oracle.com/cd/E19253-01/816-5166/6mbb1kq04/index.html Cheers On Sun, Aug 27, 2017 at 11:47 PM, Samim Ahmed wrote: > Hello Ted Yu, > > Thanks for your response and a sincere apology for the late reply. > > OS version: Solaris 10. > Flink Version :

Re: Specific sink behaviour based on tuple key

2017-08-28 Thread Aljoscha Krettek
Hi, The key is not available directly to a user function. You would have to use, within that function, the same code that you use for your KeySelector. Best, Aljoscha > On 26. Aug 2017, at 10:01, Alexis Gendronneau wrote: > > Hi all, > > I am looking to customize a

Re: Even out the number of generated windows

2017-08-28 Thread Aljoscha Krettek
Hi Bowen, There is not built-in TTL but you can use a ProcessFunction to set a timer that clears state. ProcessFunction docs: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/process_function.html Best, Aljoscha > On 27. Aug 2017, at 19:19, Bowen Li
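A sketch of that timer-based TTL, assuming a keyed stream on Flink 1.3; the state name and timeout value are made up:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

// Sketch: record the last activity time on every element and register a
// timer; when the timer fires with no newer activity, clear the state,
// emulating a state TTL.
class TtlProcessFunction extends ProcessFunction<String, String> {
    private static final long TTL_MS = 60_000; // made-up timeout

    private transient ValueState<Long> lastSeen;

    @Override
    public void open(Configuration parameters) {
        lastSeen = getRuntimeContext().getState(
                new ValueStateDescriptor<>("last-seen", Long.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out)
            throws Exception {
        long now = ctx.timerService().currentProcessingTime();
        lastSeen.update(now);
        ctx.timerService().registerProcessingTimeTimer(now + TTL_MS);
        out.collect(value);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out)
            throws Exception {
        Long seen = lastSeen.value();
        if (seen != null && timestamp >= seen + TTL_MS) {
            lastSeen.clear(); // no activity for TTL_MS: drop the state
        }
    }
}
```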

Re: Issues in recovering state from last crash using custom sink

2017-08-28 Thread Aljoscha Krettek
Hi, How are you testing the recovery behaviour? Are you taking a savepoint, then shutting down, and then restarting the job from the savepoint? Best, Aljoscha > On 28. Aug 2017, at 00:28, vipul singh wrote: > > Hi all, > > I am working on a flink archiver application. In

Re: Flink Elastic Sink AWS ES

2017-08-28 Thread arpit srivastava
Hi Ant, Can you try this: curl -XGET 'http://<your-es-endpoint>/_cat/nodes?v=ip,port' This should give you the ip and port. On Mon, Aug 28, 2017 at 3:42 AM, ant burton wrote: > Hi Arpit, > > The response from _nodes doesn’t contain an ip address in my case. Is > this something that you

Re: The implementation of the RichSinkFunction is not serializable.

2017-08-28 Thread Federico D'Ambrosio
Hello everyone, I solved my issue by using an Array[Byte] as a parameter, instead of the explicit HTableDescriptor parameter. This way I can instantiate the TableDescriptor inside the open method of OutputFormat using the static method HTableDescriptor.parseFrom. In the end, marking conf, table