Re: kafka consumer client seems not auto commit offset

2017-11-15 Thread Tzu-Li (Gordon) Tai
Hi Tony, Thanks for the report. At first glance of the description, what you described doesn’t seem to match the expected behavior. I’ll spend some time later today to check this out. Cheers, Gordon On 15 November 2017 at 5:08:34 PM, Tony Wei (tony19920...@gmail.com) wrote: Hi Gordon, When

Re: What happens if my parallelism is more than the number of Kafka partitions?

2017-11-08 Thread Tzu-Li (Gordon) Tai
The `KafkaTopicPartitionAssigner.assign(partition, numParallelSubtasks)` method returns the index of the target subtask for a given Kafka partition. The implementation in that method ensures that the same subtask index will always be returned for the same partition. Each consumer subtask will
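A minimal sketch of the kind of deterministic assignment described here (illustrative only, not Flink's exact implementation; the hypothetical startIndex would be derived from the topic so that partitions of different topics spread across different subtasks):

    // Illustrative: the same (partition, parallelism) input always yields the
    // same subtask index, so the assignment is stable across all subtasks.
    static int assign(int startIndex, int partitionId, int numParallelSubtasks) {
        return (startIndex + partitionId) % numParallelSubtasks;
    }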

Re: Kafka Consumer fetch-size/rate and Producer queue timeout

2017-11-08 Thread Tzu-Li (Gordon) Tai
Hi Ashish, From your description I do not yet have much of an idea of what may be happening. However, some of your observations seems reasonable. I’ll go through them one by one: I did try to modify request.timeout.ms, linger.ms etc to help with the issue if it were caused by a sudden burst

Re: What happens if my parallelism is more than the number of Kafka partitions?

2017-11-08 Thread Tzu-Li (Gordon) Tai
Hi! You can set the parallelism of the Flink Kafka Consumer independent of the number of partitions. If there are more consumer subtasks than the number of Kafka partitions to read (i.e. when the parallelism of the consumer is set higher than the number of partitions), some subtasks will
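For example, a sketch assuming a hypothetical topic "my-topic" with 4 partitions, the 0.10 consumer, and an existing `env` (the usual flink-connector-kafka imports are assumed). With parallelism 8, four subtasks each read one partition and the other four stay idle:

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "localhost:9092");
    props.setProperty("group.id", "my-group");

    DataStream<String> stream = env
        .addSource(new FlinkKafkaConsumer010<>("my-topic", new SimpleStringSchema(), props))
        .setParallelism(8); // only 4 of the 8 subtasks will receive records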

Re: Flink Streaming example: Kafka010Example.scala doesn't work

2017-10-20 Thread Tzu-Li (Gordon) Tai
instance. Have a good day, cheers! Michał On Thu, Oct 19, 2017 at 5:22 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: Hi Michal, I can’t seem to access the link you provided for the logs. As for confirming whether or not some data was read / written, how exactly did yo

Re: Flink CEP State Change Pattern

2017-10-19 Thread Tzu-Li (Gordon) Tai
Hi Philip! I’m looping in Kostas to this thread. He might be able to provide some insights for your question. Cheers, Gordon On 14 October 2017 at 8:54:45 PM, Philip Limbeck (philiplimb...@gmail.com) wrote: Hi! I am quite new to Flink CEP and try to define a state change pattern with

Re: Watermark on connected stream

2017-10-19 Thread Tzu-Li (Gordon) Tai
Ah, sorry for the duplicate answer, I didn’t see Piotr’s reply. Slight delay on the mail client. On 19 October 2017 at 11:05:01 PM, Tzu-Li (Gordon) Tai (tzuli...@apache.org) wrote: Hi Kien, The watermark of an operator with multiple inputs will be determined by the current minimum watermark

Re: Accumulator with Elasticsearch Sink

2017-10-19 Thread Tzu-Li (Gordon) Tai
Hi Sendoh, That sounds like a reasonable metric to add directly to the Elasticsearch connector. Could you perhaps write a comment on that in  https://issues.apache.org/jira/browse/FLINK-7697? Cheers, Gordon On 19 October 2017 at 8:57:23 PM, Sendoh (unicorn.bana...@gmail.com) wrote: Hi Flink

Re: Watermark on connected stream

2017-10-19 Thread Tzu-Li (Gordon) Tai
Hi Kien, The watermark of an operator with multiple inputs will be determined by the current minimum watermark across all inputs. Cheers, Gordon On 19 October 2017 at 8:06:11 PM, Kien Truong (duckientru...@gmail.com) wrote: Hi,  If I connect two stream with different watermark, how are the
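As a one-line illustration of that rule (a sketch, not Flink's internal code), for a two-input operator:

    // The operator's event time only advances when the slower input advances.
    long combinedWatermark = Math.min(watermarkOfInput1, watermarkOfInput2);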

Re: Savepoints and migrating value state data types

2017-10-06 Thread Tzu-Li (Gordon) Tai
Hi, Yes, the AvroSerializer currently partially still uses Kryo for object copying. Also, right now, I think the AvroSerializer is only used when the type is recognized as a POJO, and that `isForceAvroEnabled` is set on the job configuration. I’m not sure if that is always possible. As

Re: Fwd: Consult about flink on mesos cluster

2017-10-06 Thread Tzu-Li (Gordon) Tai
Hi Bo, I'm not familiar with Mesos deployments, but I'll forward this to Till or Eron (in CC) who perhaps could provide some help here. Cheers, Gordon On 2 October 2017 at 8:49:32 PM, Bo Yu (yubo1...@gmail.com) wrote: Hello all, This is Bo, I met some problems when I tried to use flink in my

Re: ArrayIndexOutOfBoundExceptions while processing valve output watermark and while applying ReduceFunction in reducing state

2017-09-30 Thread Tzu-Li (Gordon) Tai
apply(new MongoWindow(MongoWritingType.UPDATE)).name("history_stream_window").uid("history_stream_window") Regards, Federico 2017-09-29 15:38 GMT+02:00 Tzu-Li (Gordon) Tai <tzuli...@apache.org>: Hi, I’m looking into this. Could you let us know the Flink ver

Re: ArrayIndexOutOfBoundExceptions while processing valve output watermark and while applying ReduceFunction in reducing state

2017-09-29 Thread Tzu-Li (Gordon) Tai
Hi, I’m looking into this. Could you let us know the Flink version in which the exceptions occurred? Cheers, Gordon On 29 September 2017 at 3:11:30 PM, Federico D'Ambrosio (federico.dambro...@smartlab.ws) wrote: Hi, I'm coming across these Exceptions while running a pretty simple flink job.

Re: Kinesis connector - Jackson issue

2017-09-26 Thread Tzu-Li (Gordon) Tai
ector is that it excludes jackson: [INFO] Excluding com.fasterxml.jackson.dataformat:jackson-dataformat-cbor:jar:2.7.3 from the shaded jar. even though I can't find any mention of that in it's pom.xml. Cheers, Tomasz On 26 September 2017 at 15:43, Tzu-Li (Gordon) Tai <tzuli...@apache.or

Re: Kinesis connector - Jackson issue

2017-09-26 Thread Tzu-Li (Gordon) Tai
Hi Tomasz, Yes, dependency clashes may surface when executing actual job runs on clusters. A few things to probably check first: - Have you built Flink or the Kinesis connector with Maven version 3.3 or above? If yes, try using a lower version, as 3.3+ results in some shading issues when used

Re: Regarding flink-cassandra-connectors

2017-09-26 Thread Tzu-Li (Gordon) Tai
few more queries on the same lines, if I have to perform fetch i.e. select queries, I have to go for the batch queries, no streaming support is available. Regards, Jagadisha G On Tue, Sep 26, 2017 at 3:40 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: Hi Jagadish, Yes, you are

Re: Regarding flink-cassandra-connectors

2017-09-26 Thread Tzu-Li (Gordon) Tai
Ah, sorry I just realized Till also answered your question on your cross-post at dev@. It’s usually fine to post questions to just a single mailing list :) On 26 September 2017 at 12:10:55 PM, Tzu-Li (Gordon) Tai (tzuli...@apache.org) wrote: Hi Jagadish, Yes, you are right that the Flink

Re: Regarding flink-cassandra-connectors

2017-09-26 Thread Tzu-Li (Gordon) Tai
Hi Jagadish, Yes, you are right that the Flink Cassandra connector uses the Datastax drivers internally, which is also the case for all the other Flink connectors; e.g., the Kafka connector uses the Kafka Java client, Elasticearch connector uses the ES Java client, etc. The main advantage

RE: Flink Kafka consumer that reads from two partitions in local mode

2017-09-26 Thread Tzu-Li (Gordon) Tai
and regards, Tovi   From: Sofer, Tovi [ICG-IT] Sent: Monday, 25 September 2017 17:18 To: 'Tzu-Li (Gordon) Tai'; Fabian Hueske Cc: user Subject: RE: Flink Kafka consumer that reads from two partitions in local mode   Hi Gordon,   Thanks for your assistance.   · We are running Flink

RE: Flink Kafka consumer that reads from two partitions in local mode

2017-09-25 Thread Tzu-Li (Gordon) Tai
n, am I missing something in consumer setup? Should I configure the consumer in some way to subscribe to two partitions?   Thanks and regards, Tovi   From: Fabian Hueske [mailto:fhue...@gmail.com] Sent: Tuesday, 19 September 2017 22:58 To: Sofer, Tovi [ICG-IT] Cc: user; Tzu-Li (Gordon) Tai Subjec

Re: StreamCorruptedException

2017-09-25 Thread Tzu-Li (Gordon) Tai
.java:897)     ... 5 more So, it looks like the Job Manager ran out of memory, thanks to the "Progressively Getting Worse" checkpoints. Any ideas on how to make the checkpoints faster?   On Thu, Sep 21, 2017 at 7:29 PM,

Re: LocalStreamEnvironment - configuration doesn't seem to be used in RichFunction operators

2017-09-23 Thread Tzu-Li (Gordon) Tai
ion - now just to explore and find some better ways to test this stuff! On Fri, Sep 22, 2017 at 11:51 AM Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: Hi, The passing in of a Configuration instance in the open method is actually a leftover artifact of the DataStream API that remains on

Re: LocalStreamEnvironment - configuration doesn't seem to be used in RichFunction operators

2017-09-22 Thread Tzu-Li (Gordon) Tai
Hi, The passing in of a Configuration instance in the open method is actually a leftover artifact of the DataStream API that remains only due to API backwards compatibility reasons. There’s actually no way to modify what configuration is retrieved there (and it is actually always a new empty

Re: Clarifications on FLINK-KAFKA consumer

2017-09-22 Thread Tzu-Li (Gordon) Tai
Hi Rahul! 1. Will the Flink Kafka consumer 0.8.x run on multiple task slots or a single task slot? Basically I mean, is it going to be a parallel operation or a non-parallel operation? Yes, the FlinkKafkaConsumer is a parallel consumer. 2. If it's a parallel operation, then do multiple task slots

Re: StreamCorruptedException

2017-09-21 Thread Tzu-Li (Gordon) Tai
Hi Sridhar, Sorry that this didn't get a response earlier. According to the trace, it seems like the job failed during the process, and when trying to automatically restore from a checkpoint, deserialization of a CEP `IterativeCondition` object failed. As far as I can tell, CEP operators are

Re: Savepoints and migrating value state data types

2017-09-21 Thread Tzu-Li (Gordon) Tai
Hi! The exception that you have bumped into indicates that on the restore of the savepoint, the serializer for that registered state in the savepoint no longer exists. This prevents restoring savepoints taken with memory state backends because there will be no serializer available to deserialize

Re: Custom Serializers

2017-09-15 Thread Tzu-Li (Gordon) Tai
Hi Nuno, Because of this, we have a legacy structure that I showed before.  Could you probably include more information about this legacy structure you mentioned here in this mail thread? I couldn’t find any other reference to that. That could be helpful to understanding your use case more

Re: Change Kafka cluster without losing Flink's state

2017-09-14 Thread Tzu-Li (Gordon) Tai
Simply like this: env.addSource(new FlinkKafkaConsumer(...)).uid(“some-unique-id”) The same goes for any other operator. However, do keep in mind this bug that was just recently uncovered:  https://issues.apache.org/jira/browse/FLINK-7623. What I described in my previous reply would not work as
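Expanded slightly, a sketch of assigning stable uids across a pipeline (the operator names, schema, props, and functions here are hypothetical):

    env.addSource(new FlinkKafkaConsumer010<>("events", schema, props))
       .uid("kafka-source")          // source state is matched by this uid on restore
       .keyBy(e -> e.getKey())
       .map(new MyStatefulMapper())
       .uid("stateful-map")          // every stateful operator gets its own uid
       .addSink(new MySink())
       .uid("sink");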

Re: Change Kafka cluster without losing Flink's state

2017-09-14 Thread Tzu-Li (Gordon) Tai
Hi Konstantin, After migrating and connecting to the new Kafka cluster, do you want the Kafka consumer to start fresh without any partition offset state (and therefore will re-establish its partition-to-subtask assignments), while keeping all other operator state in the pipeline intact? If so,

Re: Using latency markers

2017-09-13 Thread Tzu-Li (Gordon) Tai
Hi Aitozi, Yes, I think we haven’t really pinpointed the actual cause of the problem, but if you have a fix for that and can provide a PR we can definitely look at it! That would be helpful. Before opening a PR, also make sure to first open a JIRA for the issue (I don’t think there is one

Re: LatencyMarker

2017-09-13 Thread Tzu-Li (Gordon) Tai
end latency and then i can see the latency in dashboard, if the community accept the patch Thanks. and now the Tzu-Li (Gordon) Tai wrote > Hi! > > Yes, backpressure should also increase the latency value calculated from > LatencyMarkers. > LatencyMarkers are specia

Re: Exception when using keyby operator

2017-09-13 Thread Tzu-Li (Gordon) Tai
Following up: here’s the JIRA ticket for improving the POJO data type documentation - https://issues.apache.org/jira/browse/FLINK-7614. - Gordon On 11 September 2017 at 10:31:23 AM, Sridhar Chellappa (flinken...@gmail.com) wrote: That fixed my issue. Thanks. I also agree we need to fix the

Re: BucketingSink never closed

2017-09-13 Thread Tzu-Li (Gordon) Tai
Ah, sorry, one correction. Just realized there’s already some analysis of the BucketingSink closing issue in this mail thread. Please ignore my request for relevant logs :) On 13 September 2017 at 10:56:10 AM, Tzu-Li (Gordon) Tai (tzuli...@apache.org) wrote: Hi Flavio, Let me try

Re: BucketingSink never closed

2017-09-13 Thread Tzu-Li (Gordon) Tai
Hi Flavio, Let me try to understand / look at some of the problems you have encountered. checkpointing: it's not clear which checkpointing system to use and how to tune/monitor it and avoid OOM exceptions. What do you mean by which "checkpointing system" to use? Do you mean state backends?

Re: Can Flink do streaming from data sources other than Kafka?

2017-09-07 Thread Tzu-Li (Gordon) Tai
. Thanks, Kant On Thu, Sep 7, 2017 at 1:04 AM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: Hi! I am wondering if Flink can do streaming from data sources other than Kafka. For example, can Flink do streaming from a database like Cassandra, HBase, or MongoDB to sinks like, say, Elasticsearch or

Re: Can Flink do streaming from data sources other than Kafka?

2017-09-07 Thread Tzu-Li (Gordon) Tai
Hi! I am wondering if Flink can do streaming from data sources other than Kafka. For example, can Flink do streaming from a database like Cassandra, HBase, or MongoDB to sinks like, say, Elasticsearch or Kafka? Yes, Flink currently supports various connectors for different sources and sinks. For

Re: Process Function

2017-09-05 Thread Tzu-Li (Gordon) Tai
Hi Navneeth, Currently, I don't think there is any built-in functionality to trigger onTimer periodically. As for the second part of your question, do you mean that you want to query on which key the fired timer was registered from? I think this also isn't possible right now. I'm looping in

Re: LatencyMarker

2017-09-05 Thread Tzu-Li (Gordon) Tai
Hi! Yes, backpressure should also increase the latency value calculated from LatencyMarkers. LatencyMarkers are special events that flow along with the actual stream records, so they should also be affected by backpressure. Are you asking because you observed otherwise? Cheers, Gordon --

Re: State Maintenance

2017-09-05 Thread Tzu-Li (Gordon) Tai
Hi Navneeth, Answering your three questions separately: 1. Yes. Your MapState will be backed by RocksDB, so when removing an entry from the map state, the state will be removed from the local RocksDB as well. 2. If state classes are not POJOs, they will be serialized by Kryo, unless a custom
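For point 1, a sketch of keyed MapState usage with RocksDB as the backend (the Event type and its accessors are hypothetical; the function must run on a keyed stream):

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.MapState;
    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    public class CountingFunction extends RichFlatMapFunction<Event, Event> {
        private transient MapState<String, Long> counts;

        @Override
        public void open(Configuration parameters) {
            counts = getRuntimeContext().getMapState(
                new MapStateDescriptor<>("counts", String.class, Long.class));
        }

        @Override
        public void flatMap(Event e, Collector<Event> out) throws Exception {
            if (e.isDelete()) {
                counts.remove(e.getSubKey()); // removes the entry from local RocksDB as well
            } else {
                Long current = counts.get(e.getSubKey());
                counts.put(e.getSubKey(), current == null ? 1L : current + 1);
            }
            out.collect(e);
        }
    }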

Re: How to flush all window states after Kafka (0.10.x) topic was removed

2017-09-05 Thread Tzu-Li (Gordon) Tai
Hi Tony, Currently, the functionality that you described does not exist in the consumer. When a topic is deleted, as far as I know, the consumer would simply consider the partitions as unreachable and continue to try fetching records from them until they are up again. I'm not entirely sure if a

Re: Avro Serialization and RocksDB Internal State

2017-08-18 Thread Tzu-Li (Gordon) Tai
Hi Biplob, Yes, your assumptions are correct [1]. To be a bit more exact, the `AvroSerializer` will be used to serialize your POJO data types. That would be the case for data transfers and state serialization (unless for state serialization you specify a custom state serializer; see [2]) If

Re: [EXTERNAL] Re: Flink application failing with kerberos issue after running successfully without any issues for a few days

2017-08-17 Thread Tzu-Li (Gordon) Tai
Hi Raja, Can you please confirm if I have to use the below settings to ensure I use keytabs?   security.kerberos.login.use-ticket-cache: Indicates whether to read from your Kerberos ticket cache (default: true).   security.kerberos.login.keytab: Absolute path to a Kerberos keytab file that

Re: Efficient grouping and parallelism on skewed data

2017-08-17 Thread Tzu-Li (Gordon) Tai
Hi John, Do you need to do any sort of grouping on the keys and aggregation? Or are you simply using Flink to route the Kafka messages to different Elasticsearch indices? For the following I’m assuming the latter: If there’s no need for aggregate computation per key, what you can do is simply
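Assuming the latter case, a sketch of per-record index routing in the Elasticsearch sink function (the Event type and its accessors are hypothetical):

    import org.apache.flink.api.common.functions.RuntimeContext;
    import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
    import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
    import org.elasticsearch.client.Requests;

    public class RoutingSinkFunction implements ElasticsearchSinkFunction<Event> {
        @Override
        public void process(Event element, RuntimeContext ctx, RequestIndexer indexer) {
            indexer.add(Requests.indexRequest()
                .index(element.getTargetIndex())  // choose the index per record
                .type("event")
                .source(element.toJsonMap()));
        }
    }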

Re: How to verify whether the data sent to Elasticsearch is correct or not?

2017-08-17 Thread Tzu-Li (Gordon) Tai
..@gmail.com> wrote: Did you use image for the code ? Can you send plain code again ? Cheers Original message From: mingleizhang <18717838...@163.com> Date: 8/16/17 6:16 PM (GMT-08:00) To: mingleizhang <18717838...@163.com> Cc: "Tzu-Li (Gordon) Tai" <t

Re: Error during Kafka connection

2017-08-14 Thread Tzu-Li (Gordon) Tai
Hi, I don’t have experience running Kafka clusters behind proxies, but it seems like the configurations “advertised.host.name” and “advertised.port” for your Kafka brokers are what you’re looking for. For information on that, please refer to the Kafka documentation. Cheers, Gordon On 12

Re: Error during Kafka connection

2017-08-11 Thread Tzu-Li (Gordon) Tai
No, there should be no difference between setting it up on Ubuntu or OS X. I can’t really tell any anything suspicious from the information provided so far, unfortunately. Perhaps you can try first checking that the Kafka topic is consumable from where you’re running Flink, e.g. using the

Re: Aggregation based on Timestamp

2017-08-11 Thread Tzu-Li (Gordon) Tai
Hi, Yes, this is definitely doable in Flink, and should be very straightforward. Basically, what you would do is define a FlinkKafkaConsumer source for your Kafka topic [1], following that a keyBy operation on the hostname [2], and then a 1-minute time window aggregation [3]. At the end of
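Putting [1]–[3] together, a sketch under assumed names (the topic, MetricSchema, props, and the metric type with a hostname field are all hypothetical):

    env.addSource(new FlinkKafkaConsumer010<>("metrics", new MetricSchema(), props)) // [1]
       .keyBy(metric -> metric.getHostname())                                        // [2]
       .timeWindow(Time.minutes(1))                                                  // [3]
       .reduce((a, b) -> a.merge(b))
       .print();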

Re: Error during Kafka connection

2017-08-11 Thread Tzu-Li (Gordon) Tai
Hi, AFAIK, Kafka group coordinators are supposed to always be marked dead, because we use static assignment internally and therefore Kafka's group coordination functionality is disabled. Though it may be obvious, to get that out of the way first: are you sure that the Kafka installation

Re: load + update global state

2017-08-07 Thread Tzu-Li (Gordon) Tai
Hi Peter! One thing I’d like to understand first after reading about your use case: Why exactly do you need the lookup table to be globally accessible? From what I understand, you are using this lookup table for stream event enriching, so whatever processing you need to perform downstream on

Re: Flink streaming Parallelism

2017-08-07 Thread Tzu-Li (Gordon) Tai
Hi, The equivalent would be setting a parallelism on your sink operator. e.g. stream.addSink(…).setParallelism(…). By default the parallelism of all operators in the pipeline will be whatever parallelism was set for the whole job, unless parallelism is explicitly set for a specific operator.

Re: KafkaConsumerBase

2017-08-02 Thread Tzu-Li (Gordon) Tai
Hi! method shown in KafkaConsumerBase.java (version 1.2.0)  A lot has changed in the FlinkKafkaConsumerBase since version 1.2.0. And if I remember correctly, the `assignPartitions` method was actually no longer relevant in the code, and was properly removed afterwards. The method

Re: multiple users per flink deployment

2017-08-02 Thread Tzu-Li (Gordon) Tai
Hi, There’s been quite a few requests on this recently on the mailing lists and also mentioned by some users offline, so I think we may need to start with plans to probably support this. I’m CC’ing Eron to this thread to see if he has any thoughts on this, as he was among the first authors

Re: About KafkaConsumerBase

2017-08-01 Thread Tzu-Li (Gordon) Tai
Hi, it maintains its own instance of FlinkKafkaConsumerBase, and it contains an individual pendingOffsetsToCommit, am I right?  That is correct! The FlinkKafkaConsumerBase is code executed for each parallel subtask instance, and each subtask therefore has its own pendingOffsetsToCommit which

Re: Flink: KafkaProducer Data Loss

2017-07-31 Thread Tzu-Li (Gordon) Tai
Hi! Thanks a lot for providing this. I'll try to find some time this week to look into this using your example code. Cheers, Gordon On 29 July 2017 at 4:46:57 AM, ninad (nni...@gmail.com) wrote: Hi Gordon, I was able to reproduce the data loss on standalone flink cluster also. I have stripped

Re: write into hdfs using avro

2017-07-27 Thread Tzu-Li (Gordon) Tai
Hi! Yes, you can provide a custom writer for the BucketingSink via BucketingSink#setWriter(…). The AvroKeyValueSinkWriter is a simple example of a writer that uses Avro for serialization, and takes as input KV 2-tuples. If you want to have a writer that takes as input your own event types,
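A sketch of wiring in a custom writer (paths and types are illustrative, and `stream` is assumed; AvroKeyValueSinkWriter takes its Avro schemas via a String-to-String configuration map, whose exact keys are omitted here):

    Map<String, String> avroConf = new HashMap<>();
    // ... put the key/value Avro schema settings expected by the writer ...

    BucketingSink<Tuple2<String, MyEvent>> sink =
        new BucketingSink<>("hdfs:///data/events");
    sink.setWriter(new AvroKeyValueSinkWriter<String, MyEvent>(avroConf));

    stream.addSink(sink);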

Re: FlinkKafkaConsumer010 - Memory Issue

2017-07-25 Thread Tzu-Li (Gordon) Tai
Hi, I couldn’t seem to reproduce this. Taking another look at your description, one thing I spotted was that your Kafka broker installation versions are 0.10.1.0, while the Kafka consumer uses Kafka clients of version 0.10.0.1 (by default, as shown in your logs). I’m wondering whether or not

Re: FlinkKafkaConsumer subscribes to partitions in restoredState only.

2017-07-24 Thread Tzu-Li (Gordon) Tai
:22:10 PM, Tzu-Li (Gordon) Tai (tzuli...@apache.org) wrote: Hi, Sorry for not replying to this earlier, it seems like this thread hadn’t been noticed earlier. What you are experiencing is expected behavior. In Flink 1.3, new partitions will not be picked up, only partitions

Re: FlinkKafkaConsumer subscribes to partitions in restoredState only.

2017-07-24 Thread Tzu-Li (Gordon) Tai
Hi, Sorry for not replying to this earlier, it seems like this thread hadn’t been noticed earlier. What you are experiencing is expected behavior. In Flink 1.3, new partitions will not be picked up, only partitions that are in checkpoints state will be subscribed to on restore runs. One main

Re: cannot use ElasticsearchSink in Flink1.3.0

2017-07-20 Thread Tzu-Li (Gordon) Tai
Hi, There was an issue with the ES 5 connector release in 1.3.0, and the artifacts were not released to Maven central. Please use 1.3.1 instead. Cheers, Gordon On 20 July 2017 at 3:31:39 PM, ZalaCheung (gzzhangdesh...@corp.netease.com) wrote: Hi all, I am using Flink 1.3.0 and following instructions

Re:Re: Problem with Flink restoring from checkpoints

2017-07-20 Thread Tzu-Li (Gordon) Tai
Hi, What Fabian mentioned is true. Flink Kafka Consumer’s exactly-once guarantee relies on offsets checkpoints as Flink state, and doesn’t rely on the committed offsets in Kafka. What we found is that Flink acks Kafka immediately before even writing to S3. What you mean by ack here is the

Re: Kafka Producer - Null Pointer Exception when processing by element

2017-07-20 Thread Tzu-Li (Gordon) Tai
your per-record lists and collects as it iterates through them. - Gordon On 18 July 2017 at 3:02:45 AM, earellano (eric.arell...@ge.com) wrote: Tzu-Li (Gordon) Tai wrote > Basically, when two operators are chained together, the output of the > first operator is immediately c

Re: Flink Elasticsearch Connector: Lucene Error message

2017-07-20 Thread Tzu-Li (Gordon) Tai
:58 GMT+02:00 Tzu-Li (Gordon) Tai <tzuli...@apache.org>: Hi, I would also recommend checking the `lib/` folder of your Flink installation to see if there is any dangling old version jars that you added there. I did a quick dependency check on the Elasticsearch 2 connector, it is correctly p

Re: FlinkKafkaConsumer010 - Memory Issue

2017-07-19 Thread Tzu-Li (Gordon) Tai
Hi Pedro, Seems like a memory leak. The only issue I’m currently aware of that may be related is [1]. Could you tell if this JIRA relates to what you are bumping into? The JIRA mentions Kafka 09, but a fix is only available for Kafka 010 once we bump our Kafka 010 dependency to the latest

Re: Does FlinkKafkaConsumer010 care about consumer group?

2017-07-19 Thread Tzu-Li (Gordon) Tai
of same consumer group, A' will receive messages from all partitions when its started from savepoint? I am using Flink 1.2.1. Does the above plan require setting uid on the Kafka source in the job? Thanks, Moiz On Wed, Jul 19, 2017 at 1:06 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org>

Re: Does FlinkKafkaConsumer010 care about consumer group?

2017-07-19 Thread Tzu-Li (Gordon) Tai
Hi! The only occasions which the consumer group is used is: 1. When committing offsets back to Kafka. Since Flink 1.3, this can be disabled completely (both when checkpointing is enabled or disabled). See [1] on details about that. 2. When starting fresh (not starting from some savepoint), if
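A sketch of both occasions in code (topic and group names are hypothetical; both setters exist on the consumer as of Flink 1.3):

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "localhost:9092");
    props.setProperty("group.id", "my-group");

    FlinkKafkaConsumer010<String> consumer =
        new FlinkKafkaConsumer010<>("my-topic", new SimpleStringSchema(), props);

    consumer.setCommitOffsetsOnCheckpoints(false); // occasion 1: disable offset commits entirely
    consumer.setStartFromGroupOffsets();           // occasion 2: group.id decides the start position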

Re: Flink Elasticsearch Connector: Lucene Error message

2017-07-17 Thread Tzu-Li (Gordon) Tai
Hi, I would also recommend checking the `lib/` folder of your Flink installation to see if there is any dangling old version jars that you added there. I did a quick dependency check on the Elasticsearch 2 connector, it is correctly pulling in Lucene 5.5.0 only, so this dependency should not

Re: Kafka Producer - Null Pointer Exception when processing by element

2017-07-17 Thread Tzu-Li (Gordon) Tai
-chaining-and-resource-groups On 17 July 2017 at 2:06:52 PM, earellano (eric.arell...@ge.com) wrote: Hi, Tzu-Li (Gordon) Tai wrote > These seems odd. Are your events intended to be a list? If not, this > should be a `DataStream > > `. > > From the code snipp

Re: how to get topic names in SinkFunction when using FlinkKafkaConsumer010 with multiple topics

2017-07-16 Thread Tzu-Li (Gordon) Tai
clause for different topics in my SinkFunction, do I need to change the way the Kafka producer produces the message?  Any pointer to code samples will be appreciated.  Thanks again, Richard On Wednesday, July 5, 2017, 10:25:59 PM PDT, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:

Re: Kafka Producer - Null Pointer Exception when processing by element

2017-07-16 Thread Tzu-Li (Gordon) Tai
On 17 July 2017 at 2:59:38 AM, earellano [via Apache Flink User Mailing List archive.] (ml+s2336050n14294...@n4.nabble.com) wrote: Tzu-Li (Gordon) Tai wrote It seems like you’ve misunderstood how to use the FlinkKafkaProducer, or is there any specific reason why you want to emit elements to

Re: Kafka Producer - Null Pointer Exception when processing by element

2017-07-14 Thread Tzu-Li (Gordon) Tai
Hi, It seems like you’ve misunderstood how to use the FlinkKafkaProducer, or is there any specific reason why you want to emit elements to Kafka in a map function? The correct way to use it is to add it as a sink function to your pipeline, i.e. DataStream someStream = … someStream.addSink(new
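Completing that snippet, a minimal sketch of the correct usage (the topic name and props are assumptions):

    DataStream<String> someStream = ...; // whatever produces your records

    // The producer is attached as a sink; there is no need to emit to Kafka
    // from inside a map function.
    someStream.addSink(new FlinkKafkaProducer010<>(
        "output-topic", new SimpleStringSchema(), props));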

Re: High back-pressure after recovering from a save point

2017-07-14 Thread Tzu-Li (Gordon) Tai
Can you try starting from the savepoint, but telling Kafka to start from the latest offset? (@gordon: Is that possible in Flink 1.3.1 or only in 1.4-SNAPSHOT ?) This is already possible in Flink 1.3.x. `FlinkKafkaConsumer#setStartFromLatest()` would be it. On 15 July 2017 at 12:33:53 AM,

Re: Flink: KafkaProducer Data Loss

2017-07-13 Thread Tzu-Li (Gordon) Tai
Hi Ninad & Piotr, AFAIK, when this issue was reported, Ninad was using 09. FLINK-6996 only affects Flink Kafka Producer 010, so I don’t think that’s the cause here. @Ninad Code to reproduce this would definitely be helpful here, thanks. If you prefer to provide that privately, that would also

Re: Kafka Connectors

2017-07-05 Thread Tzu-Li (Gordon) Tai
Maven I just included what I specified in the previous email, why would Flink need other jars? And how can I track them? Cheers, Paolo [1]   https://mvnrepository.com/artifact/org.apache.flink/flink-connector-kafka-0.8_2.10/1.0.0 On 5 July 2017 at 18:20, Tzu-Li (Gordon) Tai <tzuli...@apache.

Re: how to get topic names in SinkFunction when using FlinkKafkaConsumer010 with multiple topics

2017-07-05 Thread Tzu-Li (Gordon) Tai
Hi Richard, Producing to multiple topics is treated a bit differently in the Flink Kafka producer. You need to set a single default target topic, and in `KeyedSerializationSchema#getTargetTopic()` you can override the default topic with whatever is returned. The `getTargetTopic` method is
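A sketch of such a schema (the Event type and its accessors are hypothetical; returning null from getTargetTopic falls back to the producer's default topic):

    import org.apache.flink.streaming.util.serialization.KeyedSerializationSchema;

    public class TopicRoutingSchema implements KeyedSerializationSchema<Event> {
        @Override
        public byte[] serializeKey(Event e) { return null; } // no message key

        @Override
        public byte[] serializeValue(Event e) { return e.toBytes(); }

        @Override
        public String getTargetTopic(Event e) {
            return e.getCategory(); // overrides the default topic per record
        }
    }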

Re: Kafka Connectors

2017-07-05 Thread Tzu-Li (Gordon) Tai
Hi Paolo, Have you followed the instructions in this documentation [1]? The connectors are not part of the binary distributions, so you would need to bundle the dependencies with your code by building an uber jar. Cheers, Gordon [1] 

Re:Re: Error when set RocksDBStateBackend option in Flink?

2017-06-29 Thread Tzu-Li (Gordon) Tai
Hi! Thanks a lot for reporting this. It turns out that this is a nasty bug:  https://issues.apache.org/jira/browse/FLINK-7041. Aljoscha is working on fixing it already. It’s definitely a critical bug, so we’ll try to include in the next bugfix release. Cheers, Gordon On 29 June 2017 at 7:05:09

Re: Fwd: Incremental aggregation using Fold and failure recovery

2017-06-29 Thread Tzu-Li (Gordon) Tai
Sorry, one typo. public AverageAccumulator merge(WindowStats a, WindowStats b) { should be: public WindowStats merge(WindowStats a, WindowStats b) { On 29 June 2017 at 8:22:34 PM, Tzu-Li (Gordon) Tai (tzuli...@apache.org) wrote: I see. Then yes, a fold operation would be more efficient here

Re: Fwd: Incremental aggregation using Fold and failure recovery

2017-06-29 Thread Tzu-Li (Gordon) Tai
window(SlidingProcessingTimeWindows.of(Time.hours(1), Time.minutes(5))) .fold(new WindowStats(), new ProductAggregationMapper(), new ProductAggregationWindowFunction()); Thanks! On 29 June 2017 at 12:57, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: Hi Ahmad, Yes, that is correct. The aggregated fol

Re: Fwd: Incremental aggregation using Fold and failure recovery

2017-06-29 Thread Tzu-Li (Gordon) Tai
Hi Ahmad, Yes, that is correct. The aggregated fold value (i.e. your WindowStats instance) will be checkpointed by Flink as managed state, and restored from the last complete checkpoint in case of failures. One comment on using the fold function: if what you’re essentially doing in the fold is
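In code, the incremental variant looks roughly like this (a sketch reusing the types from this thread; `stream`, the keying, and field names are assumptions):

    stream
        .keyBy(e -> e.getProductId())
        .window(SlidingProcessingTimeWindows.of(Time.hours(1), Time.minutes(5)))
        // WindowStats is folded incrementally per window, so only the current
        // aggregate is kept as state rather than all buffered elements.
        .fold(new WindowStats(),
              new ProductAggregationMapper(),
              new ProductAggregationWindowFunction());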

Re: Flink 1.2.0 - Flink Job restart config not using Flink Cluster restart config.

2017-06-29 Thread Tzu-Li (Gordon) Tai
Hi Vera, Apparently, if there is no job-specific restart strategy, an infinite FixedDelayRestartStrategy is always used for the job submission:

Re: New message processing time after recovery.

2017-06-28 Thread Tzu-Li (Gordon) Tai
Hi! That is correct. With processing time, all time-based operations will use the current machine system time (which would take the recovery downtime into account). Note that with processing time, the elements don’t carry a meaningful timestamp. Best, Gordon On 28 June 2017 at 11:22:43 AM, yunfan123

Re: Partition index from partitionCustom vs getIndexOfThisSubtask downstream

2017-06-28 Thread Tzu-Li (Gordon) Tai
Hi Urs, Yes, the returned “index” from the custom partitioner refers to the parallel subtask index. I agree that the mismatching terminology used could be slightly misleading. Could you open a JIRA to improve the Javadoc for that? Thanks! Cheers, Gordon On 27 June 2017 at 10:40:47 PM, Urs

Re: Question about Checkpoint

2017-06-26 Thread Tzu-Li (Gordon) Tai
Hi Desheng, Welcome to the community! What you’re asking alludes to the question: How does Flink support end-to-end (from external source to external sink, e.g. Kafka to database) exactly-once delivery? Whether or not that is supported depends on the guarantees of the source and sink and how

Re: Kinesis connector SHARD_GETRECORDS_MAX default value

2017-06-22 Thread Tzu-Li (Gordon) Tai
your opinion on this before I submit the PR. Cheers, Steffen On 24/04/2017 00:39, Tzu-Li (Gordon) Tai wrote: > Thanks for filing the JIRA! > > Would you also be up to open a PR to for the change? That would be very > very helpful :) > > Cheers, > Gordo

Re: Painful KryoException: java.lang.IndexOutOfBoundsException on Flink Batch Api scala

2017-06-22 Thread Tzu-Li (Gordon) Tai
Thanks a lot Andrea! On 21 June 2017 at 8:36:32 PM, Andrea Spina (andrea.sp...@radicalbit.io) wrote: Hi Gordon, sadly no news since the last message. At the end I jumped over the issue; I was not able to solve it. I'll try to provide a runnable example asap. Thank you. Andrea -- View

RE: Kafka and Flink integration

2017-06-20 Thread Tzu-Li (Gordon) Tai
Yes, POJOs can contain other nested POJO types. You just have to make sure that the nested field is either public, or has corresponding public getter and setter methods that follow the Java beans naming conventions. On 21 June 2017 at 12:20:31 AM, nragon
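For example, a minimal sketch of a valid nested POJO (class names are hypothetical):

    public class Order {
        private Customer customer;      // nested POJO behind getter/setter

        public Order() {}               // public no-arg constructor required
        public Customer getCustomer() { return customer; }
        public void setCustomer(Customer customer) { this.customer = customer; }
    }

    class Customer {
        public String name;             // a public field also qualifies
        public Customer() {}
    }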

RE: Kafka and Flink integration

2017-06-20 Thread Tzu-Li (Gordon) Tai
Hi Nuno, In general, if it is possible, it is recommended that you map your generic classes to Tuples / POJOs [1]. For Tuples / POJOs, Flink will create specialized serializers for them, whereas for generic classes (i.e. types which cannot be treated as POJOs) Flink simply falls back to using Kryo

Re: Guava version conflict

2017-06-19 Thread Tzu-Li (Gordon) Tai
Thanks a lot! Please keep me updated with this :) On 19 June 2017 at 6:33:15 PM, Flavio Pompermaier (pomperma...@okkam.it) wrote: Ok, I'll let you know as soon as I recompile Flink 1.3.x. Thanks, Flavio On Mon, Jun 19, 2017 at 7:26 AM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wro

Re: How can I get last successful checkpoint id in sink?

2017-06-19 Thread Tzu-Li (Gordon) Tai
Hi! The last completed checkpoint ID should be obtainable using the monitoring REST API [1], under the url “/jobs/{jobID}/checkpoints/“. It is also visible in the JobManager Web UI under the “checkpoints” tab of each job. The web UI fetches its information using the monitoring REST API, so

Re: Guava version conflict

2017-06-18 Thread Tzu-Li (Gordon) Tai
Hi Flavio, It’s most likely related to a problem with Maven. I’m pretty sure this actually isn’t a problem anymore. Could you verify by rebuilding Flink and see if the problem remains? Thanks a lot. Best, Gordon On 16 June 2017 at 6:25:10 PM, Tzu-Li (Gordon) Tai (tzuli...@apache.org) wrote

Re: Kafka and Flink integration

2017-06-16 Thread Tzu-Li (Gordon) Tai
Hi! It’s usually always recommended to register your classes with Kryo, to avoid the somewhat inefficient classname writing. Also, depending on the case, to decrease serialization overhead, nothing really beats specific custom serialization. So, you can also register specific serializers for
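A sketch of both registrations on the ExecutionConfig (MyType and MyTypeSerializer, a Kryo Serializer subclass, are hypothetical; `env` is assumed):

    ExecutionConfig config = env.getConfig();

    // Registering the class avoids writing the full class name per record.
    config.registerKryoType(MyType.class);

    // Registering a custom Kryo serializer cuts serialization overhead further.
    config.registerTypeWithKryoSerializer(MyType.class, MyTypeSerializer.class);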

Re: Guava version conflict

2017-06-16 Thread Tzu-Li (Gordon) Tai
dependencies in the flink dist jar some days ago? On Fri, Jun 16, 2017 at 12:19 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: Hi Flavio, I was just doing some end-to-end rebuild Flink + cluster execution with ES sink tests, and it seems like the Guava shading problem isn’t there a

Re: Guava version conflict

2017-06-16 Thread Tzu-Li (Gordon) Tai
other dependencies in your code. Cheers, Gordon On 15 June 2017 at 8:24:48 PM, Flavio Pompermaier (pomperma...@okkam.it) wrote: Hi Gordon, any news on this? On Mon, Jun 12, 2017 at 9:54 AM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: This seems like a shading problem then. I’ve

Re: Flink cluster : Client is not connected to any Elasticsearch nodes!

2017-06-16 Thread Tzu-Li (Gordon) Tai
major version ES installation, this exception is very common. Best, Gordon On 6 June 2017 at 6:39:52 PM, Tzu-Li (Gordon) Tai (tzuli...@apache.org) wrote: Hi Dhinesh, Could it be that you didn’t configure the network binding address of the ES installation properly? You need to make sure

Re: Painful KryoException: java.lang.IndexOutOfBoundsException on Flink Batch Api scala

2017-06-16 Thread Tzu-Li (Gordon) Tai
Hi Andrea, I’ve circled back to this and wanted to check on the status. Have you managed to solve this in the end, or is this still a problem for you? If it’s still a problem, would you be able to provide a complete runnable example job that can reproduce the problem (ideally via a git branch

Re: dynamically adding a sink to Flink

2017-06-12 Thread Tzu-Li (Gordon) Tai
Hi, The Flink Kafka Producer allows writing to multiple topics beside the default topic. To do this, you can override the configured default topic by implementing the `getTargetTopic` method on the `KeyedSerializationSchema`. That method is invoked for each record, and if a value is returned,

Re: Guava version conflict

2017-06-12 Thread Tzu-Li (Gordon) Tai
:( Best, Flavio  On Wed, Jun 7, 2017 at 6:21 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: Yes, those should not be in the flink-dist jar, so the root reason should be that the shading isn’t working properly for your custom build. If possible, could you try building Flink again with

Re: Flink: KafkaProducer Data Loss

2017-06-11 Thread Tzu-Li (Gordon) Tai
Hi Ninad, Thanks for the logs! Just to let you know, I’ll continue to investigate this early next week. Cheers, Gordon On 8 June 2017 at 7:08:23 PM, ninad (nni...@gmail.com) wrote: I built Flink v1.3.0 with cloudera hadoop and am able to see the data loss. Here are the details:

Re: In-transit Data Encryption in EMR

2017-06-11 Thread Tzu-Li (Gordon) Tai
Hi Vinay, Apologies for the inactivity on this thread, I was occupied with some critical fixes for 1.3.1. 1. Can anyone please explain me how do you test if SSL is working correctly ? Currently I am just relying on the logs. AFAIK, if any of the SSL configuration settings are enabled

Re: Guava version conflict

2017-06-07 Thread Tzu-Li (Gordon) Tai
$ListeningDecorator.class com/google/common/util/concurrent/MoreExecutors$ScheduledListeningDecorator.class com/google/common/util/concurrent/MoreExecutors.class Is it a problem of the shading with Maven 3.3+? Best, Flavio On Wed, Jun 7, 2017 at 5:48 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
