Re: Kinesis Connector

2016-01-08 Thread Tzu-Li (Gordon) Tai
Hi Giancarlo, Since it has been a while since the last post and there hasn't been a JIRA ticket opened for Kinesis connector yet, I'm wondering how you are doing on the Kinesis connector and hope to help out with this feature :) I've opened a JIRA

Re: Checkpoint for exactly-once processing

2016-01-13 Thread Tzu-Li (Gordon) Tai
Hi Francis, A part of every complete snapshot is the set of record positions associated with the barrier that triggered the checkpointing of this snapshot. The snapshot is completed only when all the records within the checkpoint reach the sink. When a topology fails, all the operators' state will

Re: S3 as streaming source

2016-06-03 Thread Tzu-Li (Gordon) Tai
Hi Soumya, No, currently there is no standard Flink-supported S3 streaming source. As far as I know, there isn't a public one out there yet either. The community is open to submissions for new connectors, so if you happen to be working on one for S3, you can file a JIRA to let us know. Also,

Re: API request to submit job takes over 1hr

2016-06-13 Thread Tzu-Li (Gordon) Tai
Hi Shannon, Thanks for your investigation on the issue and the JIRA. There's actually a previous JIRA on this problem already: https://issues.apache.org/jira/browse/FLINK-4023. Would you be ok with tracking this issue on FLINK-4023 and closing FLINK-4069 as a duplicate? As you can see, I've

Re: Flink with Yarn

2016-01-11 Thread Tzu-Li (Gordon) Tai
Hi Sourav, To add a little more clarification on your last comment: in the sense of "where" the driver program is executed, then yes, the Flink driver program runs in a mode similar to Spark's YARN-client. However, the "role" of the driver program and the work that it is responsible for is quite

Re: Flink DataStream and KeyBy

2016-01-13 Thread Tzu-Li (Gordon) Tai
Hi Saiph, In Flink, the key for keyBy() can be provided in different ways: https://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#specifying-keys (the doc is for DataSet API, but specifying keys is basically the same for DataStream and DataSet). As described in the
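
A minimal sketch of the three ways to specify a keyBy() key on a DataStream; the Tuple2 input and the key choices are illustrative, not from the original thread:

```java
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KeyByVariants {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Tuple2<String, Integer>> counts =
                env.fromElements(Tuple2.of("a", 1), Tuple2.of("b", 2), Tuple2.of("a", 3));

        // 1) by tuple field position
        counts.keyBy(0).sum(1).print();

        // 2) by field expression (for a POJO this would be e.g. keyBy("userId"))
        counts.keyBy("f0").sum(1).print();

        // 3) by KeySelector function
        counts.keyBy(new KeySelector<Tuple2<String, Integer>, String>() {
            @Override
            public String getKey(Tuple2<String, Integer> value) {
                return value.f0;
            }
        }).sum(1).print();

        env.execute("keyBy variants");
    }
}
```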

Re: Local Cluster have problem with connect to elasticsearch

2016-05-11 Thread Tzu-Li (Gordon) Tai
Hi Rafal, From your description, it seems like Flink is complaining because it cannot access the Elasticsearch API related dependencies as well. You'd also have to include the following in your Maven build, under : org.elasticsearch elasticsearch 2.3.2 jar false
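
The Maven snippet above was flattened by the archive; a hedged reconstruction follows. Placing it under <dependencies> (the stripped tag name) and reading the trailing "false" as <optional> are assumptions:

```xml
<!-- hedged reconstruction; <dependencies> placement and <optional> are assumptions -->
<dependencies>
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>2.3.2</version>
        <type>jar</type>
        <optional>false</optional>
    </dependency>
</dependencies>
```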

Re: How to get latest offsets with FlinkKafkaConsumer

2016-08-05 Thread Tzu-Li (Gordon) Tai
Hi, Please also note that the “auto.offset.reset” property is only respected when there are no offsets under the same consumer group in ZK. So, currently, in order to make sure you read from the latest / earliest offsets every time you restart your Flink application, you’d have to use a unique
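
A sketch of that workaround for the 0.8 consumer; broker addresses, the topic, and the random UUID for uniqueness are placeholders:

```java
import java.util.Properties;
import java.util.UUID;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class AlwaysLatestJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("zookeeper.connect", "localhost:2181"); // 0.8 consumer only
        // a fresh group id on every restart means no committed offsets exist in ZK,
        // so "auto.offset.reset" is actually respected:
        props.setProperty("group.id", "my-job-" + UUID.randomUUID());
        props.setProperty("auto.offset.reset", "largest"); // "smallest" for earliest

        env.addSource(new FlinkKafkaConsumer08<>("my-topic", new SimpleStringSchema(), props))
           .print();

        env.execute("always read from latest offsets");
    }
}
```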

Re: Support for Auto scaling

2017-02-01 Thread Tzu-Li (Gordon) Tai
Hi Sandeep! While auto scaling jobs in Flink still isn’t possible, in Flink 1.2 you will be able to rescale jobs by stopping and restarting. This works by taking a savepoint of the job before stopping it, and then redeploying the job with a higher / lower parallelism using the savepoint. Upon

Re: Flink: KafkaProducer Data Loss

2017-02-02 Thread Tzu-Li (Gordon) Tai
Hi Ninad and Till, Thank you for looking into the issue! This is actually a bug. Till’s suggestion is correct: The producer holds a `pendingRecords` value that is incremented on each invoke() and decremented on each callback, used to check if the producer needs to sync on pending callbacks on

Re: Apache Flink and Elasticsearch send Json Object instead of string

2017-02-22 Thread Tzu-Li (Gordon) Tai
": "abc" } } how can I treat these cases? Isn't there a way to send the whole JSON element and index it like in the REST request? Thanks. Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote on Tuesday, 21/02/2017 at 07:54: Hi, I’ll use your code to e

Re: Serialization schema

2017-02-23 Thread Tzu-Li (Gordon) Tai
Hi Mohit, As 刘彪 pointed out in his reply, the problem is that your `Tuple2Serializer` contains fields that are not serializable, so `Tuple2Serializer` itself is not serializable. Could you perhaps share your `Tuple2Serializer` implementation with us so we can pinpoint the problem? A snippet
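
For context, the usual culprit and fix look like this. The Gson field standing in for the non-serializable member is purely an assumption, since the actual Tuple2Serializer was never posted:

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.util.serialization.SerializationSchema;
import com.google.gson.Gson;

// The schema object itself is shipped to the cluster, so every
// non-transient field must be java.io.Serializable.
public class Tuple2Serializer implements SerializationSchema<Tuple2<String, String>> {

    // Gson is not Serializable -> mark it transient and create it lazily
    private transient Gson gson;

    @Override
    public byte[] serialize(Tuple2<String, String> element) {
        if (gson == null) {
            gson = new Gson();
        }
        return gson.toJson(element).getBytes(java.nio.charset.StandardCharsets.UTF_8);
    }
}
```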

Re: Flink the right tool for the job ? Huge Data window lateness

2017-02-24 Thread Tzu-Li (Gordon) Tai
Hi Patrick, Thanks a lot for feedback on your use case! At a first glance, I would say that Flink can definitely solve the issues you are evaluating. I’ll try to explain them, and point you to some docs / articles that can further explain in detail: - Lateness The 7-day lateness shouldn’t be

Re: kinesis producer setCustomPartitioner use stream's own data

2017-02-20 Thread Tzu-Li (Gordon) Tai
Hi Sathi, The `getPartitionId` method is invoked with each record from the stream. In there, you can extract values / fields from the record, and use that to determine the target partition id. Is this what you had in mind? Cheers, Gordon On February 21, 2017 at 11:54:21 AM, Sathi Chowdhury

Re: A way to control redistribution of operator state?

2017-02-13 Thread Tzu-Li (Gordon) Tai
Hi Dmitry, Technically, from the looks of the internal code around `OperatorStateRepartitioner`, I think it is certainly possible to be pluggable. Right now it is just hard coded to use a round-robin repartitioner implementation as default. However, I’m not sure of the plans in exposing this

Re: #kafka/#ssl: how to enable ssl authentication for a new kafka consumer?

2017-02-14 Thread Tzu-Li (Gordon) Tai
Hi Alex, Kafka authentication and data transfer encryption using SSL can simply be done by configuring the brokers and the connecting client. You can take a look at this: https://kafka.apache.org/documentation/#security_ssl. The Kafka client that the Flink connector uses can be configured through
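
A sketch of passing the standard Kafka SSL settings through the consumer properties; paths, passwords, and topic names are placeholders:

```java
import java.util.Properties;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class SslKafkaSource {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker1:9093"); // SSL listener port
        props.setProperty("group.id", "secure-group");
        // standard Kafka client SSL settings, passed straight through:
        props.setProperty("security.protocol", "SSL");
        props.setProperty("ssl.truststore.location", "/path/to/client.truststore.jks");
        props.setProperty("ssl.truststore.password", "changeit");
        // only needed when the brokers require client authentication:
        props.setProperty("ssl.keystore.location", "/path/to/client.keystore.jks");
        props.setProperty("ssl.keystore.password", "changeit");
        props.setProperty("ssl.key.password", "changeit");

        env.addSource(new FlinkKafkaConsumer09<>("secure-topic", new SimpleStringSchema(), props))
           .print();
        env.execute("SSL Kafka source");
    }
}
```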

Re: Start streaming tuples depending on another streams rate

2017-02-09 Thread Tzu-Li (Gordon) Tai
Hi Jonas, A few things to clarify first: Stream A has a rate of 100k tuples/s. After processing the whole Kafka queue, the rate drops to 10 tuples/s. From this description it seems like the job is re-reading from the beginning from the topic, and once you reach the latest record at the head

Re: Where to put "pre-start" logic and how to detect recovery?

2017-02-09 Thread Tzu-Li (Gordon) Tai
Hi Dmitry, I think currently the simplest way to do this is simply to add a program argument as a flag to whether or not the current run is from a savepoint (so you manually supply the flag whenever you’re starting the job from a savepoint), and check that flag in the main method. The main
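
A minimal sketch of that flag-in-main approach; the flag name and the pre-start helper are hypothetical:

```java
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class JobWithRestoreFlag {
    public static void main(String[] args) throws Exception {
        ParameterTool params = ParameterTool.fromArgs(args);
        // pass --fromSavepoint true whenever the job is started from a savepoint
        boolean fromSavepoint = params.getBoolean("fromSavepoint", false);

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        if (!fromSavepoint) {
            runPreStartLogic(); // first-time start only, e.g. seeding external systems
        }

        env.fromElements(1, 2, 3).print(); // placeholder topology
        env.execute("job with pre-start flag");
    }

    private static void runPreStartLogic() {
        // hypothetical one-time setup work
    }
}
```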

Re: Specifying Schema dynamically

2017-02-12 Thread Tzu-Li (Gordon) Tai
Hi Luqman, From your description, it seems like that you want to infer the type (case class, tuple, etc.) of a stream dynamically at runtime. AFAIK, I don’t think this is supported in Flink. You’re required to have defined types for your DataStreams. Could you also provide an example code of

Re: There is no Open and Close method in Async I/O API of Scala

2017-02-12 Thread Tzu-Li (Gordon) Tai
Hi Howard, I don’t think there is a rich variant for Async IO in Scala yet. We should perhaps add support for it. Looped in Till who worked on the Async IO and its Scala support to clarify whether there were any concerns in not supporting it initially. Cheers, Gordon On February 13, 2017 at

Re: Streaming data from MongoDB using Flink

2017-02-16 Thread Tzu-Li (Gordon) Tai
abase from another source. Once again, thank you so much for your help. I will wait to hear from you! Regards, Pedro Lima Monteiro On 16 February 2017 at 09:43, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: Hi Pedro! This is definitely possible, by simply writing a Flink `SourceFunc

Re: Streaming data from MongoDB using Flink

2017-02-16 Thread Tzu-Li (Gordon) Tai
se. I will give it a try and will come back to you. Pedro Lima Monteiro On 16 February 2017 at 10:20, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: I would recommend checking out the Flink RabbitMQ Source for examples: https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-

Re: "Job submission to the JobManager timed out" on EMR YARN cluster with multiple jobs

2017-02-16 Thread Tzu-Li (Gordon) Tai
Hi Geoffrey, Thanks for investigating and updating on this. Good to know that it is working! Just to clarify, was your series of jobs submitted to a “yarn session + regular bin/flink run”, or “per job yarn cluster”? I’m asking just to make sure of the limitations Robert mentioned. Cheers,

Re: Fault tolerance guarantees of Elasticsearch sink in flink-elasticsearch2?

2017-01-18 Thread Tzu-Li (Gordon) Tai
the affected sinks participate in checkpointing is enough of a solution - is there anything special about `SinkFunction[T]` extending `Checkpointed[S]`, or can I just implement it as I would for e.g. a mapping function? Thanks, Andrew On Jan 13, 2017, at 4:34 PM, Tzu-Li (Gordon) Tai <tz

Re: Deduplicate messages from Kafka topic

2017-01-15 Thread Tzu-Li (Gordon) Tai
Hi, You’re correct that the FlinkKafkaProducer may emit duplicates to Kafka topics, as it currently only provides at-least-once guarantees. Note that this isn’t a restriction only in the FlinkKafkaProducer, but a general restriction for Kafka's message delivery. This can definitely be improved

Re: Kafka KeyedStream source

2017-01-15 Thread Tzu-Li (Gordon) Tai
and I could filter the data more efficiently because the data would not need to go over the network before this filter. Afterwards I can scale it up to 'many' tasks for the heavier processing that follows. As a concept: Could that be made to work? Niels  On Mon, Jan 9, 2017 at 9:14 AM, Tzu-Li (G

Re: Fw: Flink Kinesis Connector

2017-02-27 Thread Tzu-Li (Gordon) Tai
Hi Matt! As mentioned in the docs, due to the ASL license, we do not deploy the artifact to the Maven central repository on Flink releases. You will need to build the Kinesis connector by yourself (the instructions to do so are also in the Flink Kinesis connector docs :)), and install it to

Re: unclear exception when writing to elasticsearch

2017-02-28 Thread Tzu-Li (Gordon) Tai
Hi! This could be a Elasticsearch server / client version conflict, or that the uber jar of your code wasn’t built properly. For the first possible issue, we’re currently using Elasticsearch 2.3.5 to build the Flink Elasticsearch Connector. Could you try overriding this version to 2.4.1 when

Re: Flink, Yarn and MapR Kerberos issue

2017-03-01 Thread Tzu-Li (Gordon) Tai
Hi Aniket, Thanks a lot for reporting this. I’m afraid this seems to be a bug with Flink on YARN’s Kerberos authentication. It is incorrectly checking for Kerberos credentials even for non-Kerberos authentication methods. I’ve filed a JIRA for this: 

Re: ElasticsearchSink Exception

2017-02-28 Thread Tzu-Li (Gordon) Tai
2.x. After changing the version it worked. Thanks for the help. On Mon, Feb 27, 2017 at 6:12 AM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: Hi! Like what Flavio suggested, at a first glance this looks like a problem with building the uber jar. I haven’t bumped into the problem

Re: unclear exception when writing to elasticsearch

2017-02-28 Thread Tzu-Li (Gordon) Tai
with stock Flink 1.1.3 for the execution since I'm running things on top of Hopsworks. cheers Martin On Tue, Feb 28, 2017 at 5:42 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: Hi! This could be a Elasticsearch server / client version conflict, or that the uber jar of your code wasn’t

Re: unclear exception when writing to elasticsearch

2017-03-01 Thread Tzu-Li (Gordon) Tai
lang.Thread.run(Thread.java:745) On Wed, Mar 1, 2017 at 7:58 AM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: Hi Martin, You can do that by adding a dependency to the Elasticsearch client of your desired version in your project. You can also check what Elasticsearch client version

Re: unclear exception when writing to elasticsearch

2017-03-01 Thread Tzu-Li (Gordon) Tai
, 2017 at 9:23:35 PM, Tzu-Li (Gordon) Tai (tzuli...@apache.org) wrote: Hi Martin, Just letting you know I’m trying your setup right now, and will get back to you once I confirm the results. - Gordon On March 1, 2017 at 9:15:16 PM, Martin Neumann (mneum...@sics.se) wrote: I created the project

Re: Kafka SimpleStringConsumer NPE

2016-09-04 Thread Tzu-Li (Gordon) Tai
Hi David, Is it possible that your Kafka installation is an older version than 0.9? Or you may have used a different Kafka client major version in your job jar's dependency? This looks to me like a protocol incompatibility with the Kafka broker, as the client in the Kafka consumer is reading

Re: setting the name of a subtask ?

2016-09-12 Thread Tzu-Li (Gordon) Tai
Hi! Yes, you can set custom operator names by calling `.name(…)` on DataStreams after a transformation. For example, `.addSource(…).map(...).name(…)`. This name will be used for visualization on the dashboard, and also for logging. Regards, Gordon On September 12, 2016 at 3:44:58 PM, Bart
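
A short sketch of this; the pipeline itself is made up:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class NamedOperators {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "b", "c").name("letters-source")
           .map(new MapFunction<String, String>() {
               @Override
               public String map(String value) {
                   return value.toUpperCase();
               }
           }).name("to-upper")           // shown on the dashboard and in logs
           .print().name("stdout-sink"); // sinks can be named too

        env.execute("named operators");
    }
}
```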

Re: Flink Checkpoint runs slow for low load stream

2016-10-04 Thread Tzu-Li (Gordon) Tai
Hi, Helping out here: this is the PR for async Kafka offset committing - https://github.com/apache/flink/pull/2574. It has already been merged into the master and release-1.1 branches, so you can try out the changes now if you’d like. The change should also be included in the 1.1.3 release,

Re: Using Flink

2016-10-04 Thread Tzu-Li (Gordon) Tai
rators? If I need the state stored on previous operators, how can I fetch it? I don’t think this is possible. Fine, thanks. Thanks. On Mon, Oct 3, 2016 at 12:23 AM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: Hi! - I'm currently running my flink program on 1.2 SNAPSHOT with kafka s

Re: Presented Flink use case in Japan

2016-10-04 Thread Tzu-Li (Gordon) Tai
Really great to hear this! Cheers, Gordon On October 4, 2016 at 8:13:27 PM, Till Rohrmann (trohrm...@apache.org) wrote: It's always great to hear Flink success stories :-) Thanks for sharing it with the community.  I hope Flink helps you to solve even more problems. And don't hesitate to

Re: Kinesis connector - Iterator expired exception

2016-08-26 Thread Tzu-Li (Gordon) Tai
Hi Josh, Thank you for reporting this, I’m looking into it. There were some major changes to the Kinesis connector after mid June, but the changes don’t seem to be related to the iterator timeout, so it may be a bug that had always been there. I’m not sure yet if it may be related, but may I

Re: Kinesis connector - Iterator expired exception

2016-08-26 Thread Tzu-Li (Gordon) Tai
57 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: Hi Josh, Thank you for reporting this, I’m looking into it. There were some major changes to the Kinesis connector after mid June, but the changes don’t seem to be related to the iterator timeout, so it may be a bug that had always

Re: Threading Model for Kinesis

2016-08-23 Thread Tzu-Li (Gordon) Tai
stamps receiving and sending data. If there is only one source instance per broker how does that happen? Thanks, Sameer On Tue, Aug 23, 2016 at 7:17 AM, Tzu-Li (Gordon) Tai <tzuli...@gmail.com> wrote: > Hi! > > Kinesis shards should be ideally evenly assigned to the source insta

Re: Threading Model for Kinesis

2016-08-23 Thread Tzu-Li (Gordon) Tai
Hi! Kinesis shards should be ideally evenly assigned to the source instances. So, with your example of source parallelism of 10 and 20 shards, each source instance will have 2 shards and will have 2 threads consuming them (therefore, not in round robin). For the Kafka consumer, in the source

Re: Default timestamps for Event Time when no Watermark Assigner used?

2016-08-23 Thread Tzu-Li (Gordon) Tai
on the source stream- assignTimestampsAndWatermarks(new IngestionTimeExtractor<>()) Sameer On Tue, Aug 23, 2016 at 7:29 AM, Tzu-Li (Gordon) Tai <tzuli...@gmail.com> wrote: > Hi, > > For the Kinesis consumer, when you use Event Time but do not explicitly > assign timestamps,

Re: Threading Model for Kinesis

2016-08-23 Thread Tzu-Li (Gordon) Tai
:) Regards, Gordon On August 23, 2016 at 9:31:58 PM, Tzu-Li (Gordon) Tai (tzuli...@gmail.com) wrote: Slight misunderstanding here. The one thread per Kafka broker happens *after* the assignment of Kafka partitions to the source instances. So, with a total of 10 partitions and 10 source instances

Re: Default timestamps for Event Time when no Watermark Assigner used?

2016-08-23 Thread Tzu-Li (Gordon) Tai
Hi, For the Kinesis consumer, when you use Event Time but do not explicitly assign timestamps, the Kinesis server-side timestamp (the time which Kinesis received the record) is attached to the record as default, not Flink’s ingestion time. Does this answer your question? Regards, Gordon On

Re: Using Flink

2016-10-03 Thread Tzu-Li (Gordon) Tai
Hi! - I'm currently running my flink program on 1.2 SNAPSHOT with kafka source and I have checkpoint enabled. When I look at the consumer offsets in kafka it appears to be stagnant and there is a huge lag. But I can see my flink program is in pace with kafka source in JMX metrics and outputs.

Re: FlinkKafkaConsumer and Kafka topic/partition change

2016-09-27 Thread Tzu-Li (Gordon) Tai
Hi! This is definitely a planned feature for the Kafka connectors, there’s a JIRA exactly for this [1]. We’re currently going through some blocking tasks to make this happen, I also hope to speed up things over there :) Your observation is correct that the Kafka consumer uses “assign()” instead

Re: Why use Kafka after all?

2016-11-22 Thread Tzu-Li (Gordon) Tai
Hi Matt, Just to be clear, what I'm looking for is a way to serialize a POJO class for Kafka but also for Flink, I'm not sure the interface of both frameworks are compatible but it seems they aren't. For Kafka (producer) I need a Serializer and a Deserializer class, and for Flink (consumer) a 

Re: Subtask keeps on discovering new Kinesis shard when using Kinesalite

2016-11-16 Thread Tzu-Li (Gordon) Tai
Hi Phillip, Thanks for testing it. From your log and my own tests, I can confirm the problem is with Kinesalite not correctly mocking the official Kinesis behaviour for the `describeStream` API. There’s a PR for the fix here: https://github.com/apache/flink/pull/2822. With this change, shard

Re: Why use Kafka after all?

2016-11-15 Thread Tzu-Li (Gordon) Tai
Hi Matt, Here’s an example of writing a DeserializationSchema for your POJOs: [1]. As for simply writing messages from WebSocket to Kafka using a Flink job, while it is absolutely viable, I would not recommend it, mainly because you’d never know if you might need to temporarily shut down Flink
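
In the spirit of the linked example, a sketch of such a schema for a hypothetical Measurement POJO; JSON via Jackson is an assumption:

```java
import java.io.IOException;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.util.serialization.DeserializationSchema;

// hypothetical POJO for illustration
class Measurement {
    public String sensorId;
    public double value;
}

public class MeasurementSchema implements DeserializationSchema<Measurement> {

    private transient ObjectMapper mapper; // created lazily on the task managers

    @Override
    public Measurement deserialize(byte[] message) throws IOException {
        if (mapper == null) {
            mapper = new ObjectMapper();
        }
        return mapper.readValue(message, Measurement.class);
    }

    @Override
    public boolean isEndOfStream(Measurement nextElement) {
        return false; // Kafka streams are unbounded
    }

    @Override
    public TypeInformation<Measurement> getProducedType() {
        return TypeInformation.of(Measurement.class);
    }
}
```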

Re: flink-dist shading

2016-11-18 Thread Tzu-Li (Gordon) Tai
Hi Craig, I think the email wasn't sent to the ‘dev’ list, somehow. Have you tried this: mvn clean install -DskipTests # In Maven 3.3 the shading of flink-dist doesn't work properly in one run, so we need to run mvn for flink-dist again. cd flink-dist mvn clean install -DskipTests I agree that

Re: Subtask keeps on discovering new Kinesis shard when using Kinesalite

2016-11-15 Thread Tzu-Li (Gordon) Tai
Hi Philipp, When used against Kinesalite, can you tell if the connector is already reading data from the test shard before any of the shard discovery messages? If you have any spare time to test this, you can set a larger value for the `ConsumerConfigConstants.SHARD_DISCOVERY_INTERVAL_MILLIS`

Re: ExpiredIteratorException when reading from a Kinesis stream

2016-11-03 Thread Tzu-Li (Gordon) Tai
the job from my last savepoint, it's consuming the stream fine again with no problems. Do you have any idea what might be causing this, or anything I should do to investigate further? Cheers, Josh On Wed, Oct 5, 2016 at 4:55 AM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: Hi S

Re: A question regarding to the checkpoint mechanism

2016-10-16 Thread Tzu-Li (Gordon) Tai
processing? Thanks, Li On Oct 17, 2016, at 11:10 AM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: Hi! No, the operator does not need to pause processing input records while the checkpointing of its state is in progress. The checkpointing of operator state is asynchronous. The op

Re: Serializers and Schemas

2016-12-07 Thread Tzu-Li (Gordon) Tai
Hi Matt, 1. There’s some in-progress work on wrapper util classes for Kafka de/serializers here [1] that allows Kafka de/serializers to be used with the Flink Kafka Consumers/Producers with minimal user overhead. The PR also has some proposed adds to the documentations for the wrappers. 2. I

Re: Partitioning operator state

2016-12-07 Thread Tzu-Li (Gordon) Tai
Hi Dominik, Do you mean how Flink redistributes an operator’s state when the parallelism of the operator is changed? If so, you can take a look at [1] and [2]. Cheers, Gordon [1] https://issues.apache.org/jira/browse/FLINK-3755 [2] 

Re: Flink Kafka producer with a topic per message

2016-12-07 Thread Tzu-Li (Gordon) Tai
Hi, The FlinkKafkaProducers currently support a per-record topic through the user-provided serialization schema, which has a "getTargetTopic(T element)" method called for each record. The partition a record is sent to is decided through a custom KafkaPartitioner, which is also provided
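
A sketch of a KeyedSerializationSchema using that hook; the OutEvent type and the topic naming are made up:

```java
import java.nio.charset.StandardCharsets;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchema;

// hypothetical record type
class OutEvent {
    public String id;
    public String category;
    public String payload;
}

public class PerRecordTopicSchema implements KeyedSerializationSchema<OutEvent> {

    @Override
    public byte[] serializeKey(OutEvent element) {
        return element.id.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public byte[] serializeValue(OutEvent element) {
        return element.payload.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String getTargetTopic(OutEvent element) {
        // called per record; a non-null return overrides the producer's default topic
        return "events-" + element.category;
    }
}
```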

Re: Fault tolerance guarantees of Elasticsearch sink in flink-elasticsearch2?

2017-01-13 Thread Tzu-Li (Gordon) Tai
Hi Andrew, Your observations are correct. Like you mentioned, the current problem circles around how we deal with the pending buffered requests in accordance with Flink’s checkpointing. I’ve filed a JIRA for this, as well as some thoughts for the solution in the description:

Re: Kafka topic partition skewness causes watermark not being emitted

2017-01-13 Thread Tzu-Li (Gordon) Tai
Hi, This is expected behaviour due to how the per-partition watermarks are designed in the Kafka consumer, but I think it’s probably a good idea to handle idle partitions also when the Kafka consumer itself emits watermarks. I’ve filed a JIRA issue for this: 

Re: Kafka 09 consumer does not commit offsets

2017-01-09 Thread Tzu-Li (Gordon) Tai
Henri Heiskanen <henri.heiska...@gmail.com> wrote: Hi, We had the same problem when running 0.9 consumer against 0.10 Kafka. Upgrading Flink Kafka connector to 0.10 fixed our issue. Br, Henkka On Mon, Jan 9, 2017 at 5:39 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: Hi, N

Re: Kafka 09 consumer does not commit offsets

2017-01-09 Thread Tzu-Li (Gordon) Tai
Hi, Not sure what might be going on here. I’m pretty certain that for FlinkKafkaConsumer09 when checkpointing is turned off, the internally used KafkaConsumer client will auto commit offsets back to Kafka at a default interval of 5000ms (the default value for “auto.commit.interval.ms”). Could
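
For reference, a sketch of the relevant client settings when checkpointing is off; the broker address and topic are placeholders:

```java
import java.util.Properties;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class AutoCommitCheck {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // note: checkpointing is NOT enabled, so the Kafka client's own auto-commit applies

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "offset-check");
        props.setProperty("enable.auto.commit", "true");      // Kafka client default
        props.setProperty("auto.commit.interval.ms", "5000"); // the default mentioned above

        env.addSource(new FlinkKafkaConsumer09<>("my-topic", new SimpleStringSchema(), props))
           .print();
        env.execute("auto-commit check");
    }
}
```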

Re: Joining two kafka streams

2017-01-08 Thread Tzu-Li (Gordon) Tai
Hi Igor! What you can actually do is let a single FlinkKafkaConsumer consume from both topics, producing a single DataStream which you can keyBy afterwards. All versions of the FlinkKafkaConsumer support consuming multiple Kafka topics simultaneously. This is logically the same as union and
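
A sketch of that pattern; topic names and the key extraction are placeholders:

```java
import java.util.Arrays;
import java.util.Properties;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class TwoTopicsOneConsumer {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "join-job");

        // one consumer subscribed to both topics -> a single, unioned DataStream
        env.addSource(new FlinkKafkaConsumer09<>(
                    Arrays.asList("topic-a", "topic-b"), new SimpleStringSchema(), props))
           .keyBy(new KeySelector<String, String>() {
               @Override
               public String getKey(String line) {
                   // hypothetical shared key in the first CSV column of both topics
                   return line.split(",")[0];
               }
           })
           .print();

        env.execute("union of two Kafka topics");
    }
}
```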

Re: Kafka KeyedStream source

2017-01-09 Thread Tzu-Li (Gordon) Tai
Hi Niels, Thank you for bringing this up. I recall there was some previous discussion related to this before: [1]. I don’t think this is possible at the moment, mainly because of how the API is designed. On the other hand, a KeyedStream in Flink is basically just a DataStream with a hash

[ANNOUNCE] Apache Flink 1.1.5 Released

2017-03-23 Thread Tzu-Li (Gordon) Tai
The Apache Flink community is pleased to announce the availability of Flink 1.1.5, which is the next bugfix release for the 1.1 series. The official release announcement: https://flink.apache.org/news/2017/03/23/release-1.1.5.html Release binaries: http://apache.lauf-forum.at/flink/flink-1.1.5

Re: Question Regarding a sink..

2017-03-23 Thread Tzu-Li (Gordon) Tai
Hi Steve, This normally shouldn’t happen, unless there simply are two copies of the data. What is the source of the topology? Also, this might be obvious, but if you have broadcasted your input stream to the sink, then each sink instance would get all records in the input stream. Cheers,

Re: Flink 1.2 time window operation

2017-03-30 Thread Tzu-Li (Gordon) Tai
Hi, Thanks for the clarification. What are the reasons behind consuming/producing messages from/to Kafka while the window has not expired yet? First, some remarks here: sources (in your case the Kafka consumer) will not stop fetching / producing data when the windows haven’t fired yet. Does

Re: 20 times higher throughput with Window function vs fold function, intended?

2017-03-30 Thread Tzu-Li (Gordon) Tai
I'm wondering what I can tweak further to increase this. I was reading in this blog: https://data-artisans.com/extending-the-yahoo-streaming-benchmark/ about 3 million per sec with only 20 partitions. So I'm sure I should be able to squeeze more out of it. Not really sure if it is relevant

Re: Kinesis connector SHARD_GETRECORDS_MAX default value

2017-03-22 Thread Tzu-Li (Gordon) Tai
Hi Steffan, I have to admit that I didn’t put too much thoughts in the default values for the Kinesis consumer. I’d say it would be reasonable to change the default values to follow KCL’s settings. Could you file a JIRA for this? In general, we might want to reconsider all the default values

Re: How to rewind Kafka cursors into a Flink job ?

2017-03-30 Thread Tzu-Li (Gordon) Tai
Hi Dominique, What your plan A is suggesting is that a downstream operator can provide signals to upstream operators and alter their behaviour. In general, this isn’t possible, as in a distributed streaming environment it’s hard to guarantee what records exactly will be altered by the

Re: Apache Flink Hackathon

2017-03-30 Thread Tzu-Li (Gordon) Tai
Sounds like a cool event! Thanks for sharing this! On March 27, 2017 at 11:40:24 PM, Lior Amar (lior.a...@parallelmachines.com) wrote: Hi all, My name is Lior and I am working at Parallel Machines (a startup company located in the Silicon Valley). We are hosting a Flink Hackathon on April

Re: Flink 1.2 time window operation

2017-03-30 Thread Tzu-Li (Gordon) Tai
Hi Dominik, Was the job running with processing time or event time? If event time, how are you producing the watermarks? Normally to understand how windows are firing in Flink, these two factors would be the place to look at. I can try to further explain this once you provide info with these.

Re: Flink 1.2 time window operation

2017-03-31 Thread Tzu-Li (Gordon) Tai
code and a detail Flink configuration.  Cheers, Dominik On 30 Mar 2017, at 13:09, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: Hi, Thanks for the clarification. What are the reasons behind consuming/producing messages from/to Kafka while the window has not expired yet? First, some r

Re: Reply: Re: flink one transformation end, the next transformation start

2017-03-31 Thread Tzu-Li (Gordon) Tai
value must use all data, so is there some operation that can make step 3 start when step 2 ends? ----- Original Message ----- From: "Tzu-Li (Gordon) Tai" <tzuli...@apache.org> To: rimin...@sina.cn Subject: Re: flink one transformation end, the next transformation start Date: 2017-03-30 15:54 Hi, What e

Re: shaded version of legacy kafka connectors

2017-03-21 Thread Tzu-Li (Gordon) Tai
Hi, There currently isn’t any shaded version of the Kafka 0.8 connector available, so yes, you would need to build that yourself. I’m not completely sure if there will be any class name clashes, because the Kafka 0.8 API is typically packaged under `kafka.javaapi.*`, while in 0.9 / 0.10

Re: Data+control stream from kafka + window function - not working

2017-03-16 Thread Tzu-Li (Gordon) Tai
Hi Tarandeep, I haven’t looked at the rest of the code yet, but my first guess is that you might not be reading any data from Kafka at all: private static DataStream readKafkaStream(String topic, StreamExecutionEnvironment env) throws IOException { Properties properties = new

Re: Telling if a job has caught up with Kafka

2017-03-18 Thread Tzu-Li (Gordon) Tai
way. Gyula Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote (on Friday, 17 March 2017 at 14:24): One other possibility for reporting “consumer lag” is to update the metric only at a configurable interval, if use cases can tolerate a certain delay in realizing the consumer has cau

Re: Telling if a job has caught up with Kafka

2017-03-18 Thread Tzu-Li (Gordon) Tai
@Florian the 0.9 / 0.10 version and the 0.8 version behave a bit differently right now for offset committing. In 0.9 / 0.10, if checkpointing is enabled, the “auto.commit.enable” etc. settings will be completely ignored and overwritten before being used to instantiate the internal Kafka clients,

Re: Telling if a job has caught up with Kafka

2017-03-17 Thread Tzu-Li (Gordon) Tai
Hi, I was thinking somewhat similar to what Ufuk suggested, but if we want to report a “consumer lag” metric, we would essentially need to request the latest offset on every record fetch (because the latest offset advances as well), so I wasn’t so sure of the performance tradeoffs there (the

Re: Telling if a job has caught up with Kafka

2017-03-17 Thread Tzu-Li (Gordon) Tai
sed here. What do you think? On March 17, 2017 at 9:05:04 PM, Tzu-Li (Gordon) Tai (tzuli...@apache.org) wrote: Hi, I was thinking somewhat similar to what Ufuk suggested, but if we want to report a “consumer lag” metric, we would essentially need to request the latest offset on every reco

Re: Data+control stream from kafka + window function - not working

2017-03-16 Thread Tzu-Li (Gordon) Tai
time I run the code. I have put break points and print statements to verify that. Also, if I don't connect with control stream the window function works.  - Tarandeep On Mar 16, 2017, at 1:12 AM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: Hi Tarandeep, I haven’t looked at the rest

Re: Doubt Regarding producing to kafka using flink

2017-04-03 Thread Tzu-Li (Gordon) Tai
illis(); } }); FlinkKafkaProducer010.FlinkKafkaProducer010Configuration config = FlinkKafkaProducer010.writeToKafkaWithTimestamps(stream, KAFKA_TOPIC, new WordCountSchema(), properties); config.setWriteTimestampToKafka(true); env.execute("job"); On Mon, Apr 3,

Re: Problems with Kerberos Kafka connection in version 1.2.0

2017-04-11 Thread Tzu-Li (Gordon) Tai
Hi Diego, I think the problem is here: security.kerberos.login.contexts: Client, KafkaClient The space between “Client,” and “KafkaClient” is causing the problem. Removing it should fix your issue. Cheers, Gordon On April 11, 2017 at 3:24:20 AM, Diego Fustes Villadóniga (dfus...@oesia.com)
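
For reference, the corrected flink-conf.yaml line would be:

```yaml
# flink-conf.yaml: no space after the comma, otherwise the second
# context name is not parsed correctly
security.kerberos.login.contexts: Client,KafkaClient
```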

Re: Custom timer implementation using Flink

2017-04-11 Thread Tzu-Li (Gordon) Tai
Hi, I just need to start a timer of x days/hours (let's say) and when it is fired just trigger something. Flink’s lower-level ProcessFunction [1] should be very suitable to implement this. Have you taken a look to see if it suits your case? [1]
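
A minimal sketch of that pattern, written against ProcessFunction as an abstract class (Flink 1.3 style; in 1.2 it is an interface). The String types and the delay are placeholders, and timers require a keyed stream:

```java
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

// usage: stream.keyBy(...).process(new DelayedTrigger())
public class DelayedTrigger extends ProcessFunction<String, String> {

    private static final long DELAY_MS = 24 * 60 * 60 * 1000L; // e.g. one day

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        // register a processing-time timer DELAY_MS after this element arrives
        ctx.timerService().registerProcessingTimeTimer(
                ctx.timerService().currentProcessingTime() + DELAY_MS);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
        // "trigger something" when the timer fires
        out.collect("timer fired at " + timestamp);
    }
}
```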

Re: Flink Kafka Consumer Behaviour

2017-04-20 Thread Tzu-Li (Gordon) Tai
Hi Sandeep, It isn’t fixed yet, so I think external tools like the Kafka offset checker still won’t work. If you’re using 08 and is currently stuck with this issue, you can still directly query ZK to get the offsets. I think for FlinkKafkaConsumer09 the offset is exposed to Flink's metric

Re: Read from and Write to Kafka through flink

2017-04-19 Thread Tzu-Li (Gordon) Tai
Hi Pradeep, There is no single API or connector that takes a file as input and writes it to Kafka. In Flink, this operation consists of 2 parts: 1) a source reading from the input, and 2) a sink producing to Kafka. So, all you need is a job that consists of that source and sink. You’ve already

Re: put record to kinesis and then trying consume using flink connector

2017-04-23 Thread Tzu-Li (Gordon) Tai
Hi Sathi, Here, in the producer-side log, it says: 2017-04-22 19:46:53,608 INFO  [main] DLKnesisPublisher: Successfully published record, of bytes :162810 partition key :fb043b73-1d9d-447a-a593-b9c1bf3d4974-1492915604228 ShardId: shardId-Sequence

Re: Kinesis connector SHARD_GETRECORDS_MAX default value

2017-04-23 Thread Tzu-Li (Gordon) Tai
to file the issue: https://issues.apache.org/jira/browse/FLINK-6365. Really appreciate your contributions for the Kinesis connector! Cheers, Steffen On 22/03/2017 20:21, Tzu-Li (Gordon) Tai wrote: > Hi Steffan, > > I have to admit that I didn’t put too much thoughts in th

Re: AWS exception serialization problem

2017-03-11 Thread Tzu-Li (Gordon) Tai
nd when it fails to parse it, we can see the ClassNotFoundException for the relevant exception (in our case JsResultException from the play-json library). The library is indeed in the shaded JAR, otherwise we would not be able to parse the JSON. Cheers, Bruno On Wed, 8 Mar 2017 at 12:57 Tzu-Li (Gordon) Tai <tz

Re: AWS exception serialization problem

2017-03-11 Thread Tzu-Li (Gordon) Tai
FYI: Here’s the JIRA ticket to track this issue -  https://issues.apache.org/jira/browse/FLINK-6025. On March 11, 2017 at 6:27:36 PM, Tzu-Li (Gordon) Tai (tzuli...@apache.org) wrote: Hi Shannon, Thanks a lot for providing the example, it was very helpful in reproducing the problem. I think

Re: Flink, Yarn and MapR Kerberos issue

2017-03-13 Thread Tzu-Li (Gordon) Tai
Hi Aniket! Thanks for also looking into the problem! I think checking `getAuthenticationMethod` on the UGI subject is the way to go. At the moment I don’t think there’s a better “proper” solution for this. As explained in the JIRA, we simply should not be checking for Kerberos credentials for

Re: AWS exception serialization problem

2017-03-08 Thread Tzu-Li (Gordon) Tai
Hi Shannon, Just to clarify: From the error trace, it seems like that the messages fetched from Kafka are serialized `AmazonS3Exception`s, and you’re emitting a stream of `AmazonS3Exception` as records from FlinkKafkaConsumer? Is this correct? If so, I think we should just make sure that the

Re: Any good ideas for online/offline detection of devices that send events?

2017-03-06 Thread Tzu-Li (Gordon) Tai
Hi Bruno! The Flink CEP library also seems like an option you can look into to see if it can easily realize what you have in mind. Basically, the pattern you are detecting is a timeout of 5 minutes after the last event. Once that pattern is detected, you emit a “device offline” event

Re: Flink using notebooks in EMR

2017-03-06 Thread Tzu-Li (Gordon) Tai
Hi, Are you running Zeppelin on a local machine? I haven’t tried this before, but you could first try and check if port ‘6123’ is publicly accessible in the security group settings of the AWS EMR instances. - Gordon On March 3, 2017 at 10:21:41 AM, Meghashyam Sandeep V

Re: Memory Limits: MiniCluster vs. Local Mode

2017-03-06 Thread Tzu-Li (Gordon) Tai
Hi Dominik, AFAIK, the local mode executions create a mini cluster within the JVM to run the job. Also, `MiniCluster` seems to be something FLIP-6 related, and since FLIP-6 is still work in progress, I’m not entirely sure if it is viable at the moment. Right now, you should look into using

Re: Any good ideas for online/offline detection of devices that send events?

2017-03-06 Thread Tzu-Li (Gordon) Tai
On March 6, 2017 at 9:40:28 PM, Tzu-Li (Gordon) Tai (tzuli...@apache.org) wrote: Hi Bruno! The Flink CEP library also seems like an option you can look into to see if it can easily realize what you have in mind. Basically, the pattern you are detecting is a timeout of 5 minutes after the last

Re: How to use 'dynamic' state

2017-03-06 Thread Tzu-Li (Gordon) Tai
Hi Steve, I’ll try to provide some input for the approaches you’re currently looking into (I’ll comment on your email below): * API based stop and restart of job … ugly.  Yes, indeed ;) I think this is absolutely not necessary. * Use a co-map function with the rules alone stream and the events

Re: AWS exception serialization problem

2017-03-07 Thread Tzu-Li (Gordon) Tai
Hi, I just had a quick look on this, but the Kafka fetcher thread’s context classloader doesn’t seem to be the issue (at least for 1.1.4). In Flink 1.1.4, a separate thread from the task thread is created to run the fetcher, but since the task thread sets the user code classloader as its

Re: Flink Error/Exception Handling

2017-03-03 Thread Tzu-Li (Gordon) Tai
Hi Sunil, There’s recently some effort in allowing `DeserializationSchema#deserialize()` to return `null` in cases like yours, so that the invalid record can be simply skipped instead of throwing an exception from the deserialization schema. Here are the related links that you may be interested
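
Assuming the linked in-progress changes land, a schema that skips bad records could look roughly like this; the Event POJO and the Gson-based parsing are assumptions:

```java
import java.nio.charset.StandardCharsets;
import com.google.gson.Gson;
import com.google.gson.JsonSyntaxException;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.util.serialization.DeserializationSchema;

// hypothetical POJO for the valid records
class Event {
    public String id;
    public String payload;
}

public class SkipInvalidSchema implements DeserializationSchema<Event> {

    private transient Gson gson; // Gson is not Serializable, so create it lazily

    @Override
    public Event deserialize(byte[] message) {
        if (gson == null) {
            gson = new Gson();
        }
        try {
            return gson.fromJson(new String(message, StandardCharsets.UTF_8), Event.class);
        } catch (JsonSyntaxException e) {
            // with the proposed changes, returning null skips the record
            // instead of failing the job
            return null;
        }
    }

    @Override
    public boolean isEndOfStream(Event nextElement) {
        return false;
    }

    @Override
    public TypeInformation<Event> getProducedType() {
        return TypeInformation.of(Event.class);
    }
}
```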

Re: A Link Sink that writes to OAuth2.0 protected resource

2017-03-03 Thread Tzu-Li (Gordon) Tai
Hi Hussein! Your approach seems reasonable to me. The open() method will be called only once for the UDF every time the job starts (and also when the job is restored from failures). Cheers, Gordon On March 3, 2017 at 7:03:22 PM, Hussein Baghdadi (hussein.baghd...@zalando.de) wrote:

Re: Elasticsearch 5.x connection

2017-03-02 Thread Tzu-Li (Gordon) Tai
Hi, java.lang.NoSuchMethodError: org.apache.flink.util.InstantiationUtil.isSerializable(Ljava/lang/Object;)Z at org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase.<init>(ElasticsearchSinkBase.java:195) at
