Re: Using Queryable State within 1 job + docs suggestion

2020-05-21 Thread Tzu-Li (Gordon) Tai
This in general is not a good idea, as the state you query using queryable state within a job does not provide any consistency guarantees at all. Would it be possible to have some trigger that emits state of the windows, and join the states downstream? In general, that is a better approach for

Re: Using Queryable State within 1 job + docs suggestion

2020-05-21 Thread Tzu-Li (Gordon) Tai
Hi, That in general is not a good idea, with the problem you mentioned as well as the fact that the state you query within the same job using queryable state does not provide any means of consistency guarantee. When it comes to "querying state from another operator", it is a hint that your use

Re: Performance impact of many open windows at the same time

2020-05-21 Thread Tzu-Li (Gordon) Tai
Hi Joe, The main effect this should have is more state to be kept until the windows can be fired (and state purged). This would of course increase the time it takes to checkpoint the operator. I'm not sure if there will be significant runtime per-record impact caused by how windows are

Re: Stateful functions Harness

2020-05-21 Thread Tzu-Li (Gordon) Tai
Are you getting an exception from running the Harness? The Harness should already have the required configurations, such as the parent first classloading config. Otherwise, if you would like to add your own configuration, use the `withConfiguration` method on the `Harness` class. On Fri, May 22,

Re: Stateful functions Harness

2020-05-21 Thread Tzu-Li (Gordon) Tai
Sorry, forgot to cc user@ as well in the last reply. On Fri, May 22, 2020 at 12:01 PM Tzu-Li (Gordon) Tai wrote: > As an extra note, the utilities you will find in `statefun-e2e-tests`, > such as the `StatefulFunctionsAppsContainers` is not yet intended for users. > This however was p

Re: Stateful functions Harness

2020-05-21 Thread Tzu-Li (Gordon) Tai
Hi, On Fri, May 22, 2020 at 7:19 AM Boris Lublinsky < boris.lublin...@lightbend.com> wrote: > Also, where do I put flint-conf.yaml in Idea to add additional required > config parameter: > > classloader.parent-first-patterns.additional: >

Re: How To subscribe a Kinesis Stream using enhance fanout?

2020-05-14 Thread Tzu-Li (Gordon) Tai
Hi Xiaolong, You are right, the way the Kinesis connector is implemented / the way the AWS APIs are used, does not allow it to consume Kinesis streams with enhanced fan-out enabled consumers [1]. Could you open a JIRA ticket for this? As far as I can tell, this could be a valuable contribution to

Re: Assertion failed: (last_ref), function ~ColumnFamilySet, file db/column_family.cc, line 1238

2020-05-11 Thread Tzu-Li (Gordon) Tai
In that case, the most possible cause would be https://issues.apache.org/jira/browse/FLINK-16313, which is included in Flink 1.10.1 (to be released) The release candidates for Flink 1.10.1 is currently ongoing, would it be possible for you to try that out and see if the error still occurs? On

Re: Statefun 2.0 questions

2020-05-10 Thread Tzu-Li (Gordon) Tai
Hi, Correct me if I'm wrong, but from the discussion so far it seems like what Wouter is looking for is an HTTP-based ingress / egress. We have been thinking about this in the past. The specifics of the implementation is still to be discussed, but to be able to ensure exactly-once processing

Re: Rich Function Thread Safety

2020-05-10 Thread Tzu-Li (Gordon) Tai
As others have mentioned already, it is true that method calls on operators (e.g. processing events and snapshotting state) will not concurrently happen. As for your findings in reading through the documentation, that might be a hint that we could add a bit more explanation mentioning this. Could

Re: Assertion failed: (last_ref), function ~ColumnFamilySet, file db/column_family.cc, line 1238

2020-05-08 Thread Tzu-Li (Gordon) Tai
Hi, The last time I saw this error, was that there was a mismatch in the used flink-state-processor-api version and other core Flink dependencies. Could you confirm that? Also, are you seeing this assertion error consistently, or only occasionally? cc'ing Seth, maybe he has other clues on the

Re: Benchmark for Stateful Functions

2020-05-04 Thread Tzu-Li (Gordon) Tai
Hi Omid, There currently aren't any benchmarks that I know of for Stateful Functions. However, Stateful Functions applications run on top of Apache Flink and therefore share the same network stack / runtime. So, if throughput and latency is your only concern, you should be able carry over any

Re: Question about EventTimeTrigger

2020-04-13 Thread Tzu-Li (Gordon) Tai
Hi, Could you briefly describe what you are trying to achieve? By definition, a GlobalWindow includes all data - the ending timestamp for these windows are therefore Long.MAX_VALUE. An event time trigger wouldn't make sense here, since that trigger would never fire (watermark can not pass the

Re: Question about Writing Incremental Graph Algorithms using Apache Flink Gelly

2020-04-13 Thread Tzu-Li (Gordon) Tai
Hi, As you mentioned, Gelly Graph's are backed by Flink DataSets, and therefore work primarily on static graphs. I don't think it'll be possible to implement incremental algorithms described in your SO question. Have you tried looking at Stateful Functions, a recent new API added to Flink? It

Re: [Stateful Functions] Using Flink CEP

2020-04-13 Thread Tzu-Li (Gordon) Tai
Hi! It isn't possible to use Flink CEP within Stateful Functions. That could be an interesting primitive, to add CEP-based function constructs. Could your briefly describe what you are trying to achieve? On the other hand, there are plans to integrate Stateful Functions more closely with the

Re: StateFun - Multiple modules example

2020-04-08 Thread Tzu-Li (Gordon) Tai
Hi Oytun! You can see here an example of how to package a StateFun application image that contains multiple modules: https://ci.apache.org/projects/flink/flink-statefun-docs-stable/deployment-and-operations/packaging.html#images Essentially, for each module you want to include in your

[ANNOUNCE] Apache Flink Stateful Functions 2.0.0 released

2020-04-07 Thread Tzu-Li (Gordon) Tai
The Apache Flink community is very happy to announce the release of Apache Flink Stateful Functions 2.0.0. Stateful Functions is an API that simplifies building distributed stateful applications. It's based on functions with persistent state that can interact dynamically with strong consistency

Re: state schema evolution for case classes

2020-04-02 Thread Tzu-Li (Gordon) Tai
assOf[TestDataNested]" > in my current application, basically replace the serialise it throws the > "new state serialiser is not compaitable. > > What can I do here, would be great help thanks in advance > > On Fri, Mar 27, 2020 at 1:19 PM Tzu-Li (Gordon) Tai > wrote

Re: Lack of KeyedBroadcastStateBootstrapFunction

2020-03-30 Thread Tzu-Li (Gordon) Tai
.com/jobs?utm_source=signature_medium=email> > > On Mon, Mar 30, 2020 at 1:04 AM Tzu-Li (Gordon) Tai > wrote: > >> It seems like Seth's reply didn't make it to the mailing lists somehow. >> Forwarding his reply below: >> >> -- Forward

Fwd: Lack of KeyedBroadcastStateBootstrapFunction

2020-03-30 Thread Tzu-Li (Gordon) Tai
It seems like Seth's reply didn't make it to the mailing lists somehow. Forwarding his reply below: -- Forwarded message - From: Seth Wiesman Date: Thu, Mar 26, 2020 at 5:16 AM Subject: Re: Lack of KeyedBroadcastStateBootstrapFunction To: Dawid Wysakowicz Cc: , Tzu-Li (Gordon

Re: state schema evolution for case classes

2020-03-27 Thread Tzu-Li (Gordon) Tai
ding.state.CustomAvroSerializer$class.deserialize(CustomAvroSerializer.scala:42) > at > nl.mrooding.state.ProductSerializer.deserialize(ProductSerializer.scala:9) > at > nl.mrooding.state.ProductSerializer.deserialize(ProductSerializer.scala:9) > at > org.apache.flink.contrib.streaming.state

Re: state schema evolution for case classes

2020-03-17 Thread Tzu-Li (Gordon) Tai
Hi Apoorv, Flink currently does not natively support schema evolution for state types using Scala case classes [1]. So, as Roman has pointed out, there are 2 possible ways for you to do that: - Implementing a custom serializer that support schema evolution for your specific Scala case classes,

Re: Help me understand this Exception

2020-03-17 Thread Tzu-Li (Gordon) Tai
Hi, The exception stack you posted simply means that the next operator in the chain failed to process the output watermark. There should be another exception, which would explain why some operator was closed / failed and eventually leading to the above exception. That would provide more insight

Re: Apache Airflow - Question about checkpointing and re-run a job

2020-03-17 Thread Tzu-Li (Gordon) Tai
Hi, I believe that the title of this email thread was a typo, and should be "Apache Flink - Question about checkpointing and re-run a job." I assume this because the contents of the previous conversations seem to be purely about Flink. Otherwise, as far as I know, there doesn't seem to be any

Re: what is the difference between map vs process on a datastream?

2020-03-17 Thread Tzu-Li (Gordon) Tai
Hi, As David already explained, they are similar in that you may output zero to multiple records for both process and flatMap functions. However, ProcessFunctions also expose to the user much more powerful functionality, such as registering timers, outputting to side outputs, etc. Cheers,

Re: RichAsyncFunction + BroadcastProcessFunction

2020-03-17 Thread Tzu-Li (Gordon) Tai
Hi John, Have you considered letting the BroadcastProcessFunction output events that indicate extra external HTTP requests needs to be performed, and have them consumed by a downstream async IO operator to complete the HTTP request? That could work depending on what exactly you need to do in your

Re: Very large _metadata file

2020-03-04 Thread Tzu-Li (Gordon) Tai
Hi Jacob, Apart from what Klou already mentioned, one slightly possible reason: If you are using the FsStateBackend, it is also possible that your state is small enough to be considered to be stored inline within the metadata file. That is governed by the "state.backend.fs.memory-threshold"

Re: what is the hash function that Flink creates the UID?

2020-03-02 Thread Tzu-Li (Gordon) Tai
Hi, Flink currently performs a 128-bit murmur hash on the user-provided uids to generate the final node hashes in the stream graph. Specifically, this library is being used [1] as the hash function. If what you are looking for is for Flink to use exactly the provided hash, you can use

Re: java.time.LocalDateTime in POJO type

2020-03-02 Thread Tzu-Li (Gordon) Tai
Hi, What that LOG means (i.e. "must be processed as a Generic Type") is that Flink will have to fallback to using Kryo for the serialization for that type. You should be concerned about that if: 1) That type is being used for some persisted state in snapshots. That would be the case if you've

Re: Has the "Stop" Button next to the "Cancel" Button been removed in Flink's 1.9 web interface?

2020-03-02 Thread Tzu-Li (Gordon) Tai
Hi Kaymak, To answer your last question: there will be no data loss in that scenario you described, but there could be duplicate processed records. With checkpointing enabled, the Flink Kafka consumer does not commit offsets back to Kafka until offsets in Flink checkpoints have been persisted.

Re: Flink on AWS - ActiveMQ connector

2020-03-02 Thread Tzu-Li (Gordon) Tai
Hi, The connectors that are listed in the AWS documentation page that you referenced are not provided by AWS. They are bundled connectors shipped by the Apache Flink community as part of official Flink releases, and are discoverable as artifacts from the Maven central repository. See the

Re: How is state stored in rocksdb?

2020-03-02 Thread Tzu-Li (Gordon) Tai
Hi, First of all, state is only managed by Flink (and therefore Flink's state backends) if the state is registered by the user. You can take a look at the documents here [1] on details on how to register state. A state has to be registered for it to be persisted in checkpoints / savepoints, and

Re: Correct way to e2e test a Flink application?

2020-03-02 Thread Tzu-Li (Gordon) Tai
Hi Laurent, You can take a look at Flink's MiniClusterResource JUnit test rule, and its usages in the codebase for that. The rule launches a Flink MiniCluster within the same JVM, and submission to the mini cluster resembles how it would be submitting to an actual Flink cluster, so you would

Re: Writing a POJO Schema Evolution E2E test in Java

2020-02-20 Thread Tzu-Li (Gordon) Tai
Hi Theo, This is indeed a tricky feature to test! On Thu, Feb 20, 2020 at 8:59 PM Theo Diefenthal < theo.diefent...@scoop-software.de> wrote: > Hi, > > We have a pipeline which internally uses Java POJOs and also needs to keep > some events entirely in state for some time. > > From time to

Re: State Processor API Keyed State

2020-02-18 Thread Tzu-Li (Gordon) Tai
is that type information extraction should be converged for the Java / Scala DataStream APIs. Cheers, Gordon On Wed, Feb 19, 2020 at 10:20 AM Tzu-Li (Gordon) Tai wrote: > Hi, > > Just to clarify - > I quickly went through the README of the project, and saw this: > "This

Re: State Processor API Keyed State

2020-02-18 Thread Tzu-Li (Gordon) Tai
Hi, Just to clarify - I quickly went through the README of the project, and saw this: "This error is seen after trying to read from a savepoint that was created using the same case class as a key." So, if I understood correctly, you were attempting to use the State Processor API to access a

Re: Issue with committing Kafka offsets

2020-01-31 Thread Tzu-Li (Gordon) Tai
Hi, There are no upper limits on the number of Kafka consumers per job. For each one of your FlinkKafkaConsumers, are you using the same group.id? That could maybe explain why you are experiencing higher commit times as you are adding more FlinkKafkaConsumers, as AFAIK on the broker side, the

Re: Old offsets consumed from Kafka after Flink upgrade to 1.9.1 (from 1.2.1)

2020-01-30 Thread Tzu-Li (Gordon) Tai
any prior snapshotted state (i.e. offsets) and respect the startup configuration. Let me know if this works for you! Cheers, Gordon On Thu, Jan 23, 2020 at 9:12 PM Tzu-Li (Gordon) Tai wrote: > Hi Somya, > > I'll have to take a closer look at the JIRA history to refresh my memory > on

Re: fliter and flatMap operation VS only a flatMap operation

2020-01-29 Thread Tzu-Li (Gordon) Tai
Hi, If your filter and flatMap operators are chained, then the performance difference should not be noticeable. If a shuffle (i.e. a keyBy operation) occurs after the filter and before the flatMap, then applying the filter first will be more efficient. Cheers, Gordon On Thu, Jan 30, 2020 at

Re: FsStateBackend vs RocksDBStateBackend

2020-01-29 Thread Tzu-Li (Gordon) Tai
Hi Ran, On Thu, Jan 30, 2020 at 9:39 AM Ran Zhang wrote: > Hi all, > > We have a Flink app that uses a KeyedProcessFunction, and in the function > it requires a ValueState(of TreeSet) and the processElement method needs to > access and update it. We tried to use RocksDB as our stateBackend but

Re: Old offsets consumed from Kafka after Flink upgrade to 1.9.1 (from 1.2.1)

2020-01-23 Thread Tzu-Li (Gordon) Tai
Hi Somya, I'll have to take a closer look at the JIRA history to refresh my memory on potential past changes that caused this. My first suspection is this: It is expected that the Kafka consumer will *ignore* the configured startup position if the job was restored from a savepoint. It will

Re: Questions of "State Processing API in Scala"

2020-01-20 Thread Tzu-Li (Gordon) Tai
Hi Izual, Thanks for reporting this! I'm also forwarding this to the user mailing list, as that is the more suitable place for this question. I think the usability of the State Processor API in Scala is indeed something that hasn’t been looked at closely yet. On Tue, Jan 21, 2020 at 8:12 AM

Re: State name uniqueness

2020-01-20 Thread Tzu-Li (Gordon) Tai
Hi Vasily, State names need to be unique within operators only. Cheers, Gordon On Mon, Jan 20, 2020 at 10:58 AM Vasily Melnik < vasily.mel...@glowbyteconsulting.com> wrote: > Hi all, > > I'm a bit confused with state name uniqueness. > Should it be unique within operator only, or within entire

Re: possible backwards compatibility issue between 1.8->1.9?

2019-11-18 Thread Tzu-Li (Gordon) Tai
Hi Bekir, Before diving deeper, just to rule out the obvious: Have you changed anything with the element type of the input stream to the async wait operator? This wasn't apparent from the information so far, so I want to quickly clear that out of the way first. Cheers, Gordon On Wed, Oct 30,

[ANNOUNCE] Apache Flink 1.9.0 released

2019-08-22 Thread Tzu-Li (Gordon) Tai
The Apache Flink community is very happy to announce the release of Apache Flink 1.9.0, which is the latest major release. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. The release is

Fwd: [VOTE] Apache Flink 1.9.0, release candidate #3

2019-08-19 Thread Tzu-Li (Gordon) Tai
-- Forwarded message - From: Tzu-Li (Gordon) Tai Date: Tue, Aug 20, 2019 at 1:16 AM Subject: [VOTE] Apache Flink 1.9.0, release candidate #3 To: dev Hi all, Release candidate #3 for Apache Flink 1.9.0 is now ready for your review. Please review and vote on release candidate #3 for version

Fwd: [VOTE] Apache Flink Release 1.9.0, release candidate #2

2019-08-09 Thread Tzu-Li (Gordon) Tai
-- Forwarded message - From: Tzu-Li (Gordon) Tai Date: Fri, Aug 9, 2019 at 6:17 PM Subject: [VOTE] Apache Flink Release 1.9.0, release candidate #2 To: dev Hi all, Release candidate #2 for Apache Flink 1.9.0 is now ready for your review. This is the first voting candidate for 1.9.0

Re: Restore state class not found exception in 1.8

2019-08-06 Thread Tzu-Li (Gordon) Tai
; operator state through CheckpointedFunction interface and this sink isn’t > used in all our jobs. > > Med venlig hilsen / Best regards > Lasse Nedergaard > > > Den 3. jun. 2019 kl. 12.50 skrev Tzu-Li (Gordon) Tai >: > > Hi Lasse, > > This is indeed a bit odd. I

Re: Debug Kryo.Serialization Exception

2019-07-04 Thread Tzu-Li (Gordon) Tai
I quickly checked the implementation of duplicate() for both the KryoSerializer and StreamElementSerializer (which are the only serializers involved here). They seem to be correct; especially for the KryoSerializer, since FLINK-8836 [1] we now always perform a deep copy of the KryoSerializer when

Re: Providing Custom Serializer for Generic Type

2019-07-04 Thread Tzu-Li (Gordon) Tai
Hi Andrea, Is there a specific reason you want to use a custom TypeInformation / TypeSerializer for your type? >From the description in the original post, this part wasn't clear to me. If the only reason is because it is generally suggested to avoid generic type serialization via Kryo, both for

Re: [ANNOUNCE] Apache Flink 1.8.1 released

2019-07-02 Thread Tzu-Li (Gordon) Tai
pache.org/news/2019/07/02/release-1.8.1.html >>> >>> The full release notes are available in Jira: >>> >>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12345164 >>> >>> We would like to thank all contributors of the Apache Flink community >>> who made this release possible! >>> >>> Great thanks to @Tzu-Li (Gordon) Tai 's offline >>> kind help! >>> >>> Regards, >>> Jincheng >>> >> >

Re: Restore state class not found exception in 1.8

2019-06-03 Thread Tzu-Li (Gordon) Tai
Reportmessage extends ReportmessageBase and the state operator use > ReportmessageBase. > So we need to register all the class’s that extends a class used in state. > Don’t know why this is needed in 1.8 > > Med venlig hilsen / Best regards > Lasse Nedergaard > > > Den 28. maj 201

Re: What are savepoint state manipulation support plans

2019-05-29 Thread Tzu-Li (Gordon) Tai
FYI: Seth starting a FLIP for adding a savepoint connector that addresses this - http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Discuss-FLIP-43-Savepoint-Connector-td29233.html Please join the discussion there if you are interested! On Thu, Mar 28, 2019 at 5:23 PM Tzu-Li (Gordon

Re: Restore state class not found exception in 1.8

2019-05-28 Thread Tzu-Li (Gordon) Tai
Hi Lasse, Did you move the class to a different namespace / package or changed to be a nested class, across the Flink versions? That would be the only cause I could reason about at the moment. If possible, could you also have a very minimal snippet / instructions on how I can maybe reproduce

Re: Queryable State race condition or serialization errors?

2019-05-27 Thread Tzu-Li (Gordon) Tai
Hi Burgess, Would you be able to provide a minimal project that can reproduce your error? That would help a lot with figuring out the issue. If you prefer to share that only privately, please feel free to send me a private email with the details. Another thing you can do is set logging level to

Re: Connectors (specifically Kinesis Connector)

2019-05-23 Thread Tzu-Li (Gordon) Tai
Hi Steven, I assume you are referring to the problem that we don't publish the Kinesis connector artifacts to Maven, due to the licensing issue with KCL? I didn't manage to find any JIRAs that were addressing this issue directly, but the most related one would be this:

Re: State migration into multiple operators

2019-05-14 Thread Tzu-Li (Gordon) Tai
Hi, Just to add to what Piotr already mentioned: The community is working on adding support for this directly in Flink. You can follow the efforts here: https://issues.apache.org/jira/browse/FLINK-12047. Cheers, Gordon On Tue, May 14, 2019 at 11:39 AM Piotr Nowojski wrote: > Hi, > >

Re: problem with avro serialization

2019-05-14 Thread Tzu-Li (Gordon) Tai
Hi, Aljoscha opened a JIRA just recently for this issue: https://issues.apache.org/jira/browse/FLINK-12501. Do you know if this is a regression from previous Flink versions? I'm asking just to double check, since from my understanding of the issue, the problem should have already existed before.

Re: Avro state migration using Scala in Flink 1.7.2 (and 1.8)

2019-05-14 Thread Tzu-Li (Gordon) Tai
Hi Marc! I know we talked offline about the issues mentioned in this topic already, but I'm just relaying the result of the discussions here to make it searchable by others bumping into the same issues. On Thu, Mar 21, 2019 at 4:27 PM Marc Rooding wrote: > Hi > > I’ve been trying to get state

Re: State Migration with RocksDB MapState

2019-04-25 Thread Tzu-Li (Gordon) Tai
hema evolution still be triggered? Or does it actually go down to the > avro schema rather than just the class serialVersionUID? > > > > > > > On Mon, Mar 18, 2019 at 1:10 AM Tzu-Li (Gordon) Tai > wrote: > >> Hi Cliff, >> >> Thanks for bringing th

Re: What are savepoint state manipulation support plans

2019-03-28 Thread Tzu-Li (Gordon) Tai
@Ufuk Yes, creating a JIRA now already to track this makes sense. I've proceeded to open one: https://issues.apache.org/jira/browse/FLINK-12047 Let's move any further discussions there. Cheers, Gordon On Thu, Mar 28, 2019 at 5:01 PM Ufuk Celebi wrote: > I think such a tool would be really

Re: RocksDBStatebackend does not write checkpoints to backup path

2019-03-28 Thread Tzu-Li (Gordon) Tai
Hi, Do you have the full error message of the failure? A wild guess to begin with: have you made sure that there are sufficient permissions to create the directory? Best, Gordon On Tue, Mar 26, 2019 at 5:46 PM Paul Lam wrote: > Hi, > > I have a job (with Flink 1.6.4) which uses rocksdb

Re: What are savepoint state manipulation support plans

2019-03-28 Thread Tzu-Li (Gordon) Tai
Hi! Regarding the support for savepoint reading / writing / processing directly in core Flink, we've been thinking about that lately and might push a bit for adding the functionality to Flink in the next release. For example, beside Bravo, Seth (CC'ed) also had implemented something [1] for this.

Re: State Migration with RocksDB MapState

2019-03-17 Thread Tzu-Li (Gordon) Tai
Hi Cliff, Thanks for bringing this up! AFAIK, this wouldn't be a long-term blocker. I've just opened a JIRA to track this [1]. As explained in the JIRA ticket, the main reason this is disallowed in the initial support for state schema evolution was due to how migration was implemented in the

Re: Flink 1.7.1 uses Kryo version 2.24.0

2019-03-15 Thread Tzu-Li (Gordon) Tai
Hi, Currently Flink uses Kryo as the default serializer for data types that Flink's type serialization stack doesn't support [1]. This also includes serializers being used for managed state registered by users. Because of this, at the moment it's not easy to upgrade the Kryo version, since it is

Re: Problems with restoring from savepoint

2019-03-06 Thread Tzu-Li (Gordon) Tai
Hi Pavel, As you already discovered, this problem occurs still because in 1.7.x, the KryoSerializer is still using the deprecated TypeSerializerConfigSnapshot as its snapshot, which relies on the serializer being Java-serialized into savepoints as state metadata. In 1.8.0, all Flink's built-in

Re: ProducerFencedException when running 2 jobs with FlinkKafkaProducer

2019-02-18 Thread Tzu-Li (Gordon) Tai
ngOutput.collect(OperatorChain.java:554) > at > org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718) > > &

[ANNOUNCE] Apache Flink 1.7.2 released

2019-02-17 Thread Tzu-Li (Gordon) Tai
Hi, The Apache Flink community is very happy to announce the release of Apache Flink 1.7.2, which is the second bugfix release for the Apache Flink 1.7 series. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data

Re: ElasticSearchSink - retrying doesn't work in ActionRequestFailureHandler

2019-02-13 Thread Tzu-Li (Gordon) Tai
Hi, @Averell I renamed the `ElasticsearchFailureHandlerIndexer` to be `BufferingNoOpRequestIndexer`, which explains why you can't find it. The voting thread for RC#1 of 1.7.2 can be found at [1]. The actual commits which fixes the problem are d9c45af to 2f52227. Cheers, Gordon [1]

Re: 大家怎么学习flink的呢

2019-02-13 Thread Tzu-Li (Gordon) Tai
Hi, 除了 Apache Flink 官方文件以外 [1],我個人也建議可以看看 Ververica 這一系列的 Flink training 題材: https://training.ververica.com/ 除此之外,學習過程中有遇到任何問題也歡迎可以直接發信件跟我們詢問。 - Gordon [1] https://flink.apache.org/ On Thu, Feb 14, 2019 at 11:44 AM shen lei wrote: > 有木有好的经验或者方法分享一下,感谢。最近学的,感觉还是不系统。

Re: ElasticSearchSink - retrying doesn't work in ActionRequestFailureHandler

2019-02-13 Thread Tzu-Li (Gordon) Tai
Thanks for testing it out. Will be great to get your feedback on whether the release candidate for 1.7.2 fixes this for you. On Wed, Feb 13, 2019 at 7:38 PM Averell wrote: > Thank you Gordon. > > That's my exact problem. Will try the fix in 1.7.2 now. > > Thanks and regards, > Averell > > > >

Re: error while querying state

2019-02-12 Thread Tzu-Li (Gordon) Tai
Hi, Which Flink version are you using? Cheers, Gordon -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: ProducerFencedException when running 2 jobs with FlinkKafkaProducer

2019-02-12 Thread Tzu-Li (Gordon) Tai
Hi, I think this is unexpected. The generated transactional ids should not be clashing. Looking at the FlinkKafkaProducer code, it seems like the generation is only a function of the subtask id of the FlinkKafkaProducer, which could be the same across 2 different Kafka sources. I'm not

Re: Flink 1.6 Yarn Session behavior

2019-02-12 Thread Tzu-Li (Gordon) Tai
Hi, I'm forwarding this question to Gary (CC'ed), who most likely would have an answer for your question here. Cheers, Gordon On Wed, Feb 13, 2019 at 8:33 AM Jins George wrote: > Hello community, > > I am trying to upgrade a Flink Yarn session cluster running BEAM > pipelines from version

Re: How to register TypeInfoFactory for 'external' class

2019-02-12 Thread Tzu-Li (Gordon) Tai
Hi Alexey, I don't think using the @TypeInfo annotation is doable at the moment. Is this class being used only for input / output types of functions / operators? Or are you using it as a state type? For the former, I think you can explicitly set the TypeInformation by calling setTypeInfo on

Re: In-Memory state serialization with kryo fails

2019-02-12 Thread Tzu-Li (Gordon) Tai
Hi, I would suggest to avoid Kryo for state serialization, especially if this job is meant for production usage. It might get in the way in the future when you might decide to upgrade your value state schema. To do that, when declaring the descriptor for your MapState, provide a specific

Re: Help with a stream processing use case

2019-02-10 Thread Tzu-Li (Gordon) Tai
Hi, If Firehouse already supports sinking records from a Kinesis stream to an S3 bucket, then yes, Chesnay's suggestion would work. You route each record to a specific Kinesis stream depending on some value in the record using the KinesisSerializationSchema, and sink each Kinesis stream to their

Re: ElasticSearchSink - retrying doesn't work in ActionRequestFailureHandler

2019-02-09 Thread Tzu-Li (Gordon) Tai
Hi Averell, This seems to be the bug that you encountered: https://issues.apache.org/jira/browse/FLINK-11046. Cheers, Gordon On Sat, Feb 9, 2019 at 3:27 PM Averell wrote: > Hello, > > I am trying to follow this Flink guide [1] to handle errors in > ElasticSearchSink by re-adding the failed

Re: Checkpoints failed with NullPointerException

2019-01-29 Thread Tzu-Li (Gordon) Tai
Hi! Thanks for reporting this. This looks like a bug that we fixed in Flink 1.7.1 [1]. Would you be able to try with 1.7.1 and see if the issue is still happening for you? Cheers, Gordon [1] https://issues.apache.org/jira/browse/FLINK-11094 On Tue, Jan 29, 2019, 6:29 PM Averell I tried to

Re: getting duplicate messages from duplicate jobs

2019-01-28 Thread Tzu-Li (Gordon) Tai
Hi, Yes, Dawid is correct. The "group.id" setting in Flink's Kafka Consumer is only used for group offset fetching and committing offsets back to Kafka (only for exposure purposes, not used for processing guarantees). The Flink Kafka Consumer uses static partition assignment on the KafkaConsumer

Re: Sampling rate higher than 1Khz

2019-01-28 Thread Tzu-Li (Gordon) Tai
Hi! Yes, Flink's watermark timestamps are in milliseconds, which means that time-based operators such as time window operators will be fired at a per-millisecond granularity. Whether or not this introduces "latency" in the pipeline depends on the granularity of your time window operations; if you

Re: How Flink prioritise read from kafka topics and partitions ?

2019-01-28 Thread Tzu-Li (Gordon) Tai
Hi Sohi! On Wed, Jan 23, 2019 at 9:01 PM sohimankotia wrote: > Hi, > > Let's say I have flink Kafka consumer read from 3 topics , [ T-1 ,T-2,T-3 > ] > . > > - T1 and T2 are having partitions equal to 100 > - T3 is having partitions equal to 60 > - Flink Task (parallelism is 50) > There isn't

Re: Trouble migrating state from 1.6.3 to 1.7.1

2019-01-28 Thread Tzu-Li (Gordon) Tai
Thanks Peter! Yes, it would also be great if you try the patch in https://github.com/apache/flink/pull/7580 out and see if that works for you. On Mon, Jan 28, 2019 at 7:47 PM pwestermann wrote: > Hi Gordon, > > We should be able to wait for 1.7.2 but I will also test the workaround and > post

Re: Trouble migrating state from 1.6.3 to 1.7.1

2019-01-28 Thread Tzu-Li (Gordon) Tai
Hi, Thanks for all the information and reporting this. We've identified this to be an actual issue: https://issues.apache.org/jira/browse/FLINK-11436. There's also a PR opened to fix this, and is currently under review: https://github.com/apache/flink/pull/7580. I'll make sure that this is fixed

Re: Trouble migrating state from 1.6.3 to 1.7.1

2019-01-24 Thread Tzu-Li (Gordon) Tai
Hi! We've double checked the code, and the only plausible cause of this is that you may be using flink-avro 1.6.x with Flink 1.7.x. Could you double check that all Flink dependencies, including flink-avro, are 1.7.1? You can verify this by doing `mvn dependency:tree` on your job, and check that

Re: Trouble migrating state from 1.6.3 to 1.7.1

2019-01-23 Thread Tzu-Li (Gordon) Tai
Thanks for the logs. Is the job restore actually failing? If yes, there should be an exception for the exact cause of the failure. Otherwise, the AvroSerializer warnings in the taskmanager logs is actually expected behaviour when restoring from savepoint versions before 1.7.x, and shouldn't

Re: Trouble migrating state from 1.6.3 to 1.7.1

2019-01-23 Thread Tzu-Li (Gordon) Tai
Hi, Thanks for reporting this. Could you provide more details (error message, exception stack trace) that you are getting? This is unexpected, as the changes to flink-avro serializers in 1.7.x should be backwards compatible. More details on how the restore failed will be helpful here. Cheers,

Re: Custom Serializer for Avro GenericRecord

2019-01-10 Thread Tzu-Li (Gordon) Tai
Hi, Have you looked at [1]? You can annotate your type and provide a type info factory. The factory would be used to create the TypeInformation for that type, and in turn create the serializer used for that type. [1]

[ANNOUNCE] Apache Flink 1.6.3 released

2018-12-23 Thread Tzu-Li (Gordon) Tai
Hi, The Apache Flink community is very happy to announce the release of  Apache Flink 1.6.3, which is the third bugfix release for the Apache  Flink 1.6 series.  Apache Flink® is an open-source stream processing framework for  distributed, high-performing, always-available, and accurate data 

Re: Clarification around versioning and flink's state serialization

2018-12-21 Thread Tzu-Li (Gordon) Tai
> that falls back to Kryo will cause an error. > > Would love to see more updated slides if you don't mind. > > Thanks for taking the time, > Padarn > > > On Fri, Dec 21, 2018 at 10:04 PM Tzu-Li (Gordon) Tai > wrote: > >> For the documents I would recommend re

Re: Clarification around versioning and flink's state serialization

2018-12-21 Thread Tzu-Li (Gordon) Tai
For the documents I would recommend reading through: https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/schema_evolution.html On Fri, Dec 21, 2018, 9:55 PM Tzu-Li (Gordon) Tai Hi, > > Yes, if Flink does not recognize your registered state type, it will by > de

Re: Clarification around versioning and flink's state serialization

2018-12-21 Thread Tzu-Li (Gordon) Tai
Hi, Yes, if Flink does not recognize your registered state type, it will by default use Kryo for the serialization. And generally speaking, Kryo does not have good support for evolvable schemas compared to other serialization frameworks such as Avro or Protobuf. The reason why Flink defaults to

Re: Connection leak with flink elastic Sink

2018-12-14 Thread Tzu-Li (Gordon) Tai
the connections to elastic cluster reached to: netstat -aln | grep 9200 | wc -l 2333 On Thu, Dec 13, 2018 at 4:12 PM Tzu-Li (Gordon) Tai wrote: Hi, Besides the information that Chesnay requested, could you also provide a stack trace of the exception that caused the job to terminate in the first place

Re: Encountered the following Expired Iterator exception in getRecords using FlinkKinesisConsumer

2018-12-14 Thread Tzu-Li (Gordon) Tai
issues with LATEST option. TIA, Vijay On Thu, Dec 13, 2018 at 2:59 AM Tzu-Li (Gordon) Tai wrote: Hi! Thanks for reporting this. This looks like an overlooked corner case that the Kinesis connector doesn’t handle properly. First, let me clarify the case and how it can be reproduced. Please let

Re: Encountered the following Expired Iterator exception in getRecords using FlinkKinesisConsumer

2018-12-13 Thread Tzu-Li (Gordon) Tai
Hi! Thanks for reporting this. This looks like an overlooked corner case that the Kinesis connector doesn’t handle properly. First, let me clarify the case and how it can be reproduced. Please let me know if the following is correct: 1. You started a Kinesis connector source, with

Re: Connection leak with flink elastic Sink

2018-12-13 Thread Tzu-Li (Gordon) Tai
Hi, Besides the information that Chesnay requested, could you also provide a stack trace of the exception that caused the job to terminate in the first place? The Elasticsearch sink does indeed close the internally used Elasticsearch client, which should in turn properly release all resources

Re: Using FlinkKinesisConsumer through a proxy

2018-12-01 Thread Tzu-Li (Gordon) Tai
roxy-chain...com > kinesisConsumerConfig.setProperty(AWSUtil.AWS_CLIENT_CONFIG_PREFIX + > "proxyHost", proxyHost);//<== mo http:// in proxyHost name > > TIA, > Vijay > > > On Wed, Nov 14, 2018 at 12:50 AM Tzu-Li (Gordon) Tai > wrote: > >> Hi Vijay, >> >> I’m prett

Re: Last batch of stream data could not be sinked when data comes very slow

2018-11-14 Thread Tzu-Li (Gordon) Tai
Hi Henry, Flushing of buffered data in sinks should occur on two occasions - 1) when some buffer size limit is reached or a fixed-flush interval is fired, and 2) on checkpoints. Flushing any pending data before completing a checkpoint ensures the sink has at-least-once guarantees, so that

Re: Best practice to write data from a stream to non-relational, distributed database (hbase)

2018-11-14 Thread Tzu-Li (Gordon) Tai
Hi, Have you taken a look yet at this [1]? That is an example of writing a stream to HBase. Cheers, Gordon [1]  https://github.com/apache/flink/blob/master/flink-connectors/flink-hbase/src/test/java/org/apache/flink/addons/hbase/example/HBaseWriteStreamExample.java On 11 November 2018 at

Re: Flink auth against Zookeeper with MD5-Digest

2018-11-14 Thread Tzu-Li (Gordon) Tai
Hi, AFAIK, I don’t think there has been other discussions on this other than the original document on secured data access for Flink [1]. Unfortunately, I’m also not knowledgeable enough to comment on how feasible it would be to support MD5-Digest for authentication. Maybe Eron (cc’ed) can

<    1   2   3   4   5   6   >