Re: Restore state class not found exception in 1.8

2019-08-06 Thread Tzu-Li (Gordon) Tai
; operator state through CheckpointedFunction interface and this sink isn’t > used in all our jobs. > > Med venlig hilsen / Best regards > Lasse Nedergaard > > > Den 3. jun. 2019 kl. 12.50 skrev Tzu-Li (Gordon) Tai >: > > Hi Lasse, > > This is indeed a bit odd. I

Re: possible backwards compatibility issue between 1.8->1.9?

2019-11-18 Thread Tzu-Li (Gordon) Tai
Hi Bekir, Before diving deeper, just to rule out the obvious: Have you changed anything with the element type of the input stream to the async wait operator? This wasn't apparent from the information so far, so I want to quickly clear that out of the way first. Cheers, Gordon On Wed, Oct 30,

Re: java.time.LocalDateTime in POJO type

2020-03-02 Thread Tzu-Li (Gordon) Tai
Hi, What that LOG means (i.e. "must be processed as a Generic Type") is that Flink will have to fallback to using Kryo for the serialization for that type. You should be concerned about that if: 1) That type is being used for some persisted state in snapshots. That would be the case if you've

Re: Has the "Stop" Button next to the "Cancel" Button been removed in Flink's 1.9 web interface?

2020-03-02 Thread Tzu-Li (Gordon) Tai
Hi Kaymak, To answer your last question: there will be no data loss in that scenario you described, but there could be duplicate processed records. With checkpointing enabled, the Flink Kafka consumer does not commit offsets back to Kafka until offsets in Flink checkpoints have been persisted.

Re: Correct way to e2e test a Flink application?

2020-03-02 Thread Tzu-Li (Gordon) Tai
Hi Laurent, You can take a look at Flink's MiniClusterResource JUnit test rule, and its usages in the codebase for that. The rule launches a Flink MiniCluster within the same JVM, and submission to the mini cluster resembles how it would be submitting to an actual Flink cluster, so you would

Re: How is state stored in rocksdb?

2020-03-02 Thread Tzu-Li (Gordon) Tai
Hi, First of all, state is only managed by Flink (and therefore Flink's state backends) if the state is registered by the user. You can take a look at the documents here [1] on details on how to register state. A state has to be registered for it to be persisted in checkpoints / savepoints, and

Re: Flink on AWS - ActiveMQ connector

2020-03-02 Thread Tzu-Li (Gordon) Tai
Hi, The connectors that are listed in the AWS documentation page that you referenced are not provided by AWS. They are bundled connectors shipped by the Apache Flink community as part of official Flink releases, and are discoverable as artifacts from the Maven central repository. See the

Re: what is the hash function that Flink creates the UID?

2020-03-02 Thread Tzu-Li (Gordon) Tai
Hi, Flink currently performs a 128-bit murmur hash on the user-provided uids to generate the final node hashes in the stream graph. Specifically, this library is being used [1] as the hash function. If what you are looking for is for Flink to use exactly the provided hash, you can use

Re: Very large _metadata file

2020-03-04 Thread Tzu-Li (Gordon) Tai
Hi Jacob, Apart from what Klou already mentioned, one slightly possible reason: If you are using the FsStateBackend, it is also possible that your state is small enough to be considered to be stored inline within the metadata file. That is governed by the "state.backend.fs.memory-threshold"

Re: Old offsets consumed from Kafka after Flink upgrade to 1.9.1 (from 1.2.1)

2020-01-23 Thread Tzu-Li (Gordon) Tai
Hi Somya, I'll have to take a closer look at the JIRA history to refresh my memory on potential past changes that caused this. My first suspection is this: It is expected that the Kafka consumer will *ignore* the configured startup position if the job was restored from a savepoint. It will

Re: FsStateBackend vs RocksDBStateBackend

2020-01-29 Thread Tzu-Li (Gordon) Tai
Hi Ran, On Thu, Jan 30, 2020 at 9:39 AM Ran Zhang wrote: > Hi all, > > We have a Flink app that uses a KeyedProcessFunction, and in the function > it requires a ValueState(of TreeSet) and the processElement method needs to > access and update it. We tried to use RocksDB as our stateBackend but

Re: fliter and flatMap operation VS only a flatMap operation

2020-01-29 Thread Tzu-Li (Gordon) Tai
Hi, If your filter and flatMap operators are chained, then the performance difference should not be noticeable. If a shuffle (i.e. a keyBy operation) occurs after the filter and before the flatMap, then applying the filter first will be more efficient. Cheers, Gordon On Thu, Jan 30, 2020 at

Re: Old offsets consumed from Kafka after Flink upgrade to 1.9.1 (from 1.2.1)

2020-01-30 Thread Tzu-Li (Gordon) Tai
any prior snapshotted state (i.e. offsets) and respect the startup configuration. Let me know if this works for you! Cheers, Gordon On Thu, Jan 23, 2020 at 9:12 PM Tzu-Li (Gordon) Tai wrote: > Hi Somya, > > I'll have to take a closer look at the JIRA history to refresh my memory > on

Re: Issue with committing Kafka offsets

2020-01-31 Thread Tzu-Li (Gordon) Tai
Hi, There are no upper limits on the number of Kafka consumers per job. For each one of your FlinkKafkaConsumers, are you using the same group.id? That could maybe explain why you are experiencing higher commit times as you are adding more FlinkKafkaConsumers, as AFAIK on the broker side, the

Re: State Processor API Keyed State

2020-02-18 Thread Tzu-Li (Gordon) Tai
Hi, Just to clarify - I quickly went through the README of the project, and saw this: "This error is seen after trying to read from a savepoint that was created using the same case class as a key." So, if I understood correctly, you were attempting to use the State Processor API to access a

Re: State Processor API Keyed State

2020-02-18 Thread Tzu-Li (Gordon) Tai
is that type information extraction should be converged for the Java / Scala DataStream APIs. Cheers, Gordon On Wed, Feb 19, 2020 at 10:20 AM Tzu-Li (Gordon) Tai wrote: > Hi, > > Just to clarify - > I quickly went through the README of the project, and saw this: > "This

Re: Writing a POJO Schema Evolution E2E test in Java

2020-02-20 Thread Tzu-Li (Gordon) Tai
Hi Theo, This is indeed a tricky feature to test! On Thu, Feb 20, 2020 at 8:59 PM Theo Diefenthal < theo.diefent...@scoop-software.de> wrote: > Hi, > > We have a pipeline which internally uses Java POJOs and also needs to keep > some events entirely in state for some time. > > From time to

Re: State name uniqueness

2020-01-20 Thread Tzu-Li (Gordon) Tai
Hi Vasily, State names need to be unique within operators only. Cheers, Gordon On Mon, Jan 20, 2020 at 10:58 AM Vasily Melnik < vasily.mel...@glowbyteconsulting.com> wrote: > Hi all, > > I'm a bit confused with state name uniqueness. > Should it be unique within operator only, or within entire

Re: Questions of "State Processing API in Scala"

2020-01-20 Thread Tzu-Li (Gordon) Tai
Hi Izual, Thanks for reporting this! I'm also forwarding this to the user mailing list, as that is the more suitable place for this question. I think the usability of the State Processor API in Scala is indeed something that hasn’t been looked at closely yet. On Tue, Jan 21, 2020 at 8:12 AM

Re: StateFun - Multiple modules example

2020-04-08 Thread Tzu-Li (Gordon) Tai
Hi Oytun! You can see here an example of how to package a StateFun application image that contains multiple modules: https://ci.apache.org/projects/flink/flink-statefun-docs-stable/deployment-and-operations/packaging.html#images Essentially, for each module you want to include in your

[ANNOUNCE] Apache Flink Stateful Functions 2.0.0 released

2020-04-07 Thread Tzu-Li (Gordon) Tai
The Apache Flink community is very happy to announce the release of Apache Flink Stateful Functions 2.0.0. Stateful Functions is an API that simplifies building distributed stateful applications. It's based on functions with persistent state that can interact dynamically with strong consistency

Re: Question about EventTimeTrigger

2020-04-13 Thread Tzu-Li (Gordon) Tai
Hi, Could you briefly describe what you are trying to achieve? By definition, a GlobalWindow includes all data - the ending timestamp for these windows are therefore Long.MAX_VALUE. An event time trigger wouldn't make sense here, since that trigger would never fire (watermark can not pass the

Re: [Stateful Functions] Using Flink CEP

2020-04-13 Thread Tzu-Li (Gordon) Tai
Hi! It isn't possible to use Flink CEP within Stateful Functions. That could be an interesting primitive, to add CEP-based function constructs. Could your briefly describe what you are trying to achieve? On the other hand, there are plans to integrate Stateful Functions more closely with the

Re: Question about Writing Incremental Graph Algorithms using Apache Flink Gelly

2020-04-13 Thread Tzu-Li (Gordon) Tai
Hi, As you mentioned, Gelly Graph's are backed by Flink DataSets, and therefore work primarily on static graphs. I don't think it'll be possible to implement incremental algorithms described in your SO question. Have you tried looking at Stateful Functions, a recent new API added to Flink? It

Re: state schema evolution for case classes

2020-03-27 Thread Tzu-Li (Gordon) Tai
ding.state.CustomAvroSerializer$class.deserialize(CustomAvroSerializer.scala:42) > at > nl.mrooding.state.ProductSerializer.deserialize(ProductSerializer.scala:9) > at > nl.mrooding.state.ProductSerializer.deserialize(ProductSerializer.scala:9) > at > org.apache.flink.contrib.streaming.state

Re: state schema evolution for case classes

2020-04-02 Thread Tzu-Li (Gordon) Tai
assOf[TestDataNested]" > in my current application, basically replace the serialise it throws the > "new state serialiser is not compaitable. > > What can I do here, would be great help thanks in advance > > On Fri, Mar 27, 2020 at 1:19 PM Tzu-Li (Gordon) Tai > wrote

Fwd: Lack of KeyedBroadcastStateBootstrapFunction

2020-03-30 Thread Tzu-Li (Gordon) Tai
It seems like Seth's reply didn't make it to the mailing lists somehow. Forwarding his reply below: -- Forwarded message - From: Seth Wiesman Date: Thu, Mar 26, 2020 at 5:16 AM Subject: Re: Lack of KeyedBroadcastStateBootstrapFunction To: Dawid Wysakowicz Cc: , Tzu-Li (Gordon

Re: Lack of KeyedBroadcastStateBootstrapFunction

2020-03-30 Thread Tzu-Li (Gordon) Tai
.com/jobs?utm_source=signature_medium=email> > > On Mon, Mar 30, 2020 at 1:04 AM Tzu-Li (Gordon) Tai > wrote: > >> It seems like Seth's reply didn't make it to the mailing lists somehow. >> Forwarding his reply below: >> >> -- Forward

Re: Benchmark for Stateful Functions

2020-05-04 Thread Tzu-Li (Gordon) Tai
Hi Omid, There currently aren't any benchmarks that I know of for Stateful Functions. However, Stateful Functions applications run on top of Apache Flink and therefore share the same network stack / runtime. So, if throughput and latency is your only concern, you should be able carry over any

Re: Rich Function Thread Safety

2020-05-10 Thread Tzu-Li (Gordon) Tai
As others have mentioned already, it is true that method calls on operators (e.g. processing events and snapshotting state) will not concurrently happen. As for your findings in reading through the documentation, that might be a hint that we could add a bit more explanation mentioning this. Could

Re: Statefun 2.0 questions

2020-05-10 Thread Tzu-Li (Gordon) Tai
Hi, Correct me if I'm wrong, but from the discussion so far it seems like what Wouter is looking for is an HTTP-based ingress / egress. We have been thinking about this in the past. The specifics of the implementation is still to be discussed, but to be able to ensure exactly-once processing

Re: Assertion failed: (last_ref), function ~ColumnFamilySet, file db/column_family.cc, line 1238

2020-05-11 Thread Tzu-Li (Gordon) Tai
In that case, the most possible cause would be https://issues.apache.org/jira/browse/FLINK-16313, which is included in Flink 1.10.1 (to be released) The release candidates for Flink 1.10.1 is currently ongoing, would it be possible for you to try that out and see if the error still occurs? On

Re: Assertion failed: (last_ref), function ~ColumnFamilySet, file db/column_family.cc, line 1238

2020-05-08 Thread Tzu-Li (Gordon) Tai
Hi, The last time I saw this error, was that there was a mismatch in the used flink-state-processor-api version and other core Flink dependencies. Could you confirm that? Also, are you seeing this assertion error consistently, or only occasionally? cc'ing Seth, maybe he has other clues on the

Re: How To subscribe a Kinesis Stream using enhance fanout?

2020-05-14 Thread Tzu-Li (Gordon) Tai
Hi Xiaolong, You are right, the way the Kinesis connector is implemented / the way the AWS APIs are used, does not allow it to consume Kinesis streams with enhanced fan-out enabled consumers [1]. Could you open a JIRA ticket for this? As far as I can tell, this could be a valuable contribution to

Re: RichAsyncFunction + BroadcastProcessFunction

2020-03-17 Thread Tzu-Li (Gordon) Tai
Hi John, Have you considered letting the BroadcastProcessFunction output events that indicate extra external HTTP requests needs to be performed, and have them consumed by a downstream async IO operator to complete the HTTP request? That could work depending on what exactly you need to do in your

Re: Apache Airflow - Question about checkpointing and re-run a job

2020-03-17 Thread Tzu-Li (Gordon) Tai
Hi, I believe that the title of this email thread was a typo, and should be "Apache Flink - Question about checkpointing and re-run a job." I assume this because the contents of the previous conversations seem to be purely about Flink. Otherwise, as far as I know, there doesn't seem to be any

Re: Help me understand this Exception

2020-03-17 Thread Tzu-Li (Gordon) Tai
Hi, The exception stack you posted simply means that the next operator in the chain failed to process the output watermark. There should be another exception, which would explain why some operator was closed / failed and eventually leading to the above exception. That would provide more insight

Re: what is the difference between map vs process on a datastream?

2020-03-17 Thread Tzu-Li (Gordon) Tai
Hi, As David already explained, they are similar in that you may output zero to multiple records for both process and flatMap functions. However, ProcessFunctions also expose to the user much more powerful functionality, such as registering timers, outputting to side outputs, etc. Cheers,

Re: state schema evolution for case classes

2020-03-17 Thread Tzu-Li (Gordon) Tai
Hi Apoorv, Flink currently does not natively support schema evolution for state types using Scala case classes [1]. So, as Roman has pointed out, there are 2 possible ways for you to do that: - Implementing a custom serializer that support schema evolution for your specific Scala case classes,

Re: Stateful functions Harness

2020-05-21 Thread Tzu-Li (Gordon) Tai
Hi, On Fri, May 22, 2020 at 7:19 AM Boris Lublinsky < boris.lublin...@lightbend.com> wrote: > Also, where do I put flint-conf.yaml in Idea to add additional required > config parameter: > > classloader.parent-first-patterns.additional: >

Re: Stateful functions Harness

2020-05-21 Thread Tzu-Li (Gordon) Tai
Are you getting an exception from running the Harness? The Harness should already have the required configurations, such as the parent first classloading config. Otherwise, if you would like to add your own configuration, use the `withConfiguration` method on the `Harness` class. On Fri, May 22,

Re: Performance impact of many open windows at the same time

2020-05-21 Thread Tzu-Li (Gordon) Tai
Hi Joe, The main effect this should have is more state to be kept until the windows can be fired (and state purged). This would of course increase the time it takes to checkpoint the operator. I'm not sure if there will be significant runtime per-record impact caused by how windows are

Re: Using Queryable State within 1 job + docs suggestion

2020-05-21 Thread Tzu-Li (Gordon) Tai
This in general is not a good idea, as the state you query using queryable state within a job does not provide any consistency guarantees at all. Would it be possible to have some trigger that emits state of the windows, and join the states downstream? In general, that is a better approach for

Re: Stateful functions Harness

2020-05-21 Thread Tzu-Li (Gordon) Tai
, such as the source / sink providers and Flink Kafka connectors. Cheers, Gordon On Fri, May 22, 2020 at 12:04 PM Tzu-Li (Gordon) Tai wrote: > Are you getting an exception from running the Harness? > The Harness should already have the required configurations, such as the > par

Re: Stateful functions Harness

2020-05-21 Thread Tzu-Li (Gordon) Tai
Sorry, forgot to cc user@ as well in the last reply. On Fri, May 22, 2020 at 12:01 PM Tzu-Li (Gordon) Tai wrote: > As an extra note, the utilities you will find in `statefun-e2e-tests`, > such as the `StatefulFunctionsAppsContainers` is not yet intended for users. > This however was p

Re: Using Queryable State within 1 job + docs suggestion

2020-05-21 Thread Tzu-Li (Gordon) Tai
Hi, That in general is not a good idea, with the problem you mentioned as well as the fact that the state you query within the same job using queryable state does not provide any means of consistency guarantee. When it comes to "querying state from another operator", it is a hint that your use

Re: Flink Window with multiple trigger condition

2020-05-21 Thread Tzu-Li (Gordon) Tai
Hi, To achieve what you have in mind, I think what you have to do is to use a processing time window of 30 mins, and have a custom trigger that matches the "start" event in the `onElement` method and return TriggerResult.FIRE_AND_PURGE. That way, the window fires either when the processing time

Re: State Storage Questions

2020-09-04 Thread Tzu-Li (Gordon) Tai
Hi, On Fri, Sep 4, 2020 at 1:37 PM Rex Fenley wrote: > Hello! > > I've been digging into State Storage documentation, but it's left me > scratching my head with a few questions. Any help will be much appreciated. > > Qs: > 1. Is there a way to use RocksDB state backend for Flink on AWS EMR? >

Re: Unit Test for KeyedProcessFunction with out-of-core state

2020-09-04 Thread Tzu-Li (Gordon) Tai
Hi Alexey, Is there a specific reason why you want to test against RocksDB? Otherwise, in Flink tests we use a `KeyedOneInputStreamOperatorTestHarness` [1] that allows you to wrap a user function and eliminate the need to worry about setting up heavy runtime context / dependencies such as the

Re: FLINK YARN SHIP from S3 Directory

2020-09-04 Thread Tzu-Li (Gordon) Tai
Hi, As far as I can tell from a recent change [1], this seems to be possible now starting from Flink 1.11.x. Have you already tried this with the latest Flink version? Also including Klou in this email, who might be able to confirm this. Cheers, Gordon [1]

Re: State Storage Questions

2020-09-08 Thread Tzu-Li (Gordon) Tai
> > On Fri, Sep 4, 2020 at 1:20 AM Tzu-Li (Gordon) Tai > wrote: > >> Hi, >> >> On Fri, Sep 4, 2020 at 1:37 PM Rex Fenley wrote: >> >>> Hello! >>> >>> I've been digging into State Storage documentation, but it's left me >>&g

Re: Flink Stateful Functions API

2020-09-14 Thread Tzu-Li (Gordon) Tai
Hi! Dawid is right, there currently is no developer documentation for the remote request-reply protocol. One reason for this is that the protocol isn't considered a fully stable user-facing interface yet, and thus not yet properly advertised in the documentation. However, there are plans to

Re: Native State in Python Stateful Functions?

2020-10-08 Thread Tzu-Li (Gordon) Tai
Hi, Nice to hear that you are trying out StateFun! It is by design that function state is attached to each HTTP invocation request, as defined by StateFun's remote invocation request-reply protocol. This decision was made with typical application cloud-native architectures in mind - having

Re: Stateful function and large state applications

2020-10-13 Thread Tzu-Li (Gordon) Tai
Hi, The StateFun runtime is built directly on top of Apache Flink, so RocksDB as the state backend is supported as well as all the features for large state such as checkpointing and local task recovery. Cheers, Gordon On Wed, Oct 14, 2020 at 11:49 AM Lian Jiang wrote: > Hi, > > I am learning

Re: Stateful Functions + ML model prediction

2020-10-05 Thread Tzu-Li (Gordon) Tai
Hi John, It is definitely possible to use Apache Pulsar with StateFun. Could you open a JIRA ticket for that? It would be nice to see how much interest we can gather on adding that as a new IO module, and consider adding native support for Pulsar in future releases. If you are already using

Re: Remote Stateful Function Scalability

2020-10-17 Thread Tzu-Li (Gordon) Tai
Hi Elias, On Sun, Oct 18, 2020 at 6:16 AM Elias Levy wrote: > After reading the Stateful Functions documentation, I am left wondering > how remote stateful functions scale. > > The documentation mentions that the use of remote functions allows the > state and compute tiers to scale

Re: Native State in Python Stateful Functions?

2020-10-09 Thread Tzu-Li (Gordon) Tai
ining-functions [2] https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.2/concepts/distributed_architecture.html#co-located-functions On a separate topic, is anyone using StateFun in production? > > > > Thanks, > > Dan > > > > *From: *"Tzu-Li

Re: Stateful Functions + ML model prediction

2020-10-07 Thread Tzu-Li (Gordon) Tai
Kafka so that users can define them textually in `module.yaml` definition files, but this approach you pointed definitely works for the time being. Cheers, Gordon > > Cheers, > John. > > ------ > *From:* Tzu-Li (Gordon) Tai > *Sent:* Monday 5 October

Re: Statefun + Confluent Fully-managed Kafka

2020-10-07 Thread Tzu-Li (Gordon) Tai
Hi Hezekiah, I've confirmed that the Kafka properties set in the module specification file (module.yaml) are indeed correctly being parsed and used to construct the internal Kafka clients. StateFun / Flink does not alter or modify the properties. So, this should be something wrong with your

[ANNOUNCE] Apache Flink Stateful Functions 2.2.0 released

2020-09-28 Thread Tzu-Li (Gordon) Tai
The Apache Flink community is very happy to announce the release of Apache Flink Stateful Functions 2.2.0. Stateful Functions is an API that simplifies the building of distributed stateful applications with a runtime built for serverless architectures. It's based on functions with persistent

Re: Stateful functions Harness

2020-05-27 Thread Tzu-Li (Gordon) Tai
Intellij > > Any work arounds? > > > > > On May 22, 2020, at 12:03 AM, Tzu-Li (Gordon) Tai > wrote: > > Hi, > > Sorry, I need to correct my comment on using the Kafka ingress / egress > with the Harness. > > That is actually doable, by adding an extra depen

Re: Stateful functions Harness

2020-05-27 Thread Tzu-Li (Gordon) Tai
ly add a Flink source function as the ingress. If you want to use that directly in a normal application (not just execution in IDE with the Harness), you can define your ingesses/egresses by binding SourceFunctionSpec / SinkFunctionSpec. Please see how they are being used in the Harness class for example

Re: Stateful-fun-Basic-Hello

2020-05-25 Thread Tzu-Li (Gordon) Tai
Hi, It seems like you are trying to package your Stateful Functions app as a Flink job, and submit that to an existing cluster. If that indeed is the case, Stateful Functions apps have some required confogurations that need to be set via the flink-conf.yaml file for your existing cluster. Please

Re: Stateful-fun-Basic-Hello

2020-05-25 Thread Tzu-Li (Gordon) Tai
Hi, You're right, maybe the documentation needs a bit more directions there, especially for people who are newer to Flink. 1. How to increase parallelism There are two ways to do this. Either set the `parallelism.default` also in the flink-conf.yaml, or use the -p command line option when

Re: stateful-fun2.0 checkpointing

2020-05-25 Thread Tzu-Li (Gordon) Tai
Hi, I replied to your question on this in your other email thread. Let us know if you have other questions! Cheers, Gordon On Sun, May 24, 2020, 1:01 AM C DINESH wrote: > Hi Team, > > 1. How can we enable checkpointing in stateful-fun2.0 > 2. How to set parallelism > > Thanks, > Dinesh. > >

Re: StateFun remote/embedded polyglot example

2020-05-31 Thread Tzu-Li (Gordon) Tai
Hi, On Mon, Jun 1, 2020 at 5:47 AM Omid Bakhshandeh wrote: > Hi, > > I'm very confused about StateFun 2.0 new architecture. > > Is it possible to have both remote and embedded functions in the same > deployment? > Yes that is possible. Embedded functions simply run within the Flink StateFun

Re: MaxConnections understanding on FlinkKinesisProducer via KPL

2020-07-21 Thread Tzu-Li (Gordon) Tai
Hi Vijay, I'm not entirely sure of the semantics between ThreadPoolSize and MaxConnections since they are all KPL configurations (this specific question would probably be better directed to AWS), but my guess would be that the number of concurrent requests to the KPL backend is capped by

Re: FlinkKinesisProducer blocking ?

2020-07-21 Thread Tzu-Li (Gordon) Tai
er second per >> shard are buffered in an unbounded queue and dropped when their RecordTtl >> expires. >> >> To avoid data loss, you can enable backpressuring by restricting the size >> of the internal queue: >> >> // 200 Bytes per record, 1 shard >> kine

Re: Validating my understanding of SHARD_DISCOVERY_INTERVAL_MILLIS

2020-07-21 Thread Tzu-Li (Gordon) Tai
Hi Vijay, Your assumption is correct that the discovery interval does not affect the interval of fetching records. As a side note, you can actually disable shard discovery, by setting the value to -1. The FlinkKinesisProducer would then only call ListShards once at job startup. Cheers, Gordon

Re: Updating kafka connector with state

2020-08-10 Thread Tzu-Li (Gordon) Tai
Hi Nikola, If I remember correctly, state is not compatible between flink-connector-kafka-0.11 and the universal flink-connector-kafka. Piotr (cc'ed) would probably know whats going on here. Cheers, Gordon On Mon, Aug 10, 2020 at 1:07 PM Nikola Hrusov wrote: > Hello, > > We are trying to

Re: TaskManager docker image for Beam WordCount failing with ClassNotFound Exception

2020-07-08 Thread Tzu-Li (Gordon) Tai
Hi, Assuming that the job jar bundles all the required dependencies (including the Beam dependencies), making them available under `/opt/flink/usrlib/` in the container either by mounting or directly adding the job artifacts should work. AFAIK It is also the recommended way, as opposed to adding

Re: FlinkKinesisProducer blocking ?

2020-07-08 Thread Tzu-Li (Gordon) Tai
Hi Vijay, The FlinkKinesisProducer does not use blocking calls to the AWS KDS API. It does however apply backpressure (therefore effectively blocking all upstream operators) when the number of outstanding records accumulated exceeds a set limit, configured using the

Re: Chaining the creation of a WatermarkStrategy doesn't work?

2020-07-08 Thread Tzu-Li (Gordon) Tai
Hi, This would be more of a Java question. In short, type inference of generic types does not work for chained invocations, and therefore type information has to be explicitly included. If you'd like to chain the calls, this would work: WatermarkStrategy watermarkStrategy = WatermarkStrategy

Re: Chaining the creation of a WatermarkStrategy doesn't work?

2020-07-08 Thread Tzu-Li (Gordon) Tai
Ah, didn't realize Chesnay has it answered already, sorry for the concurrent reply :) -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Any python example with json data from Kafka using flink-statefun

2020-06-16 Thread Tzu-Li (Gordon) Tai
(forwarding this to user@ as it is more suited to be located there) Hi Sunil, With remote functions (using the Python SDK), messages sent to / from them must be Protobuf messages. This is a requirement since remote functions can be written in any language, and we use Protobuf as a means for

Re: How To subscribe a Kinesis Stream using enhance fanout?

2020-06-09 Thread Tzu-Li (Gordon) Tai
RA ticket. > > Thanks > > > > ‐‐‐ Original Message ‐‐‐ > On Thursday, 14 May 2020 11:34, Xiaolong Wang > wrote: > > Thanks, I'll check it out. > > On Thu, May 14, 2020 at 6:26 PM Tzu-Li (Gordon) Tai > wrote: > >> Hi Xiaolong, >> >> You are right

Re: KeyedStream and keyedProcessFunction

2020-06-09 Thread Tzu-Li (Gordon) Tai
Hi, Records with the same key will be processed by the same partition. Note there isn't an instance of a keyed process function for each key. There is a single instance per partition, and all keys that are distributed to the same partition will get processed by the same keyed process function

Re: Suggestions for using both broadcast sync and conditional async-io

2020-06-04 Thread Tzu-Li (Gordon) Tai
Hi, For the initial DB fetch and state bootstrapping: That's exactly what the State Processor API is for, have you looked at that already? It currently does support bootstrapping broadcast state [1], so that should be good news for you. As a side note, I may be missing something, is broadcast

[ANNOUNCE] Apache Flink Stateful Functions 2.1.0 released

2020-06-09 Thread Tzu-Li (Gordon) Tai
The Apache Flink community is very happy to announce the release of Apache Flink Stateful Functions 2.1.0. Stateful Functions is an API that simplifies building distributed stateful applications. It's based on functions with persistent state that can interact dynamically with strong consistency

Re: MaxConnections understanding on FlinkKinesisProducer via KPL

2020-07-23 Thread Tzu-Li (Gordon) Tai
kpressuring by restricting the size > of the internal queue: > > // 200 Bytes per record, 1 shard > kinesis.setQueueLimit(500); > > > On Tue, Jul 21, 2020 at 8:00 PM Tzu-Li (Gordon) Tai > wrote: > >> Hi Vijay, >> >> I'm not entirely sure of the semantics between

Re: Statefun with RabbitMQ consumes message but does not run statefun

2021-01-12 Thread Tzu-Li (Gordon) Tai
Hi, There is no lock-step of releasing a new StateFun release when a new Flink release goes out. StateFun and Flink have individual releasing schemes and schedules. Usually, for new major StateFun version releases, we will upgrade its Flink dependency to the latest available version. We are

Re: debug statefun

2020-11-10 Thread Tzu-Li (Gordon) Tai
Hi, StateFun provide's a Harness utility exactly for that, allowing you to test a StateFun application in the IDE / setting breakpoints etc. You can take a look at this example on how to use the harness:

[ANNOUNCE] Apache Flink Stateful Functions 2.2.1 released

2020-11-11 Thread Tzu-Li (Gordon) Tai
The Apache Flink community released the first bugfix release of the Stateful Functions (StateFun) 2.2 series, version 2.2.1. This release fixes a critical bug that causes restoring a Stateful Functions cluster from snapshots (checkpoints or savepoints) to fail under certain conditions. *We

Re: Problems with FlinkKafkaProducer - closing after timeoutMillis = 9223372036854775807 ms.

2020-11-19 Thread Tzu-Li (Gordon) Tai
Hi, One thing to clarify first: I think the "Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms" log doesn't necessarily mean that a producer was closed due to timeout (Long.MAX_VALUE). I guess that is just a Kafka log message that is logged when a Kafka producer is closed

Re: split avro kafka field

2020-11-16 Thread Tzu-Li (Gordon) Tai
Hi, 1. You'd have to configure your Kafka connector source to use a DeserializationSchema that deserializes the Kafka record byte to your generated Avro type. You can use the shipped `AvroDeserializationSchema` for that. 2. After your Kafka connector source, you can use a flatMap transformation

Re: Flink State Processor API - Bootstrap One state

2020-11-16 Thread Tzu-Li (Gordon) Tai
Hi, Using the State Processor API, modifying the state in an existing savepoint results in a new savepoint (new directory) with the new modified state. The original savepoint remains intact. The API allows you to only touch certain operators, without having to touch any other state and have them

Re: Kafka SQL table Re-partition via Flink SQL

2020-11-16 Thread Tzu-Li (Gordon) Tai
Hi, I'm pulling in some Flink SQL experts (in CC) to help you with this one :) Cheers, Gordon On Tue, Nov 17, 2020 at 7:30 AM Slim Bouguerra wrote: > Hi, > I am trying to author a SQL job that does repartitioning a Kafka SQL table > into another Kafka SQL table. > as example input/output

Re: Flink 1.10 -> Savepoints referencing to checkpoints or not

2020-11-16 Thread Tzu-Li (Gordon) Tai
Hi, Both the data and metadata is being stored in the savepoint directory, since Flink 1.3. The metadata in the savepoint directory does not reference and checkpoint data files. In 1.11, what was changed was that the savepoint metadata uses relative paths to point to the data files in the

Re: debug statefun

2020-11-10 Thread Tzu-Li (Gordon) Tai
ttps://github.com/apache/flink-statefun/blob/master/pom.xml#L85,L89 >> [4] >> https://github.com/apache/flink-statefun/blob/master/statefun-examples/statefun-python-greeter-example/Dockerfile#L20 >> >> On Tue, Nov 10, 2020 at 4:47 PM Tzu-Li (Gordon) Tai >> wrote: >

Re: debug statefun

2020-11-10 Thread Tzu-Li (Gordon) Tai
On Wed, Nov 11, 2020 at 1:44 PM Tzu-Li (Gordon) Tai wrote: > Hi Lian, > > Sorry, I didn't realize that the issue you were bumping into was caused by > the module not being discovered. > You're right, the harness utility would not help here. > Actually, scratch this comment. T

Re: cannot pull statefun docker image

2020-11-06 Thread Tzu-Li (Gordon) Tai
Hi, The Dockerfiles in the examples in the flink-statefun repo currently work against images built from snapshot development branches. Ververica has been hosting StateFun base images for released versions: https://hub.docker.com/r/ververica/flink-statefun You can change `FROM flink-statefun:*`

[ANNOUNCE] Stateful Functions Docker images are now hosted on Dockerhub at apache/flink-statefun

2021-01-18 Thread Tzu-Li (Gordon) Tai
Hi, We have created an "apache/flink-statefun" Dockerhub repository managed by the Flink PMC, at: https://hub.docker.com/r/apache/flink-statefun The images for the latest stable StateFun release, 2.2.2, have already been pushed there. Going forward, it will be part of the release process to make

[ANNOUNCE] Apache Flink Stateful Functions 2.2.2 released

2021-01-01 Thread Tzu-Li (Gordon) Tai
The Apache Flink community released the second bugfix release of the Stateful Functions (StateFun) 2.2 series, version 2.2.2. *We strongly recommend all users to upgrade to this version.* *Please check out the release announcement:*

Re: [Stateful Functions] Problems with Protobuf Versions

2021-02-01 Thread Tzu-Li (Gordon) Tai
Hi, This hints an incompatible Protobuf generated class by the protoc compiler, and the runtime dependency used by the code. Could you try to make sure the `protoc` compiler version matches the Protobuf version in your code? Cheers, Gordon On Fri, Jan 29, 2021 at 6:07 AM Jan Brusch wrote: >

Re: Stateful Functions - accessing the state aside of normal processing

2021-01-27 Thread Tzu-Li (Gordon) Tai
Hi Stephan, Great to hear about your experience with StateFun so far! I think what you are looking for is a way to read StateFun checkpoints, which are basically an immutable consistent point-in-time snapshot of all the states across all your functions, and run some computation or simply to

Re: Question a possible use can for Iterative Streams.

2021-02-02 Thread Tzu-Li (Gordon) Tai
Hi Marco, In the ideal setup, enrichment data existing in external databases is bootstrapped into the streaming job via Flink's State Processor API, and any follow-up changes to the enrichment data is streamed into the job as a second union input on the enrichment operator. For this solution to

Re: Statefun: cancel "sendAfter"

2021-02-02 Thread Tzu-Li (Gordon) Tai
Hi, You are right, currently StateFun does not support deleting a scheduled delayed message. StateFun supports delayed messages by building on top of two Flink constructs: 1) registering processing time timers, and 2) buffering the message payload to be sent in state. The delayed messages are

Re: Question on Flink and Rest API

2021-02-02 Thread Tzu-Li (Gordon) Tai
Hi, There is no out-of-box Flink source/sink connector for this, but it isn't unheard of that users have implemented something to support what you outlined. One way to possibly achieve this is: in terms of a Flink streaming job graph, what you would need to do is co-locate the source (which

Re: Statefun 2.2.2 Checkpoint restore NPE

2021-05-28 Thread Tzu-Li (Gordon) Tai
Hi Timothy, It would indeed be hard to figure this out without any stack traces. Have you tried changing to debug level logs? Maybe you can also try using the StateFun Harness to restore and run your job in the IDE - in that case you should be able to see which code exactly is throwing this

Re: Configure Kafka ingress through property files in Stateful function 3.0.0

2021-05-28 Thread Tzu-Li (Gordon) Tai
Hi Jessy, I assume "consumer.properties" is a file you have included in your StateFun application's image? The ingress.spec.properties field in the module YAML specification file expects a list of key value pairs, not a properties file. See for example [1]. I think it could make sense to

<    1   2   3   4   5   6   >