How to test stateful streaming pipeline?

2022-02-01 Thread Marcin Kuthan
Hi This is my first question to the community so welcome everyone :) On a daily basis I’m using Apache Beam for developing streaming pipelines but I would like to learn native Flink as well. I’m looking for examples on how to write integration tests with full programmatic control over watermark an

Re: GenericType problem in KeyedCoProcessFunction after disableGenericTypes()

2022-02-01 Thread Yoel Benharrous
Hi Deniz, You could declare UUIDGenerator as a transient field and instantiate it in the open function. Or, if you want to inject any UUIDGenerator, you could provide a supplier of UUIDGenerator that implements Serializable and invoke it in the open function. On Tue, Feb 1, 2022, 10:01 PM
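
The suggestion above (a transient field filled in `open`, optionally through a serializable supplier) can be sketched in plain Java. This is only an illustration under assumptions: `EnrichFunction` is a hypothetical stand-in for a Flink rich function and does not use any Flink API, `SerializableSupplier` and the body of `UUIDGenerator` are invented for the example, and whether this resolves the error in the original thread is not confirmed here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.UUID;
import java.util.function.Supplier;

// Hypothetical helper named after the one in the thread; deliberately NOT Serializable.
class UUIDGenerator {
    String next() {
        return UUID.randomUUID().toString();
    }
}

// Assumption: a Supplier that is also Serializable, so it can be shipped with the function.
interface SerializableSupplier<T> extends Supplier<T>, Serializable {}

// Stand-in for a rich function (no Flink dependency). Only the supplier is serialized;
// the generator is transient and rebuilt in open().
class EnrichFunction implements Serializable {
    private static final long serialVersionUID = 1L;

    private final SerializableSupplier<UUIDGenerator> generatorFactory;
    private transient UUIDGenerator generator;

    EnrichFunction(SerializableSupplier<UUIDGenerator> generatorFactory) {
        this.generatorFactory = generatorFactory;
    }

    // Plays the role of the open(...) lifecycle hook mentioned in the reply.
    void open() {
        generator = generatorFactory.get();
    }

    boolean isOpened() {
        return generator != null;
    }

    String process(String value) {
        return value + "-" + generator.next();
    }
}

class TransientFieldDemo {
    // Java-serializes and deserializes the function, roughly imitating shipping it to a worker.
    static EnrichFunction roundTrip(EnrichFunction fn) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(fn);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (EnrichFunction) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        EnrichFunction shipped = roundTrip(new EnrichFunction(UUIDGenerator::new));
        shipped.open(); // the transient field is null until open() runs
        System.out.println(shipped.process("event"));
    }
}
```

The supplier variant keeps the choice of generator outside the function, which makes it easy to pass a fixed-value generator in tests.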

Future support for custom FileEnumerator in FileSource?

2022-02-01 Thread Kevin Lam
Hi all, We're interested in being able to filter files using the new FileSource API. Are there plans to add it? If there's existing work, we would be happy to help push this fo

GenericType problem in KeyedCoProcessFunction after disableGenericTypes()

2022-02-01 Thread Deniz Koçak
Hi All, We have a function extending `KeyedCoProcessFunction`, and within that function implementation I wanted to keep a class object as a field which is simply responsible for generating a UUID. We disabled Kryo fallback for generic types via `env.getConfig().disableGenericTypes()`. I am receiv

How to proper hashCode() for keys.

2022-02-01 Thread John Smith
Hi, getting java.lang.IllegalArgumentException: Key group 39 is not in KeyGroupRange{startKeyGroup=96, endKeyGroup=103}. Unless you're directly using low level state access APIs, this is most likely caused by non-deterministic shuffle key (hashCode and equals implementation). This is my class, is
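
The error message above points at a non-deterministic `hashCode`/`equals` on the key. As a minimal sketch, assuming a hypothetical key type `DeviceKey` (the poster's actual class is not shown in this preview), a key whose hash depends only on `String` and `int` field values produces the same result in every JVM. Fields that fall back to identity hashing, such as arrays, enums, or classes that do not override `hashCode`, do not have that property.

```java
import java.util.Objects;

// Hypothetical key class for illustration; the field names are assumptions.
final class DeviceKey {
    private final String tenant;
    private final int deviceId;

    DeviceKey(String tenant, int deviceId) {
        this.tenant = tenant;
        this.deviceId = deviceId;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof DeviceKey)) {
            return false;
        }
        DeviceKey that = (DeviceKey) other;
        return deviceId == that.deviceId && Objects.equals(tenant, that.tenant);
    }

    // Derived only from field values: String.hashCode and Integer.hashCode are
    // specified by the JDK, so the result does not vary between JVM processes.
    @Override
    public int hashCode() {
        return Objects.hash(tenant, deviceId);
    }
}

class DeviceKeyDemo {
    public static void main(String[] args) {
        DeviceKey a = new DeviceKey("acme", 7);
        DeviceKey b = new DeviceKey("acme", 7);
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```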

Memory issues with Rocksdb ColumnFamilyOptions

2022-02-01 Thread Natalie Dunn
Hi All, I am working on trying to process a Savepoint in order to produce basic statistics on it for monitoring. I’m running into an issue where processing a large Savepoint is running out of memory before I can process the Savepoint completely. One thing I noticed in profiling the code is tha

SV: StateFun deployement options

2022-02-01 Thread Christopher Gustafson
Hi Igal, I have been reading some threads but the ones I found were fairly old and thus I wasn't sure whether the solutions were accurate or not. If you have any threads that explain how to deploy SF as a Flink Job please do. I have been trying the suggested version from the old docs but haven't

Queryable State Deprecation

2022-02-01 Thread Jatti, Karthik
Hi, I see on the Flink Roadmap that the Queryable State API is scheduled to be deprecated, but I couldn’t find much information on Confluence or in this mailing list’s archives to understand the background as to why it’s being deprecated and what would be an alternative. Any pointers to help me get

Re: Show plan in UI not working.

2022-02-01 Thread John Smith
Hi here it is: https://issues.apache.org/jira/browse/FLINK-25812 Finally I think it looks like a JavaScript issue in the UI rather than the cluster. On Tue, 25 Jan 2022 at 02:58, Ingo Bürk wrote: > Hi John, > > can you please submit this as an issue in JIRA? If you suspect it is > related to ot

Re: KafkaSource vs FlinkKafkaConsumer010

2022-02-01 Thread David Anderson
Before Kafka introduced their universal client, Flink had version-specific connectors, e.g., for versions 0.8, 0.9, 0.10, and 0.11. Those were eventually removed in favor of FlinkKafkaConsumer, which is/was backward compatible back to Kafka version 0.10. FlinkKafkaConsumer itself was deprecated in

Re: Kafka Consumer Group Name not set if no checkpointing?

2022-02-01 Thread John Smith
Ok that's fine. It's just that we are used to that functionality from basically every other consumer we have, Flink or not. So we monitor the offsets for lateness or just to look. On Tue, 1 Feb 2022 at 03:38, Fabian Paul wrote: > Hi John, > > You are seeing what I described in my previous mail

Re: regarding flink metrics

2022-02-01 Thread Chesnay Schepler
Your best bet is to create a custom reporter that does this calculation. You could either wrap the reporter, subclass it, or fork it. In any case, https://github.com/apache/flink/tree/master/flink-metrics/flink-metrics-datadog should be a good starting point. On 01/02/2022 13:26, Jessy Ping wr

regarding flink metrics

2022-02-01 Thread Jessy Ping
Hi Team, We are using Datadog and its HTTP reporter (packaged in the Flink image) for sending metrics from a Flink application. We have a requirement to set tags with values calculated at runtime for the custom metrics emitted from Flink. Currently, it is impossible to assign tags at runtime. Is

Re: StateFun deployement options

2022-02-01 Thread Igal Shilman
Hello Christopher, The most common deployment of StateFun applications is via the community-provided Docker images [1] (and their derivations). This image captures the optimal deployment of Flink for StateFun. In addition, there is also an example of how to deploy these images to k8s [2]. If you ar

[ANNOUNCE] Apache Flink Stateful Functions 3.2.0 released

2022-02-01 Thread Till Rohrmann
The Apache Flink community is very happy to announce the release of Apache Flink Stateful Functions 3.2.0. Stateful Functions is an API that simplifies building distributed stateful applications. It's based on functions with persistent state that can interact dynamically with strong consistency gu

StateFun deployement options

2022-02-01 Thread Christopher Gustafson
Hi, I am looking into ways to deploy StateFun jobs. I noticed that the option of packaging a StateFun job as a fat jar to an existing flink cluster was described in the 2.2 version of the docs

Question about object reusing in Flink SQL

2022-02-01 Thread LM Kang
Hi community, I have read a blog post [1], which says enabling object reuse can greatly improve performance of the Blink Planner. But as I see in the code (v1.14), there are few occurrences of controllable object reuse in Flink SQL-related modules. What’s more, when enabling object reuse,

Re: Flink 1.14 metrics : taskmanager host name missing

2022-02-01 Thread Chesnay Schepler
There is unfortunately no knob for you to turn to get the previous behavior. The host variable has a number of issues (being inconsistent across processes, not being configurable, not being stable (because we just forward what we get from the RPC layer)), such that at the moment it probably sh

Re: Kafka Consumer Group Name not set if no checkpointing?

2022-02-01 Thread Fabian Paul
Hi John, You are seeing what I described in my previous mail. The KafkaSource only writes consumer offsets to Kafka when a checkpoint is finished [1]. Flink does not leverage the offsets stored in Kafka to ensure exactly-once processing but it writes the last read offset to Flink's internal state

Re: KafkaSource vs FlinkKafkaConsumer010

2022-02-01 Thread Francesco Guardiani
I think the FlinkKafkaConsumer010 you're talking about is the old source API. You should use only KafkaSource now, as it uses the new source infrastructure. On Tue, Feb 1, 2022 at 9:02 AM HG wrote: > Hello Francesco > Perhaps I copied the wrong link of 1.2. > But there is also > https://nightli

Re: KafkaSource vs FlinkKafkaConsumer010

2022-02-01 Thread HG
Hello Francesco Perhaps I copied the wrong link of 1.2. But there is also https://nightlies.apache.org/flink/flink-docs-release-1.4/api/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumer010.html It seems there are 2 ways to use Kafka KafkaSource source = KafkaSource.builder()