OperatorStateFromBackend can't complete initialisation because of high number of savepoint files reads

2024-08-16 Thread William Wallace
Context We have recently upgraded from Flink 1.13.6 to Flink 1.19. We consume data from ~ 40k Kafka topic partitions in some environments. We are using aligned checkpoints. We set state.storage.fs.memory-threshold: 500kb. Problem At the point when the state for operator using topic-partition-off

Unsubscribe

2023-07-16 Thread William Wang

Re: How to verify if checkpoints are asynchronous or sync

2020-01-07 Thread William C
Hallo on 2020/1/8 11:31, RKandoji wrote: I'm running my job on a EC2 instance with 32 cores and according to the documentation I tried to use as many task slots the number of cores, numOfTaskSlots=32 and parallelism=32. But I noticed that the performance is slightly degrading when I'm using 32

Re: How to verify if checkpoints are asynchronous or sync

2020-01-07 Thread William C
Can you enable debug log to check with that? regards. on 2020/1/8 6:36, RKandoji wrote: But I'm curious if there is way to verify if the checkpoints are happening asynchronously or synchronously.

Flink bitnami

2020-01-06 Thread william
Hallo Is there a flink template for deployment on bitnami? Thanks

Re: Non incremental window function accumulates unbounded state with RocksDb

2019-08-30 Thread William Jonsson
would be interesting if anyone have the same experience as I have. The pipeline is currently running on Flink 1.7.2 Best regards and wish you a pleasant day, William From: Yun Tang Date: Friday, 30 August 2019 at 11:42 To: William Jonsson , "user@flink.apache.org" Cc: Fleet Perc

Non incremental window function accumulates unbounded state with RocksDb

2019-08-30 Thread William Jonsson
String input and a “histogram” output class. Do you have any input or ideas how the state could be manageable in the Heap case but totally unhandleable during the RocksDb version? Best regards, William class Histogram extends WindowFunction[String, Histogram, TimeWindow] { def process (key : T

Backoff strategies for async IO functions?

2019-03-07 Thread William Saar
Hi, Is there a way to specify an exponential backoff strategy for when async function calls fail? I have an async function that does web requests to a rate-limited API. Can you handle that with settings on the async function call? Thanks, William

Flink 1.6.4 signing key file in docker-flink repo?

2019-02-25 Thread William Saar
: Total number processed: 16 gpg:   imported: 16 gpg: no ultimately trusted keys found + gpg --batch --verify flink.tgz.asc flink.tgz gpg: Signature made Fri Feb 15 13:13:29 2019 UTC gpg:    using RSA key 8FEA1EE9D0048C0CCC70B7573211B0703B79EA0E gpg: Can't check signature: No public key Thanks,William

What async Scala HTTP client do people use with Flink async functions?

2019-02-05 Thread William Saar
essing I'm not the only one doing async functions in Scala so figured I'd ask :) Thanks, William Saar

Re: How to add caching to async function?

2019-02-05 Thread William Saar
: "Lasse Nedergaard" To:"William Saar" Cc:"Fabian Hueske" , "user" Sent:Tue, 5 Feb 2019 10:41:41 +0100 Subject:Re: How to add caching to async function? Hi William No iterations isn’t the solution as you can (will) end up in a deadlock. We conclude

Re: How to add caching to async function?

2019-02-05 Thread William Saar
Thanks! Looks like iterations is indeed the way to go for now then... - Original Message - From: "Lasse Nedergaard" To:"Fabian Hueske" Cc:"William Saar" , "user" Sent:Mon, 4 Feb 2019 20:20:30 +0100 Subject:Re: How to add caching to async f

How to add caching to async function?

2019-02-04 Thread William Saar
etting: State is not supported in rich async functions. What is the best practice for doing this? I guess I could have a previous step with state and send the responses from the rich function back as an iteration, but I would guess that's the wrong approach... Thanks, William

Use s3 on Flink on kubernetes

2018-12-20 Thread William Saar
How can I easiest use s3 from a Flink job deployed in a session cluster on kubernetes? I've tried including the flink-s3-fs-hadoop dependency in the sbt file for my job, can I programmatically set the properties to point to it? Is there a ready-made docker image for a flink with s3 dependencies

Can't list logs or stdout through web console on Flink 1.7 Kubernetes

2018-12-19 Thread William Saar
I'm running Flink 1.7 in ECS, is this a known issue or should I create a jira? The web console doesn't show anything when trying to list logs or stdout for task managers and the job manager log have stack traces for the errors 2018-12-19 15:35:53,498 ERROR org.apache.flink.runtime.rest.handler.

Re: Generating processing time watermarks in idle event time kafka streams?

2018-12-14 Thread William Saar
Thanks, works great! This should be very useful for real-time dashboard that want to compute in event time, especially for multi-tenant systems or other specialized kafka topics that can have gaps in the traffic. - Original Message - From: "Aljoscha Krettek" To:"Wi

Generating processing time watermarks in idle event time kafka streams?

2018-12-14 Thread William Saar
Any standardized components to generate watermarks based on processing time in an event time stream when there is no data from a source? The docs for event time [1] indicate that people are doing this, but the only suggestion on Stack Overflow [2] is to make every window operator in stream have a

Streaming to Parquet Files in HDFS

2018-09-28 Thread William Speirs
I'm trying to stream log messages (syslog fed into Kafak) into Parquet files on HDFS via Flink. I'm able to read, parse, and construct objects for my messages in Flink; however, writing to Parquet is tripping me up. I do *not* need to have this be real-time; a delay of a few minutes, even up to an

Re: Far too few watermarks getting generated with Kafka source

2018-01-18 Thread William Saar
data used to come from two different topics). William - Original Message - From: "Gary Yao" To: "William Saar" Cc: "user" Sent: Thu, 18 Jan 2018 11:11:17 +0100 Subject: Re: Far too few watermarks getting generated with Kafka source Hi William, How often

Far too few watermarks getting generated with Kafka source

2018-01-17 Thread William Saar
through the pipeline, we're just not getting watermarks... Thanks, William

Timestamps and watermarks in CoProcessFunction function

2018-01-16 Thread William Saar
Hi, I have added the code below to the start of processElement2 in CoProcessFunction. It prints timestamps and watermarks for the first 3 elements for each new watermark. Shouldn't the timestamp always be lower than the next watermark? The 3 timestamps before the last watermark are all larger than

Re: Access to time in aggregation, or aggregation in ProcessWindowFunction?

2017-06-20 Thread William Saar
Hi, That looks perfect! I realized I could probably use an Evictor together with my WindowProcessFunction to prevent the window from preserving the whole state, but ditching the window looks even better. Thanks a lot! William - Original Message - From: "Nico Kruber" To: C

Access to time in aggregation, or aggregation in ProcessWindowFunction?

2017-06-20 Thread William Saar
timestamp meets the expiration condition, or if the elements iterable parameter does not contain any new elements (deducing that the processing must have been triggered by a timer invocation and not a new element). Is there a better way to do this? Thanks, William

Re: Porting batch percentile computation to streaming window

2017-05-30 Thread William Saar
Flink. - Original Message - From: "Gyula Fóra" To: "William Saar" , Cc: Sent: Tue, 30 May 2017 13:56:08 + Subject: Re: Porting batch percentile computation to streaming window I think you could actually do a window operation to get the tDigestStre

Re: Porting batch percentile computation to streaming window

2017-05-30 Thread William Saar
ark information or something on metrics objects in line 1 and emit T digests more often in line 2? Finally, how do I access the watermark/window information in my fold operation in line 1? Thanks! - Original Message - From: "Gyula Fóra" To:"William Saar" , Cc: Sent:Tue, 30

Porting batch percentile computation to streaming window

2017-05-29 Thread William Saar
after seeing all stream values). Thanks! William

Re: ContinuousEventTimeTrigger breaks coGrouped windowed streams?

2016-11-24 Thread William Saar
the end of the window + allowedLateness you have set. On Tue, Nov 22, 2016 at 11:08 PM, William Saar wrote: > Thanks! > One difference is that my topology had 2 sources. I have updated your > example to also use 2 sources and that breaks the co-group operation in the > e

Re: ContinuousEventTimeTrigger breaks coGrouped windowed streams?

2016-11-22 Thread William Saar
. William - Original Message - From: user@flink.apache.org To:"user@flink.apache.org" Cc: Sent:Tue, 22 Nov 2016 11:50:52 +0100 Subject:Re: ContinuousEventTimeTrigger breaks coGrouped windowed streams? Hi William, I've reproduced your example locally for some toy data and

Re: ContinuousEventTimeTrigger breaks coGrouped windowed streams?

2016-11-21 Thread William Saar
ld need to keep up to 3 copies of all events (for at least the smallest window size) to compute the same type of results. Hälsningar! William - Original Message - From: user@flink.apache.org To: Cc: Sent:Mon, 21 Nov 2016 08:22:16 + Subject:Re: ContinuousEventTimeTrigger breaks coGroupe

ContinuousEventTimeTrigger breaks coGrouped windowed streams?

2016-11-19 Thread William Saar
0)))     .fold(...); stream1.coGroup(stream2).where(...).equalTo(...)     .window(TumblingEventTimeWindows.of(Time.seconds(30)))     .trigger(ContinuousEventTimeTrigger.of(Time.seconds(10)))     .print() Thanks, William

Working with data locality in streaming using groupBy?

2015-06-05 Thread William Saar
sult = localAndRemoteStream.select("local").map(process).union(remoteStream).broadcast(); // Broadcast all fully processed results to all machines globalResult.fold().addSink(globalWindowOutputSink) // fold/reduce, I want a result based on the full contents of the window Any help would be greatly appreciated! Thanks, William