Re: [PR] [Flink] Speed up file write in batch mode by using larger bundle size [beam]

2024-04-09 Thread via GitHub

Abacn merged PR #30802: URL: https://github.com/apache/beam/pull/30802 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] [Flink] Speed up file write in batch mode by using larger bundle size [beam]

2024-04-08 Thread via GitHub

jto commented on PR #30802: URL: https://github.com/apache/beam/pull/30802#issuecomment-2042134252 Sure. I tested it on a job that consumes ~1B records (~150 GB). With the DataSet API, the runtime is 37 min. Passing `--useDataStreamForBatch`, I killed it after 1h+ as it was clearly too

Re: [PR] [Flink] Speed up file write in batch mode by using larger bundle size [beam]

2024-04-05 Thread via GitHub

Abacn commented on PR #30802: URL: https://github.com/apache/beam/pull/30802#issuecomment-2040486462 Hi, thanks. Would you mind sharing some numbers on the performance difference? For example, a test case of 20,000,000 elements and the run time for different batch sizes.

Re: [PR] [Flink] Speed up file write in batch mode by using larger bundle size [beam]

2024-04-05 Thread via GitHub

jto commented on PR #30802: URL: https://github.com/apache/beam/pull/30802#issuecomment-2039755040 Pinging @Abacn since you reviewed my past PRs on the Flink runner :) Can you take a look?

Re: [PR] [Flink] Speed up file write in batch mode by using larger bundle size [beam]

2024-04-03 Thread via GitHub

github-actions[bot] commented on PR #30802: URL: https://github.com/apache/beam/pull/30802#issuecomment-2034336637 Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`: R: @chamikaramj added as fallback since no labels match

[PR] [Flink] Speed up file write in batch mode by using larger bundle size [beam]

2024-03-29 Thread via GitHub

jto opened a new pull request, #30802: URL: https://github.com/apache/beam/pull/30802 This PR removes the automated file sharding normally applied when the runner is passed `--useDataStreamForBatch`. Currently `FlinkStreamingPipelineTranslator.StreamingShardedWriteFactory`
