Watermark never progress for deduplicate transform

2023-09-06 Thread hsy...@gmail.com
Hello, I'm using the Deduplicate transform (https://beam.apache.org/releases/javadoc/2.21.0/org/apache/beam/sdk/transforms/Deduplicate.html) to dedup my data, but on the monitoring page I see that the watermark is not moving forward. Is that common for this transform? Thanks

Question about metrics

2023-05-12 Thread hsy...@gmail.com
Hi, I have a question about metrics. I want to use the Beam metrics API to send metrics to GCP Monitoring. Instead of collecting just simple numeric values, I also need to send labels along with them. Is there a way to do that? Thanks!

Re: Questions about writing to BigQuery using storage api

2023-12-06 Thread hsy...@gmail.com
alsaud this might be interesting to
> you too
>
> On Tue, Dec 5, 2023 at 9:39 PM hsy...@gmail.com wrote:
>
>> I'm using version 2.51.0 and The configuration is like this
>>
>> write
>> .withoutValidation()
>> .withCreateDisposition(BigQueryIO.Write.

Re: pubsubliteio is super slow

2023-12-19 Thread hsy...@gmail.com
, 2023 at 10:17 AM hsy...@gmail.com
> wrote:
>
>> Any one is using pubsublite? I find it super slow 5 messages/sec and
>> there is no options for me to tune the performance

ParDo(DoFn) with multiple context.output vs FlatMapElements

2023-12-27 Thread hsy...@gmail.com
Hello, I have a question. Suppose I have a transform that, for each input, emits one or many outputs (to the same collection). I can do it with ParDo + DoFn, calling context.output multiple times in the processElement method for each input, or I can do it with FlatMapElements. Is there any difference? Does the

pubsubliteio is super slow

2023-12-19 Thread hsy...@gmail.com
Is anyone using pubsublite? I find it super slow (5 messages/sec), and there are no options for me to tune the performance.

How to set flow control for pubsubliteio?

2023-12-20 Thread hsy...@gmail.com
How do I change the flow control config for pubsubliteio? I saw the setting was taken out as part of https://issues.apache.org/jira/browse/BEAM-14129. But without setting up flow control correctly, my Beam app runs super slow ingesting from pubsublite and gets a NO_CLIENT_TOKEN error on the

pubsubliteio ack problem

2023-12-21 Thread hsy...@gmail.com
In my application, pubsubliteio never seems to ack messages, and data lateness keeps building up forever. My question is: how does Dataflow know when to ack a message? How does the engine even know when it has been processed?

Does withkeys transform enforce a reshuffle?

2024-01-18 Thread hsy...@gmail.com
Hey guys, I have a question: does the WithKeys transform enforce a reshuffle? My pipeline basically looks like this: PubsubLiteIO -> ParDo(..) -> ParDo() -> BigqueryIO.write(). The problem is that PubsubLiteIO -> ParDo(..) -> ParDo() are always fused together. But the ParDo is expensive and I want

Re: Questions about writing to BigQuery using storage api

2023-12-05 Thread hsy...@gmail.com
hich beam version are you using?
>
>
> On Tue, Dec 5, 2023 at 1:52 PM hsy...@gmail.com wrote:
>
>> Any one has experience in writing to BQ using storage api
>>
>> I tried to use it because according to the document it is more efficient
>> but I got error below

Questions about writing to BigQuery using storage api

2023-12-05 Thread hsy...@gmail.com
Does anyone have experience writing to BQ using the Storage API? I tried to use it because, according to the documentation, it is more efficient, but I got the error below: 2023-12-05 04:01:29.741 PST Error message from worker: java.lang.RuntimeException: java.lang.IllegalStateException

Re: Questions about writing to BigQuery using storage api

2023-12-07 Thread hsy...@gmail.com
> On Thu, Dec 7, 2023 at 8:46 AM hsy...@gmail.com wrote:
>
>> Here is the complete stacktrace It doesn't even hit my code and it
>> happens consistently!
>>
>> Error message from worker: java.lang.RuntimeException:
>> java.lang.IllegalStateException
>> or

Re: Questions about writing to BigQuery using storage api

2023-12-07 Thread hsy...@gmail.com
ll make this more
>> straightforward.
>>
>> On Wed, Dec 6, 2023 at 11:24 AM hsy...@gmail.com
>> wrote:
>>
>>> I’m just using dataflow engine
>>> On Wed, Dec 6, 2023 at 08:23 John Casey via user
>>> wrote:
>>>
>>>> Well, that is odd.

Re: Does withkeys transform enforce a reshuffle?

2024-01-19 Thread hsy...@gmail.com
thKeys
>
> Have you tried to just add ReShuffle after PubsubLiteIO?
>
> On Thu, Jan 18, 2024 at 8:54 PM hsy...@gmail.com wrote:
>
>> Hey guys,
>>
>> I have a question, does withkeys transformation enforce a reshuffle?
>>
>> My pipeline basicall

Re: Does withkeys transform enforce a reshuffle?

2024-01-19 Thread hsy...@gmail.com
Also, I looked at the code; Reshuffle seems to do some group-by work internally, but I don't really need a group-by.
On Fri, Jan 19, 2024 at 9:35 AM hsy...@gmail.com wrote:
> ReShuffle is deprecated
>
> On Fri, Jan 19, 2024 at 8:25 AM XQ Hu via user
> wrote:
>
>> I do