Hi,
Might be late to the discussion, but providing another option (as I
think it was not mentioned or I missed it). Take a look at
[this](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements)
as I think this is precisely
On Fri, Apr 12, 2024 at 1:39 PM Ruben Vargas
wrote:
> On Fri, Apr 12, 2024 at 2:17 PM Jaehyeon Kim wrote:
> >
> > Here is an example from a book that I'm reading now and it may be
> applicable.
> >
> > JAVA - (id.hashCode() & Integer.MAX_VALUE) % 100
> > PYTHON - ord(id[0]) % 100
>
or
There are various strategies. Here is an example of how Beam does it (taken
from Reshuffle.viaRandomKey().withNumBuckets(N)
Note that this does some extra hashing to work around issues with the Spark
runner. If you don't care about that, you could implement something simpler
(e.g. initialize
Good day, Ruben,
Would you be able to compute a shasum on the group of IDs to use as the key?
Best,
Damon
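One way to read the shasum suggestion, as a minimal sketch: hash the IDs of a group into a single stable key. It assumes the members of a group are already known when the key is computed, and the helper name `group_key` is hypothetical.

```python
import hashlib
from typing import Iterable


def group_key(ids: Iterable[str]) -> str:
    """Return a stable SHA-256 hex digest for a collection of IDs."""
    digest = hashlib.sha256()
    # Sort so the key does not depend on arrival order.
    for item_id in sorted(ids):
        data = item_id.encode("utf-8")
        # Length-prefix each ID so ["ab", "c"] and ["a", "bc"] hash differently.
        digest.update(len(data).to_bytes(4, "big"))
        digest.update(data)
    return digest.hexdigest()
```

The same set of IDs then maps to the same key regardless of the order in which the IDs arrive.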
On 2024/04/12 19:22:45 Ruben Vargas wrote:
> Hello guys
>
> Maybe this question was already answered, but I cannot find it and
> want some more input on this topic.
>
> I have some
Yeah unfortunately the data on the endpoint could change at any point
in time and I need to make sure to have the latest one :/
That limits my options here. But I also have other sources that can
benefit from this caching :)
Thank you very much!
On Mon, Apr 15, 2024 at 9:37 AM XQ Hu wrote:
>
I am not sure you still need to do batching, since the Web API IO can
handle caching.
If you really need it, I think GroupIntoBatches is a good way to go.
On Mon, Apr 15, 2024 at 11:30 AM Ruben Vargas
wrote:
Is there a way to do batching in that transformation? I'm assuming not,
for now. Or maybe by using it in conjunction with GroupIntoBatches?
On Mon, Apr 15, 2024 at 9:29 AM Ruben Vargas wrote:
Interesting
I think the cache feature could be interesting for some use cases I have.
On Mon, Apr 15, 2024 at 9:18 AM XQ Hu wrote:
For the new web API IO, the page lists these features:
- developers provide minimal code that invokes Web API endpoint
- delegate to the transform to handle request retries and exponential
backoff
- optional caching of request and response associations
- optional metrics
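To make the retry, backoff and caching items concrete, here is a plain-Python sketch of that behaviour. It is not the Web API IO's own API; `call_with_retries` and all of its parameters are hypothetical, and the cache is an ordinary dict.

```python
import time
from typing import Callable, Dict


def call_with_retries(
    call: Callable[[str], str],
    request: str,
    cache: Dict[str, str],
    max_attempts: int = 4,
    base_delay_secs: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return the response for request, using the cache and retrying failures."""
    if request in cache:
        return cache[request]
    for attempt in range(max_attempts):
        try:
            response = call(request)
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff: the delay doubles after every failure.
            sleep(base_delay_secs * 2 ** attempt)
        else:
            cache[request] = response
            return response
    raise ValueError("max_attempts must be at least 1")
```

A cache like this only helps when a stale response is acceptable, which is why it is optional.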
On Mon,
That one looks interesting
What is not clear to me is what the advantages of using it are. Is it
only the error/retry handling? Is there anything in terms of performance?
My PCollection is unbounded, but I was thinking of sending my messages
in batches to the external API in order to gain some performance.
To enrich your data, have you checked
https://cloud.google.com/dataflow/docs/guides/enrichment?
This transform is built on top of
https://beam.apache.org/documentation/io/built-in/webapis/
On Fri, Apr 12, 2024 at 4:38 PM Ruben Vargas
wrote:
On Fri, Apr 12, 2024 at 2:17 PM Jaehyeon Kim wrote:
>
> Here is an example from a book that I'm reading now and it may be applicable.
>
> JAVA - (id.hashCode() & Integer.MAX_VALUE) % 100
> PYTHON - ord(id[0]) % 100
Maybe this is what I'm looking for. I'll give it a try. Thanks!
>
> On Sat, 13
Here is an example from a book that I'm reading now and it may be
applicable.
JAVA - (id.hashCode() & Integer.MAX_VALUE) % 100
PYTHON - ord(id[0]) % 100
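A small sketch comparing the two ideas in Python. The Python expression above looks only at the first character of the ID, whereas the Java expression hashes the whole string. Python's built-in hash() for strings is salted per process by default, so the sketch uses zlib.crc32 as a stable stand-in; that substitution and both function names are assumptions, not something from the book.

```python
import zlib


def first_char_bucket(message_id: str, num_buckets: int = 100) -> int:
    """The expression quoted above: bucket by the first character only."""
    return ord(message_id[0]) % num_buckets


def checksum_bucket(message_id: str, num_buckets: int = 100) -> int:
    """Bucket by a CRC-32 of the whole ID, closer in spirit to the Java line."""
    return zlib.crc32(message_id.encode("utf-8")) % num_buckets
```

With purely numeric IDs, `first_char_bucket` can produce at most ten distinct keys, because only the digits 0 to 9 can appear in the first position. `checksum_bucket` spreads the same IDs over many more buckets.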
On Sat, 13 Apr 2024 at 06:12, George Dekermenjian wrote:
How about just keeping track of a buffer, flushing it after 100
messages, and also flushing whatever is left in the buffer on finish_bundle?
On Fri, Apr 12, 2024 at 21.23 Ruben Vargas wrote:
Hello guys
Maybe this question was already answered, but I cannot find it and
want some more input on this topic.
I have some messages that don't have any particular key candidate,
except the ID, but I don't want to use it because the idea is to
group multiple IDs in the same batch.
This is