Yeah, unfortunately the data on the endpoint could change at any point in time, and I need to make sure I have the latest version :/
That limits my options here. But I also have other sources that can benefit
from this caching :) Thank you very much!

On Mon, Apr 15, 2024 at 9:37 AM XQ Hu <[email protected]> wrote:
>
> I am not sure you still need to do batching, since the Web API can handle caching.
>
> If you really need it, I think GroupIntoBatches is a good way to go.
>
> On Mon, Apr 15, 2024 at 11:30 AM Ruben Vargas <[email protected]> wrote:
>>
>> Is there a way to do batching in that transformation? I'm assuming for
>> now no, or maybe by using it in conjunction with GroupIntoBatches.
>>
>> On Mon, Apr 15, 2024 at 9:29 AM Ruben Vargas <[email protected]> wrote:
>> >
>> > Interesting.
>> >
>> > I think the cache feature could be interesting for some use cases I have.
>> >
>> > On Mon, Apr 15, 2024 at 9:18 AM XQ Hu <[email protected]> wrote:
>> > >
>> > > For the new Web API IO, the page lists these features:
>> > >
>> > > developers provide minimal code that invokes the Web API endpoint
>> > > delegate to the transform to handle request retries and exponential backoff
>> > > optional caching of request and response associations
>> > > optional metrics
>> > >
>> > > On Mon, Apr 15, 2024 at 10:38 AM Ruben Vargas <[email protected]> wrote:
>> > >>
>> > >> That one looks interesting.
>> > >>
>> > >> What is not clear to me is: what are the advantages of using it? Is
>> > >> it only the error/retry handling? Anything in terms of performance?
>> > >>
>> > >> My PCollection is unbounded, but I was thinking of sending my messages
>> > >> in batches to the external API in order to gain some performance
>> > >> (I don't expect to send 1 HTTP request per message).
>> > >>
>> > >> Thank you very much for all your responses!
>> > >>
>> > >> On Sun, Apr 14, 2024 at 8:28 AM XQ Hu via user <[email protected]> wrote:
>> > >> >
>> > >> > To enrich your data, have you checked
>> > >> > https://cloud.google.com/dataflow/docs/guides/enrichment?
>> > >> >
>> > >> > This transform is built on top of
>> > >> > https://beam.apache.org/documentation/io/built-in/webapis/
>> > >> >
>> > >> > On Fri, Apr 12, 2024 at 4:38 PM Ruben Vargas <[email protected]> wrote:
>> > >> >>
>> > >> >> On Fri, Apr 12, 2024 at 2:17 PM Jaehyeon Kim <[email protected]> wrote:
>> > >> >> >
>> > >> >> > Here is an example from a book that I'm reading now, and it may be
>> > >> >> > applicable:
>> > >> >> >
>> > >> >> > JAVA - (id.hashCode() & Integer.MAX_VALUE) % 100
>> > >> >> > PYTHON - ord(id[0]) % 100
>> > >> >>
>> > >> >> Maybe this is what I'm looking for. I'll give it a try. Thanks!
>> > >> >>
>> > >> >> > On Sat, 13 Apr 2024 at 06:12, George Dekermenjian <[email protected]> wrote:
>> > >> >> >>
>> > >> >> >> How about just keeping track of a buffer and flushing the buffer
>> > >> >> >> after 100 messages, and flushing any remaining buffer in
>> > >> >> >> finish_bundle as well?
>> > >> >>
>> > >> >> If this is in memory, it could lead to potential loss of data. That
>> > >> >> is why state is used, or at least that is my understanding. But maybe
>> > >> >> there is a way to do this with state?
>> > >> >>
>> > >> >> >> On Fri, Apr 12, 2024 at 21:23 Ruben Vargas <[email protected]> wrote:
>> > >> >> >>>
>> > >> >> >>> Hello guys,
>> > >> >> >>>
>> > >> >> >>> Maybe this question was already answered, but I cannot find it and
>> > >> >> >>> want some more input on this topic.
>> > >> >> >>>
>> > >> >> >>> I have some messages that don't have any particular key candidate
>> > >> >> >>> except the ID, but I don't want to use it because the idea is to
>> > >> >> >>> group multiple IDs in the same batch.
>> > >> >> >>>
>> > >> >> >>> This is my use case:
>> > >> >> >>>
>> > >> >> >>> I have an endpoint where I'm going to send the message ID; this
>> > >> >> >>> endpoint is going to return certain information which I will use
>> > >> >> >>> to enrich my message. To avoid calling the endpoint once per
>> > >> >> >>> message, I want to batch the messages in groups of 100 and send
>> > >> >> >>> the 100 IDs in one request (the endpoint supports it). I was
>> > >> >> >>> thinking of using GroupIntoBatches.
>> > >> >> >>>
>> > >> >> >>> - If I choose the ID as the key, my understanding is that it
>> > >> >> >>> won't work the way I want (because it will form batches of the
>> > >> >> >>> same ID).
>> > >> >> >>> - Using a constant will be a problem for parallelism, is that
>> > >> >> >>> correct?
>> > >> >> >>>
>> > >> >> >>> Then my question is: what should I use as a key? Maybe something
>> > >> >> >>> based on the timestamp, so I can have groups of messages that
>> > >> >> >>> arrive in a certain second?
>> > >> >> >>>
>> > >> >> >>> Any suggestions would be appreciated.
>> > >> >> >>>
>> > >> >> >>> Thanks.
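The key-derivation one-liners quoted in the thread (`(id.hashCode() & Integer.MAX_VALUE) % 100` in Java, `ord(id[0]) % 100` in Python) can be turned into a small runnable sketch. This is an illustration, not code from the thread: `shard_key` and `num_shards` are made-up names, and a stable digest is used instead of Python's built-in `hash()` (which is salted per process for strings) so the same ID lands on the same shard across workers and runs. Note also that `ord(id[0]) % 100` only inspects the first character, so IDs sharing a common prefix would collapse onto a handful of keys.

```python
import hashlib


def shard_key(message_id: str, num_shards: int = 100) -> int:
    """Map an arbitrary message ID onto one of num_shards keys.

    Many distinct IDs share each key, so a downstream grouping
    transform can form batches of mixed IDs while still keeping
    up to num_shards keys available for parallelism.
    """
    # A stable digest (unlike the per-process-salted built-in hash())
    # gives a deterministic shard assignment for a given ID.
    digest = hashlib.sha1(message_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards
```

The resulting integer would then be used as the grouping key, e.g. mapping each message to a `(shard_key(id), message)` pair before batching.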
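The batching described in the question maps onto Beam's `GroupIntoBatches(100)` applied after assigning each element a key from a small bounded range; on an unbounded PCollection the Python transform also accepts a buffering limit (`max_buffering_duration_secs`) so partial batches are eventually emitted rather than held forever. The per-key behaviour the asker wants can be illustrated with a plain-Python sketch (`batch_ids` is a hypothetical stdlib-only helper, not Beam API):

```python
import itertools
from typing import Iterable, Iterator, List


def batch_ids(ids: Iterable[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield lists of at most batch_size IDs, analogous to what
    GroupIntoBatches emits per key: full batches as they fill up,
    plus one final partial batch for whatever is left over."""
    it = iter(ids)
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield batch
```

Each yielded list would correspond to one HTTP request carrying up to 100 IDs to the enrichment endpoint.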
