Hi Jan,

Thanks for the fast reply! I came across an example that I wanted to
recreate in Beam, and I'm sharing the link below. Generally speaking, nodes
keep their favourite words and accept only jobs that involve those
favourites. This is a simple example but could be beneficial in processing
large pieces of data (for example, software repositories), where nodes
could work on the repositories they already processed (and have some files
already downloaded) and avoid downloading unnecessary repository contents
if another node already has them. This could be enabled by allowing nodes
to check their internal state and decide if they want to accept/reject a
certain repository as a job. I know that the "more complicated" example
might be a far fetch, but I wanted to give you more context on what I'd
want to know about Beam.

Thanks for all the insights!

Best,
Ana

[1]
https://github.com/crossflowlabs/crossflow/tree/master/org.crossflow.tests/src/org/crossflow/tests/opinionated


On Tue, 7 Sept 2021 at 13:57, Jan Lukavský <[email protected]> wrote:

> Hi Ana,
>
> in general, worker nodes do not share any state, and cannot themselves
> decide which work to accept and which to reject. How the work is
> distributed to downstream processing is defined by a runner, not the Beam
> model. On the other hand, what you ask for might be possibly accomplished
> using a grouping operation - either a GroupByKey or a stateful DoFn might
> help you with that. Can you further describe your intent?
>
> Best,
>
>  Jan
> On 9/7/21 12:32 PM, Ana Markovic wrote:
>
> To whom this may concern,
>
> I've been looking into polyglot data processing frameworks recently, and I
> read Beam's documentation as well as developed a few examples to get some
> hands-on experience. I've been wondering, and I haven't found this in the
> documentation, is there a way to set up worker nodes so they are
> "opinionated" or "smart" in a sense that they can decide for themselves
> which jobs they will perform? For example, in a word count example, an
> opinionated worker node could only decide to monitor occurrences of a
> specific word if it's among the node's favourite words.
>
> I hope I explained it well, but please let me know if more details are
> needed to answer this question.
>
> Thankful in advance,
> Ana
>
> --
Best,
Ana

Reply via email to