Hi Matt,
is this computation running as part of a larger pipeline that does some
parallel processing? Otherwise, it would be odd for it to need to run on Beam.
Nonetheless, you can certainly do this with a pipeline that has a single
element. Here's what that looks like in Python:

p | beam.Create(['gs://myfile.json']) | beam.ParDo(LoadEachFile()) |
WriteToMyDatabase()
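To make the idea concrete, here is a plain-Python sketch of what a
LoadEachFile transform would do for each element: open one JSON file and
emit its records one by one. In Beam you'd put this logic inside a DoFn's
process() method; the names here (load_each_file, and the temp file standing
in for 'gs://myfile.json') are placeholders for illustration, not Beam API:

```python
import json
import tempfile

def load_each_file(path):
    """Read one JSON file and yield its records one by one.

    In a real pipeline this body would live inside DoFn.process(),
    so each emitted record becomes an element of the output PCollection.
    """
    with open(path) as f:
        for record in json.load(f):
            yield record

# Demo: a local temp file stands in for the single 'gs://myfile.json' element.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump([{"id": 1}, {"id": 2}], f)
    tmp_path = f.name

records = list(load_each_file(tmp_path))
print(records)  # [{'id': 1}, {'id': 2}]
```

Because the pipeline starts from a single element, all of the reading happens
in one call, i.e. single-threaded.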

If, on the other hand, you may have a PCollection with multiple elements
(e.g. filenames), and you want to process them one-by-one, you can group
them all on a single key, like so:

my_filename_pcoll | GroupIntoSingleThread() | beam.ParDo(LoadEachFile()) |
WriteToMyDatabase()

where the GroupIntoSingleThread transform looks basically like this:

input | beam.Map(lambda x: ('singlekey', x)) | beam.GroupByKey() |
beam.FlatMap(lambda x: x[1])

In this example, we are adding a single key to all elements, grouping them
all together, and then throwing away the key, to get each of the elements
one-by-one in a single thread. You can do something similar using side
inputs (with AsIter(my_filename_pcoll)).
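The effect of that key/group/ungroup trick can be simulated in plain Python
(no Beam needed). This is just a sketch of the semantics; the function name
group_into_single_thread is made up for illustration:

```python
from collections import defaultdict

def group_into_single_thread(elements):
    # beam.Map(lambda x: ('singlekey', x)): attach the same key to everything.
    keyed = [("singlekey", x) for x in elements]

    # beam.GroupByKey(): collect all values behind each key. Since there is
    # only one key, everything lands in a single group, which a runner
    # processes as one bundle (i.e. one thread).
    grouped = defaultdict(list)
    for key, value in keyed:
        grouped[key].append(value)

    # beam.FlatMap(lambda x: x[1]): throw away the key and emit the
    # elements one by one.
    for _, values in grouped.items():
        for value in values:
            yield value

filenames = ["a.json", "b.json", "c.json"]
print(list(group_into_single_thread(filenames)))  # ['a.json', 'b.json', 'c.json']
```

The trade-off is that grouping onto a single key forces all elements through
one worker, which is exactly what you want here, but it removes any
parallelism for that stage.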

Does that help? Or perhaps you could clarify a bit more about your use case.
Best
-P.

On Mon, Jan 7, 2019 at 1:33 PM Matt Casters <[email protected]> wrote:

> Hi Beam!
>
> There's a bunch of stuff that I would like to support and it's probably
> something silly but I couldn't find it immediately ... or I'm completely
> dim and making too much of certain things.
>
> The thing is, sometimes you just want to do a single-threaded operation.
> For example, we sometimes need to read generic JSON or XML documents or
> perform single threaded bulk loads into certain databases.
> There's also simple relational database data you might want to side-load
> or data from some web service somewhere.
>
> So, how can I instruct Beam not to fire up a bunch of readers or writers?
> What is a good alternative for ParDo?
>
> Thanks in advance for any suggestions!
>
> Matt
> ---
> Matt Casters <[email protected]>
> Senior Solution Architect, Kettle Project Founder
>
>
>