Hello, thanks for the response.

Not quite. The PCollections hold Python dicts, so they look like this:

Suppose the final BQ table must have the columns a, b, c, d and e.

PCollection1 {'a': 1, 'b': 2, 'c': 3}
             {'a': 1, 'b': nan, 'c': 3}

PCollection2 {'b': 3, 'd': 10, 'e': nan, 'c': 6}
             {'b': nan, 'd': 10, 'e': 4, 'c': 6}

This happens because I'm applying several ParDo transforms in parallel to a
PCol generated by BigQuerySource, and each of them creates these keys based
on data from a table. So each ParDo creates a different number of keys (the
future columns of the BQ table), and some of them may create the same keys,
as shown in the example above. The question is: how can I create a PCol
derived from these that can be written to BQ? Something like:

PCollection_final {'a': 1, 'b': 2, 'c': 3, 'd': nan, 'e': nan}
                  {'a': 1, 'b': nan, 'c': 3, 'd': nan, 'e': nan}
                  {'b': 3, 'd': 10, 'e': nan, 'c': 6, 'a': nan}
                  {'b': nan, 'd': 10, 'e': 4, 'c': 6, 'a': nan}
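
Assuming the target columns are exactly a, b, c, d and e, the expected result can be sketched in plain Python, with lists standing in for PCollections. The names `TABLE_COLUMNS` and `pad_row` are hypothetical; the idea is to concatenate the inputs (the role Flatten plays) and then fill every missing column in a single place.

```python
# Hypothetical: the column list of the destination BigQuery table.
TABLE_COLUMNS = ["a", "b", "c", "d", "e"]


def pad_row(row, columns=TABLE_COLUMNS, fill_value=float("nan")):
    """Return a new dict with every table column; missing ones get fill_value.

    Values already present in the row (including existing NaNs) are kept.
    """
    return {col: row.get(col, fill_value) for col in columns}


nan = float("nan")

# The two example inputs, with lists standing in for PCollections.
pcollection1 = [{"a": 1, "b": 2, "c": 3}, {"a": 1, "b": nan, "c": 3}]
pcollection2 = [
    {"b": 3, "d": 10, "e": nan, "c": 6},
    {"b": nan, "d": 10, "e": 4, "c": 6},
]

# List concatenation plays the role of Flatten; padding happens afterwards.
pcollection_final = [pad_row(row) for row in pcollection1 + pcollection2]
```

In a pipeline, the same function could be applied once after merging, e.g. `(pc1, pc2) | beam.Flatten() | beam.Map(pad_row)`, so none of the upstream ParDos has to emit keys it has no rules for. For nullable BigQuery columns, `beam.Map(pad_row, fill_value=None)` is probably the safer choice, since None is typically written as NULL, whereas NaN may not be accepted in every write path.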

Is it possible to do this without explicitly creating empty keys in the
transforms that have no rules for creating them, for example by assigning
NaN to columns 'd' and 'e' in PCollection1?
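
If the full column list is not known when the pipeline is built, one option (an assumption, not something established in this thread) is to derive it from the data: take the union of all keys seen across the merged rows, then pad each row to that union. The helper names `union_of_keys` and `normalize` below are hypothetical, and the sketch is plain Python. In Beam, the union would have to be computed in a separate step, for example with `beam.CombineGlobally`, and passed to the padding `beam.Map` as a side input via `beam.pvalue.AsSingleton`, because a single element cannot see the others. The BigQuery write would still need a schema covering those columns.

```python
def union_of_keys(rows):
    """Collect every key seen in any row, sorted for a stable column order."""
    columns = set()
    for row in rows:
        columns.update(row)  # iterating a dict yields its keys
    return sorted(columns)


def normalize(rows, fill_value=None):
    """Pad every row to the union of keys found across all rows."""
    columns = union_of_keys(rows)
    return [{col: row.get(col, fill_value) for col in columns} for row in rows]


# Rows as they might look after flattening two differently-shaped inputs.
merged = [{"a": 1, "b": 2, "c": 3}, {"b": 3, "d": 10, "c": 6}]
result = normalize(merged)
```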

On Tue, Feb 11, 2020 at 4:23 PM Heejong Lee <[email protected]>
wrote:

> What do you mean by "PCollection of dicts, each having different key
> values"? What's the type of the PCollections? I assume that you want to
> merge two PCollections of KV such as
> PCollection[("a", 1), ("b", 2), ("c", 3)] + PCollection[("a", 4), ("d",
> 5), ("e", 6)]. Is that correct?
>
> On Tue, Feb 11, 2020 at 9:19 AM Douglas Martins <
> [email protected]> wrote:
>
>> Hi,
>>
>> I am developing a pipeline that reads from and writes to BigQuery. At a
>> certain point, I have two or more PCollections of dicts, each having
>> different key values. How can I create a single PCollection from those,
>> that can be written to a BigQuery table? The Flatten transform doesn't work
>> because each element of the PCol ends up having different keys. Thanks!
>>
>
