The BQ sink (batch) writes data to GCS and then loads it into BQ using load
jobs.

Regarding writing to the sink, you basically have to produce a PCollection
of dictionaries, where each dictionary maps to a row in the BQ table.

From a ParDo or a FlatMap you have to return an iterable of records, so you
either return a list of records or use yield, where each record is a
dictionary.
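
For example, a minimal sketch of such a DoFn (the class name and field
names here are illustrative, not from your pipeline):

    import apache_beam as beam

    class MakeRowFn(beam.DoFn):
        def process(self, element):
            # Yield one dict per output row; the dict keys must match
            # the column names in the table schema.
            yield {'name': element[0], 'count': int(element[1])}

Equivalently, a FlatMap can return a list of such dicts:

    ... | beam.FlatMap(lambda e: [{'name': e[0], 'count': int(e[1])}])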

See the following example, which reads from and writes to BigQuery, for
more clarity on the syntax:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes.py

Thanks,
Cham


On Mon, Aug 13, 2018 at 8:51 PM Damien Hawes <[email protected]> wrote:

> Hi Eila,
>
> To my knowledge the BigQuerySink makes use of BigQuery's streaming insert
> functionality. This means that if your data is successfully written to
> BigQuery it will not be immediately previewable (as you already know), but
> it should be immediately queryable. If you look at the table details, you
> should see records in the streaming buffer.
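>
> As a quick check (assuming you have the bq command-line tool installed),
> the table metadata shows a streamingBuffer section while rows are still
> in the buffer:
>
>     bq show --format=prettyjson dataset_cell_lines.cell_lines_table
>
> and a regular query should already return the rows:
>
>     bq query --use_legacy_sql=false \
>         'SELECT COUNT(*) FROM dataset_cell_lines.cell_lines_table'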
>
> Kind Regards,
>
> Damien
>
> On Mon, 13 Aug 2018, 20:00 OrielResearch Eila Arich-Landkof, <
> [email protected]> wrote:
>
>> [the previous email was sent too early by mistake]
>> update:
>>
>> I tried the following options:
>>
>> 1. return a dict from the DoFn; an error was fired, with the following
>> warning:
>>
>>     newRowDictlist = newRowDict  # [newRowDict]
>>     return newRowDictlist
>>
>> Returning a dict from a ParDo or FlatMap is discouraged. Please use
>> list("...
>>
>>
>> 2. return a list with the dict in it:
>>
>>     newRowDictlist = [newRowDict]
>>     return newRowDictlist
>>
>> No error was generated. I see the table, but the data hasn't been
>> populated yet (normal BQ delay, as far as I know).
>>
>> Since I cannot see the full BQ result, could you please let me know
>> whether I am writing the data to BQ in the right format? (I had no
>> issues writing it to other types of outputs.)
>>
>> Thanks for any help,
>> Eila
>>
>> On Mon, Aug 13, 2018 at 1:55 PM, OrielResearch Eila Arich-Landkof <
>> [email protected]> wrote:
>>
>>> update:
>>>
>>> I tried the following options:
>>>
>>> 1. return a dict from the DoFn; an error was fired:
>>>     newRowDictlist = newRowDict  # [newRowDict]
>>>     return newRowDictlist
>>>
>>> 2. return a list with the dict in it:
>>>     newRowDictlist = [newRowDict]
>>>     return newRowDictlist
>>>
>>>
>>>
>>> On Mon, Aug 13, 2018 at 12:51 PM, OrielResearch Eila Arich-Landkof <
>>> [email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am generating data to be written to a new BQ table with a specific
>>>> schema. The data is generated in a DoFn function.
>>>>
>>>> My question is: what is the recommended format for the data that I
>>>> should return from the DoFn (getValuesStrFn below)? Is it a
>>>> dictionary? A list? Something else? I tried a list and a str, and it
>>>> fired an error.
>>>>
>>>>
>>>> The pipeline is:
>>>>
>>>>     p = beam.Pipeline(options=options)
>>>>     (p
>>>>      | 'Read From Data Frame' >> beam.Create(cellLinesTable.values.tolist())
>>>>      | 'call Get Value Str' >> beam.ParDo(getValuesStrFn(colList))
>>>>      | 'write to BQ' >> beam.io.Write(beam.io.BigQuerySink(
>>>>            dataset='dataset_cell_lines',
>>>>            table='cell_lines_table',
>>>>            schema=schema_bq)))
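>>>>
>>>> (Here schema_bq would typically be either a TableSchema object or a
>>>> comma-separated string of name:TYPE pairs whose names match the dict
>>>> keys, e.g. something like 'cell_line:STRING,value:FLOAT'.)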
>>>> Thanks,
>>>> --
>>>> Eila
>>>> www.orielresearch.org
>>>> https://www.meetup.com/Deep-Learning-In-Production/
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Eila
>>> www.orielresearch.org
>>> https://www.meetup.com/Deep-Learning-In-Production/
>>>
>>>
>>>
>>
>>
>> --
>> Eila
>> www.orielresearch.org
>> https://www.meetup.com/Deep-Learning-In-Production/
>>
>>
>>
