The BQ sink (batch) writes data to GCS and loads it into BQ using load jobs. As for writing to the sink, you basically have to produce a PCollection of dictionaries, where each dictionary maps to one row of the BQ table.
From a ParDo or a FlatMap you have to return an iterable of records, so you either return a list of records or use yield, where each record is a dictionary. See the following example, which reads from and writes to BigQuery, for more clarity on the syntax:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes.py

Thanks,
Cham

On Mon, Aug 13, 2018 at 8:51 PM Damien Hawes <[email protected]> wrote:

> Hi Eila,
>
> To my knowledge the BigQuerySink makes use of BigQuery's streaming insert
> functionality. This means that if your data is successfully written to
> BigQuery it will not be immediately previewable (as you already know), but
> it should be immediately queryable. If you look at the table details, you
> should see records in the streaming buffer.
>
> Kind Regards,
>
> Damien
>
> On Mon, 13 Aug 2018, 20:00 OrielResearch Eila Arich-Landkof
> <[email protected]> wrote:
>
>> [the previous email was sent too early by mistake]
>>
>> Update: I tried the following options:
>>
>> 1. Return a dict from the DoFn; an error was fired:
>>
>>     newRowDictlist = newRowDict  # [newRowDict]
>>     return newRowDictlist
>>
>> with the following warning:
>>
>> *Returning a dict from a ParDo or FlatMap is discouraged. Please use
>> list(...*
>>
>> 2. Return a list with the dict in it:
>>
>>     newRowDictlist = [newRowDict]
>>     return newRowDictlist
>>
>> No error was generated. I see the table, but the data hasn't been
>> populated yet -- the normal BQ delay, as far as I know.
>>
>> Since I cannot see the full BQ result, could you please let me know
>> whether I am writing the data in the right format for BQ? (I had no
>> issues writing it to other types of outputs.)
>>
>> Thanks for any help,
>> Eila
>>
>> On Mon, Aug 13, 2018 at 1:55 PM, OrielResearch Eila Arich-Landkof
>> <[email protected]> wrote:
>>
>>> Update: I tried the following options:
>>>
>>> 1. Return a dict from the DoFn; an error was fired:
>>>
>>>     newRowDictlist = newRowDict  # [newRowDict]
>>>     return newRowDictlist
>>>
>>> 2. Return a list with the dict in it:
>>>
>>>     newRowDictlist = [newRowDict]
>>>     return newRowDictlist
>>>
>>> On Mon, Aug 13, 2018 at 12:51 PM, OrielResearch Eila Arich-Landkof
>>> <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am generating data to be written to a new BQ table with a specific
>>>> schema. The data is generated in a DoFn.
>>>>
>>>> My question is: what is the recommended format of the data that I
>>>> should return from the DoFn (getValuesStrFn below)? Is it a
>>>> dictionary? A list? Something else? I tried list and str and it
>>>> fired an error.
>>>>
>>>> The pipeline is:
>>>>
>>>>     p = beam.Pipeline(options=options)
>>>>     (p | 'Read From Data Frame' >> beam.Create(cellLinesTable.values.tolist())
>>>>        | 'call Get Value Str' >> beam.ParDo(getValuesStrFn(colList))
>>>>        | 'write to BQ' >> beam.io.Write(beam.io.BigQuerySink(
>>>>              dataset='dataset_cell_lines',
>>>>              table='cell_lines_table',
>>>>              schema=schema_bq)))
>>>>
>>>> Thanks,
>>>> --
>>>> Eila
>>>> www.orielresearch.org
>>>> https://www.meetup.com/Deep-Learning-In-Production/
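[Editor's sketch] The yield-a-dict pattern Cham describes can be shown in a minimal, self-contained form. This is not Eila's actual getValuesStrFn: the class below only mirrors the shape of a beam.DoFn subclass (in a real pipeline it would inherit from beam.DoFn and be wrapped in beam.ParDo), and the column names "name" and "count" are hypothetical placeholders, not taken from the thread.

```python
# Sketch of the pattern from the thread: process() yields one dictionary
# per output row, with keys matching the BigQuery schema's field names.
# Beam iterates over the generator that process() returns, so each
# yielded dict becomes one row handed to the BigQuery sink.

class GetValuesStrDoFn:  # real pipeline: class GetValuesStrFn(beam.DoFn)
    def __init__(self, columns):
        # columns plays the role of colList in the thread's pipeline.
        self.columns = columns

    def process(self, element):
        # element is one input record (e.g. a list of cell values);
        # zip it with the column names to build a schema-shaped dict.
        yield dict(zip(self.columns, element))


# Outside Beam, we can call process() directly to see what it emits.
fn = GetValuesStrDoFn(["name", "count"])
rows = list(fn.process(["tornado", 3]))
# rows is [{'name': 'tornado', 'count': 3}] -- one dict per BQ row.
```

Returning `[newRowDict]` from process(), as in option 2 of the thread, is equivalent: Beam only requires an iterable of dicts, and yield is simply the idiomatic way to produce one.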
