Hi Eila,

To my knowledge, BigQuerySink uses BigQuery's streaming insert
functionality. This means that once your data has been successfully written
to BigQuery, it will not show up in the table preview right away (as you
already know), but it should be immediately queryable. If you look at the
table details, you should see records in the streaming buffer.
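
On the format question in the quoted messages below: a minimal sketch, assuming each element is a list of values in the same order as colList and that the keys are meant to match the field names in schema_bq. It uses plain Python only, so it runs without Beam; build_row, the stand-in process function, and the sample column names and values are hypothetical, not taken from your code.

```python
# Plain-Python sketch (no Beam import) of the row format discussed in this thread.
# build_row, process, and the sample data are hypothetical illustrations.

def build_row(values, col_list):
    """Pair each column name with its value to form one table row as a dict."""
    return dict(zip(col_list, values))


def process(element, col_list):
    """Stand-in for a DoFn.process method: returns a list holding one row dict."""
    return [build_row(element, col_list)]


col_list = ["cell_line", "tissue"]  # hypothetical column names
element = ["A549", "lung"]          # one record, e.g. from values.tolist()

rows = process(element, col_list)
print(rows)

# Beam iterates over whatever process returns. Iterating a bare dict
# yields its keys rather than the row, hence the list wrapper.
print(list(build_row(element, col_list)))
```

This matches your option 2: the list wrapper means exactly one dict is emitted per input element, whereas a bare dict or a str would be iterated key by key or character by character.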

Kind Regards,

Damien

On Mon, 13 Aug 2018, 20:00 OrielResearch Eila Arich-Landkof, <
[email protected]> wrote:

> [the previous email was sent too early by mistake]
> update:
>
> I tried the following options:
>
> 1. Return a dict from the DoFn. An error was raised:
>     newRowDictlist = newRowDict  #[newRowDict]
>     return(newRowDictlist)
> and the following warning was shown:
>
> *Returning a dict from a ParDo or FlatMap is discouraged. Please use list("...*
>
>
> 2. Return a list with the dict in it:
>     newRowDictlist = [newRowDict]
>     return(newRowDictlist)
>
> No error was generated. I can see the table, but the data hasn't been
> populated yet. As far as I know, this is the normal BQ delay.
>
> Since I cannot see the full BQ result, could you please let me know if
> I am writing the data in the right format for BQ? (I had no issues writing
> it to other types of output.)
>
> Thanks for any help,
> Eila
>
> On Mon, Aug 13, 2018 at 1:55 PM, OrielResearch Eila Arich-Landkof <
> [email protected]> wrote:
>
>> update:
>>
>> I tried the following options:
>>
>> 1. Return a dict from the DoFn. An error was raised:
>>     newRowDictlist = newRowDict  #[newRowDict]
>>     return(newRowDictlist)
>>
>> 2. Return a list with the dict in it:
>>     newRowDictlist = [newRowDict]
>>     return(newRowDictlist)
>>
>>
>>
>> On Mon, Aug 13, 2018 at 12:51 PM, OrielResearch Eila Arich-Landkof <
>> [email protected]> wrote:
>>
>>> Hello,
>>>
>>> I am generating data to be written to a new BQ table with a specific
>>> schema. The data is generated in a DoFn.
>>>
>>> My question is: what is the recommended format for the data that I
>>> should return from the DoFn (getValuesStrFn below)? Is it a dictionary?
>>> A list? Something else?
>>> I tried a list and a str, and both raised an error.
>>>
>>>
>>> The pipeline is:
>>> p = beam.Pipeline(options=options)
>>> (p | 'Read From Data Frame' >> beam.Create(cellLinesTable.values.tolist())
>>>    | 'call Get Value Str' >> beam.ParDo(getValuesStrFn(colList))
>>>    | 'write to BQ' >> beam.io.Write(beam.io.BigQuerySink(
>>>          dataset='dataset_cell_lines', table='cell_lines_table',
>>>          schema=schema_bq)))
>>>
>>> Thanks,
>>> --
>>> Eila
>>> www.orielresearch.org
>>> https://www.meetup.com/Deep-Learning-In-Production/
>>>
>>>
>>>
>>
>>
>
>
