Hi Everyone,

I have raised a bug with GCP for this.
But am I the only one trying to write from Beam to Bigtable in Python?
Is that a warning sign that this combination is not mature?
Has anyone tried using the Java connector from Python?

Glad to hear about your experience and advice, and of course about any other
ideas for solving this "bug".
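
To make Brian's question (quoted below) concrete, here is a minimal sketch of
the behaviour he describes: a pipeline created in a "with" block runs when the
block exits, while one created outside it does nothing until run() is called.
FakePipeline and its add() method are hypothetical stand-ins written for this
illustration, not Beam's classes; they only mimic the "run on clean exit"
pattern and write nothing anywhere.

```python
class FakePipeline:
    """Hypothetical stand-in for a pipeline object (not Beam's Pipeline)."""

    def __init__(self):
        self.steps = []
        self.ran = False

    def add(self, step):
        # Record a step; nothing executes at this point.
        self.steps.append(step)
        return self

    def run(self):
        # Pretend to execute the recorded steps.
        self.ran = True
        return list(self.steps)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Run only when the block exits without an exception.
        if exc_type is None:
            self.run()
        return False  # never swallow exceptions


# Case 1: context manager, so run() is triggered on exit.
with FakePipeline() as p:
    p.add("create").add("write")
ran_in_context = p.ran

# Case 2: no context manager, so nothing runs until run() is called.
q = FakePipeline()
q.add("create").add("write")
ran_without_call = q.ran
q.run()
ran_after_call = q.ran
```

In my case the second form applies, with an explicit run() at the end.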

Thanks!

On Wed, Oct 13, 2021 at 6:14 PM Pierre Oberholzer <[email protected]>
wrote:

> Hi Brian,
>
> Yes, I do call run() at the end, and I see the Dataflow job complete in the
> console (link <https://console.cloud.google.com/dataflow/jobs>). Thanks
> for asking ;)
> Is there maybe a commit() missing, as referred to here
> <https://googleapis.dev/python/bigtable/latest/row.html#google.cloud.bigtable.row.DirectRow>,
> and if so, where should it go in the pipeline?
>
> On Wed, Oct 13, 2021 at 6:08 PM Brian Hulette <[email protected]>
> wrote:
>
>> Hey Pierre,
>> Sorry for the silly question but I have to ask - are you actually running
>> the pipeline? In your initial snippet you created the pipeline in a context
>> (with beam.Pipeline() as p:), which will run the pipeline when you exit.
>> But your latest snippet doesn't show the context, or a call to p.run(). Are
>> they missing, or just not shown?
>>
>> Otherwise I don't see anything obviously wrong with your code. You might
>> try contacting GCP support, since you're working with two GCP products.
>>
>> Brian
>>
>> On Tue, Oct 12, 2021 at 10:22 PM Pierre Oberholzer <
>> [email protected]> wrote:
>>
>>> Dear Community,
>>>
>>> Glad to get your support here!
>>> Issue: the Bigtable table stays empty when using the Python/Beam connector.
>>>
>>> Thanks!
>>>
>>> On Sun, Oct 10, 2021 at 2:34 PM Pierre Oberholzer <
>>> [email protected]> wrote:
>>>
>>>> Thanks Israel, this helped. No more errors, but the table remains
>>>> empty with this code
>>>> <https://stackoverflow.com/questions/63035772/streaming-pipeline-in-dataflow-to-bigtable-python>.
>>>>
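>>>> *Code*
>>>>
>>>> import datetime
>>>>
>>>> import apache_beam as beam
>>>> from apache_beam.io.gcp.bigtableio import WriteToBigTable
>>>> from google.cloud.bigtable import row
>>>>
>>>>
>>>> class CreateRowFn(beam.DoFn):
>>>>     """Turns a row key into a DirectRow holding a single cell."""
>>>>
>>>>     def process(self, key):
>>>>         direct_row = row.DirectRow(row_key=key)
>>>>         direct_row.set_cell(
>>>>             "stats_summary",  # column family
>>>>             b"os_build",      # column qualifier
>>>>             b"android",       # cell value
>>>>             datetime.datetime.now())  # cell timestamp
>>>>         return [direct_row]
>>>>
>>>>
>>>> # p is the beam.Pipeline and pipeline_options carries the custom
>>>> # bigtable_* options; both are defined outside this snippet.
>>>> _ = (p
>>>>      | beam.Create(["phone#4c410523#20190501",
>>>>                     "phone#4c410523#20190502"])
>>>>      | beam.ParDo(CreateRowFn())
>>>>      | WriteToBigTable(
>>>>          project_id=pipeline_options.bigtable_project,
>>>>          instance_id=pipeline_options.bigtable_instance,
>>>>          table_id=pipeline_options.bigtable_table))
>>>>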
>>>> *Issue*
>>>>
>>>> Empty table
>>>> (checked with happybase and check = [(key,row) for key, row in
>>>> table.scan()])
>>>>
>>>> Thanks !
>>>>
>>>> On Sat, Oct 9, 2021 at 9:37 PM Israel Herraiz <[email protected]> wrote:
>>>>
>>>>> You have to write DirectRows to Bigtable, not strings. For more info,
>>>>> please see
>>>>> https://googleapis.dev/python/bigtable/latest/row.html#google.cloud.bigtable.row.DirectRow
>>>>>
>>>>
>>>>
>>>> --
>>>> Pierre
>>>>
>>>
>>>
>>> --
>>> Pierre
>>>
>>
>
> --
> Pierre
>


-- 
Pierre
