Hi, I found the issue and can now write from Beam/Python to BigTable. You just need to create the column family FIRST, before writing (here with cbt):
`cbt -instance test-instance createfamily test-table cf1`

What is confusing is that no error is thrown when the column family does not exist. There seems to be a similar issue with cbt [1].
It would be great to correct this. Let me know if I should raise another bug.

Thanks!

[1] https://issuetracker.google.com/issues/186053077

Pierre

On Wed, Oct 20, 2021 at 04:18, Pierre Oberholzer <[email protected]> wrote:

> Hi Bryan,
>
> Thanks again for your reply last week.
> I’ve raised a ticket here:
>
> https://issuetracker.google.com/issues/202977204
>
> Is that what you mean by GCP support ?
> Any idea on how reactive it is ?
> Any other alternative to use meanwhile (Java I/O in Python)?
>
> Thanks for your support !
>
> Best regards, Pierre
>
> ---------- Forwarded message ---------
> From: Pierre Oberholzer <[email protected]>
> Date: Sat, Oct 16, 2021 at 08:47
> Subject: Re: Beam/Python to BigTable
> To: <[email protected]>, <[email protected]>
>
>
> Hi Everyone,
>
> I have raised a bug on GCP for this.
> But..am I the only one trying to write from Beam to BigTable in Python ?
> Is that a warning sign showing that this combo is not mature ?
> Is there any attempt using the Java connector in Python ?
>
> Glad to hear about your experience and advice - and of course about other
> ideas to solve this "bug".
>
> Thanks !
>
> On Wed, Oct 13, 2021 at 18:14, Pierre Oberholzer <
> [email protected]> wrote:
>
>> Hi Brian,
>>
>> Yes I do execute a run() at the end, and I see the Dataflow completing on
>> the GUI (link <https://console.cloud.google.com/dataflow/jobs>). Thanks
>> for asking ;)
>> Is there maybe a commit () missing as referred to here
>> <https://googleapis.dev/python/bigtable/latest/row.html#google.cloud.bigtable.row.DirectRow>,
>> and if yes, where to put it in the pipeline ?
>>
>> On Wed, Oct 13, 2021 at 18:08, Brian Hulette <[email protected]>
>> wrote:
>>
>>> Hey Pierre,
>>> Sorry for the silly question but I have to ask - are you actually
>>> running the pipeline? In your initial snippet you created the pipeline in a
>>> context (with beam.Pipeline() as p:), which will run the pipeline when you
>>> exit. But your latest snippet doesn't show the context, or a call to
>>> p.run(). Are they missing, or just not shown?
>>>
>>> Otherwise I don't see anything obviously wrong with your code. You might
>>> try contacting GCP support, since you're working with two GCP products.
>>>
>>> Brian
>>>
>>> On Tue, Oct 12, 2021 at 10:22 PM Pierre Oberholzer <
>>> [email protected]> wrote:
>>>
>>>> Dear Community,
>>>>
>>>> Glad to get your support here !
>>>> Issue: empty BigTable when using the Python/Beam connector.
>>>>
>>>> Thanks !
>>>>
>>>> On Sun, Oct 10, 2021 at 14:34, Pierre Oberholzer <
>>>> [email protected]> wrote:
>>>>
>>>>> Thanks Israel, this helped. No error anymore, but the table remains
>>>>> empty with this code
>>>>> <https://stackoverflow.com/questions/63035772/streaming-pipeline-in-dataflow-to-bigtable-python>
>>>>> .
>>>>>
>>>>> *Code*
>>>>>
>>>>> class CreateRowFn(beam.DoFn):
>>>>>
>>>>>     def process(self, key):
>>>>>         direct_row = row.DirectRow(row_key=key)
>>>>>         direct_row.set_cell(
>>>>>             "stats_summary",
>>>>>             b"os_build",
>>>>>             b"android",
>>>>>             datetime.datetime.now())
>>>>>         return [direct_row]
>>>>>
>>>>> _ = (p
>>>>>      | beam.Create(["phone#4c410523#20190501", "phone#4c410523#20190502"])
>>>>>      | beam.ParDo(CreateRowFn())
>>>>>      | WriteToBigTable(project_id=pipeline_options.bigtable_project,
>>>>>                        instance_id=pipeline_options.bigtable_instance,
>>>>>                        table_id=pipeline_options.bigtable_table))
>>>>>
>>>>> *Issue*
>>>>>
>>>>> Empty table
>>>>> (checked with happybase and check = [(key,row) for key, row in
>>>>> table.scan()])
>>>>>
>>>>> Thanks !
>>>>>
>>>>> On Sat, Oct 9, 2021 at 21:37, Israel Herraiz <[email protected]> wrote:
>>>>>
>>>>>> You have to write DirectRows to Bigtable, not strings. For more info,
>>>>>> please see
>>>>>> https://googleapis.dev/python/bigtable/latest/row.html#google.cloud.bigtable.row.DirectRow
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Pierre
>>>>>
>>>>
>>>>
>>>> --
>>>> Pierre
>>>>
>>>
>>
>> --
>> Pierre
>>
>
>
> --
> Pierre
> --
> Pierre
>
--
Pierre
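
P.S. For anyone who would rather do the column family setup from Python than with cbt, here is a minimal sketch of how it might look with the google-cloud-bigtable admin client. The helper names `ensure_column_family` and `open_admin_table` and the project ID "my-project" are hypothetical; the instance, table and family names are the ones from the cbt command above. I have not run this against a live instance, so treat it as a starting point rather than a verified recipe.

```python
def ensure_column_family(table, family_id):
    """Create `family_id` on `table` if it is missing.

    `table` is assumed to behave like google.cloud.bigtable.table.Table
    obtained through an admin client. Returns True if the family was
    created, False if it already existed.
    """
    existing = table.list_column_families()  # dict keyed by family id
    if family_id in existing:
        return False
    table.column_family(family_id).create()
    return True


def open_admin_table(project_id, instance_id, table_id):
    """Hypothetical helper: open a table handle with admin rights."""
    # Imported lazily so the helper above can be exercised without the library.
    from google.cloud import bigtable

    client = bigtable.Client(project=project_id, admin=True)
    return client.instance(instance_id).table(table_id)


# Example call (placeholder project ID, names from the cbt command):
# table = open_admin_table("my-project", "test-instance", "test-table")
# ensure_column_family(table, "cf1")
```

The family passed to `set_cell` in the pipeline ("stats_summary" in the snippet quoted above) presumably has to be one that exists on the table, so the same helper would be called with that name before launching the job.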
