Hi everyone,

I have raised a bug on GCP for this. But am I the only one trying to write from Beam to Bigtable in Python? Is that a warning sign that this combination is not mature? Has anyone tried using the Java connector from Python?
Glad to hear about your experience and advice, and of course about other ideas to solve this "bug". Thanks!

On Wed, Oct 13, 2021 at 18:14, Pierre Oberholzer <[email protected]> wrote:

> Hi Brian,
>
> Yes, I do execute run() at the end, and I see the Dataflow job completing in
> the GUI (link <https://console.cloud.google.com/dataflow/jobs>). Thanks for asking ;)
> Is there maybe a commit() missing, as referred to here
> <https://googleapis.dev/python/bigtable/latest/row.html#google.cloud.bigtable.row.DirectRow>,
> and if yes, where should it go in the pipeline?
>
> On Wed, Oct 13, 2021 at 18:08, Brian Hulette <[email protected]> wrote:
>
>> Hey Pierre,
>> Sorry for the silly question but I have to ask - are you actually running
>> the pipeline? In your initial snippet you created the pipeline in a context
>> (with beam.Pipeline() as p:), which will run the pipeline when you exit.
>> But your latest snippet doesn't show the context, or a call to p.run(). Are
>> they missing, or just not shown?
>>
>> Otherwise I don't see anything obviously wrong with your code. You might
>> try contacting GCP support, since you're working with two GCP products.
>>
>> Brian
>>
>> On Tue, Oct 12, 2021 at 10:22 PM Pierre Oberholzer <[email protected]> wrote:
>>
>>> Dear Community,
>>>
>>> Glad to get your support here!
>>> Issue: empty Bigtable when using the Python/Beam connector.
>>>
>>> Thanks!
>>>
>>> On Sun, Oct 10, 2021 at 14:34, Pierre Oberholzer <[email protected]> wrote:
>>>
>>>> Thanks Israel, this helped. No error anymore, but the table remains
>>>> empty with this code
>>>> <https://stackoverflow.com/questions/63035772/streaming-pipeline-in-dataflow-to-bigtable-python>.
>>>>
>>>> *Code*
>>>>
>>>> class CreateRowFn(beam.DoFn):
>>>>     def process(self, key):
>>>>         direct_row = row.DirectRow(row_key=key)
>>>>         direct_row.set_cell(
>>>>             "stats_summary",
>>>>             b"os_build",
>>>>             b"android",
>>>>             datetime.datetime.now())
>>>>         return [direct_row]
>>>>
>>>> _ = (p
>>>>      | beam.Create(["phone#4c410523#20190501", "phone#4c410523#20190502"])
>>>>      | beam.ParDo(CreateRowFn())
>>>>      | WriteToBigTable(project_id=pipeline_options.bigtable_project,
>>>>                        instance_id=pipeline_options.bigtable_instance,
>>>>                        table_id=pipeline_options.bigtable_table))
>>>>
>>>> *Issue*
>>>>
>>>> Empty table (checked with happybase and
>>>> check = [(key, row) for key, row in table.scan()])
>>>>
>>>> Thanks!
>>>>
>>>> On Sat, Oct 9, 2021 at 21:37, Israel Herraiz <[email protected]> wrote:
>>>>
>>>>> You have to write DirectRows to Bigtable, not strings. For more info,
>>>>> please see
>>>>> https://googleapis.dev/python/bigtable/latest/row.html#google.cloud.bigtable.row.DirectRow
>>>>
>>>> --
>>>> Pierre
>>>
>>> --
>>> Pierre
>>
>
> --
> Pierre

--
Pierre
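For anyone reading along: the two points raised in the thread are that process() must emit DirectRow objects (not strings), and that the pipeline must actually be run (the `with beam.Pipeline() as p:` context does this on exit). Here is a dependency-free sketch of the DoFn contract; FakeDirectRow is a hypothetical stand-in that only mirrors the shape of google.cloud.bigtable.row.DirectRow, it is not the real client class:

```python
from datetime import datetime

# Hypothetical stand-in for google.cloud.bigtable.row.DirectRow: the real
# class buffers set_cell() mutations that WriteToBigTable later commits.
class FakeDirectRow:
    def __init__(self, row_key):
        self.row_key = row_key
        self.cells = []

    def set_cell(self, family, column, value, timestamp):
        # Record one buffered mutation (family, column, value, timestamp).
        self.cells.append((family, column, value, timestamp))

# Mirrors CreateRowFn.process from the thread: it must return an
# iterable of row objects, one mutation set per input key.
def create_row(key):
    direct_row = FakeDirectRow(row_key=key)
    direct_row.set_cell("stats_summary", b"os_build", b"android", datetime.now())
    return [direct_row]

keys = ["phone#4c410523#20190501", "phone#4c410523#20190502"]
# A ParDo applies process() to each element and flattens the results.
rows = [r for key in keys for r in create_row(key)]
print([r.row_key for r in rows])
# → ['phone#4c410523#20190501', 'phone#4c410523#20190502']
```

In the real pipeline, wrapping everything in `with beam.Pipeline() as p:` (or calling p.run() explicitly) is what triggers execution; building the PCollection graph alone writes nothing.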
