Hi Preston,
sharing the Google Sheet is not enough (I already tested it), because the Dataflow service account is only authenticated on GCP, not on Drive. Moreover, I am using the Python SDK, not the Scala wrapper, to develop Beam pipelines.
Leonardo
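For Python SDK users, the workaround Preston describes in the quoted message below might carry over as follows. He pulls the sheet with Google's client library and feeds it to Beam, and `beam.Create` plays the role of Scio's `parallelize`. This is a minimal sketch. It assumes the Sheets API v4 client (`googleapiclient`) and Application Default Credentials that can read the sheet. `SPREADSHEET_ID`, `RANGE_NAME` and the helper names are hypothetical. It only suits small sheets, as Preston notes.

```python
# Sketch: fetch a Google Sheet on the launching machine, then hand the rows
# to Beam with beam.Create (the Python counterpart of Scio's parallelize).
# SPREADSHEET_ID, RANGE_NAME and all helper names are hypothetical.

SPREADSHEET_ID = "your-spreadsheet-id"  # hypothetical placeholder
RANGE_NAME = "Sheet1!A1:Z"              # hypothetical placeholder
SHEETS_SCOPES = ["https://www.googleapis.com/auth/spreadsheets.readonly"]


def rows_to_dicts(values):
    """Turn a Sheets API `values` grid (first row = header) into dicts.

    The Sheets API omits trailing empty cells, so short rows are padded
    with empty strings.
    """
    if not values:
        return []
    header, *body = values
    return [
        {col: (row[i] if i < len(row) else "") for i, col in enumerate(header)}
        for row in body
    ]


def fetch_sheet_values(spreadsheet_id, range_name):
    """Read raw cell values with the Sheets API v4 client library."""
    import google.auth
    from googleapiclient.discovery import build

    # Runs on the launching machine with its Application Default Credentials,
    # not on the Dataflow workers. Assumption: these credentials carry a
    # Sheets/Drive scope (user credentials may need it requested at login).
    credentials, _ = google.auth.default(scopes=SHEETS_SCOPES)
    service = build("sheets", "v4", credentials=credentials)
    result = (
        service.spreadsheets()
        .values()
        .get(spreadsheetId=spreadsheet_id, range=range_name)
        .execute()
    )
    return result.get("values", [])


def run(argv=None):
    """Build and run the pipeline; call run() with Dataflow options to launch."""
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    records = rows_to_dicts(fetch_sheet_values(SPREADSHEET_ID, RANGE_NAME))
    with beam.Pipeline(options=PipelineOptions(argv)) as p:
        # The rows are embedded in the pipeline at construction time, so the
        # workers never call the Sheets API themselves.
        (p
         | "CreateFromSheet" >> beam.Create(records)
         | "Print" >> beam.Map(print))
```

The design choice here is that only the launching process needs Drive/Sheets access. The worker service account's missing Drive scope, which is Leonardo's problem, never comes into play.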
From: Preston Marshall <[email protected]>
Sent: Wednesday, June 6, 2018 21:21
To: [email protected]
Subject: Re: Read from a Google Sheet based BigQuery table - Python SDK

Not sure if this is helpful, but you can also share Google Sheets with service accounts directly. I am solving a similar problem by using the Google SDK directly to pull the data from the sheet, then feeding it into Beam via Scio's parallelize functionality. My dataset is small, so this worked for me.

On Wed, Jun 6, 2018 at 1:13 PM Chamikara Jayalath <[email protected]> wrote:

On Tue, Jun 5, 2018 at 9:56 PM Leonardo Biagioli <[email protected]> wrote:

Hi Cham, thanks, but those pages are about authentication inside Google Cloud Platform services; I need to authenticate the job on Sheets… Since the required scope is https://www.googleapis.com/auth/drive, is there a way to pass it in the deployment phase of a Dataflow job?

I haven't tried this, unfortunately, so I'm not sure if it will work. Are you able to run queries against your federated table using the BQ dashboard (without using Dataflow)? Also make sure that the Compute Engine service account used by the Dataflow job is properly authenticated (as mentioned in the document I provided). I recommend contacting Google Cloud support for questions regarding BQ and Dataflow services.
- Cham

Thank you,
Leonardo

From: Chamikara Jayalath <[email protected]>
Sent: Tuesday, June 5, 2018 19:26
To: [email protected]
Cc: [email protected]
Subject: Re: Read from a Google Sheet based BigQuery table - Python SDK

See the following regarding authenticating Dataflow jobs:
https://cloud.google.com/dataflow/security-and-permissions
I'm not sure about information specific to Sheets, but there seems to be some info in the following:
https://cloud.google.com/bigquery/external-data-drive

On Tue, Jun 5, 2018 at 10:16 AM Leonardo Biagioli <[email protected]> wrote:

Hi Cham,
thank you for taking the time to answer! Is there a way to properly authenticate a Beam job on the Dataflow runner? I should specify the required scope to read from Sheets, but where can I set that parameter?
Regards,
Leonardo

On Jun 5, 2018 18:28, Chamikara Jayalath <[email protected]> wrote:

I don't think BQ federated tables support export jobs, so reading directly from such tables likely will not work. But reading using a query should work if your job is authenticated properly (I haven't tested this).
- Cham

On Tue, Jun 5, 2018, 5:56 AM Leonardo Biagioli <[email protected]> wrote:

Hi guys,
just wanted to ask if there is a way to read from a Sheet-based BigQuery table from a Beam pipeline running on Dataflow… I usually specify additional scopes for authentication when running simple Python code to do the same, but I wasn't able to find a reference to something similar for Beam. Could you please help?
Thank you very much!
Leonardo

--
Preston Marshall
Director, Data Engineering
www.cityblock.com
256-434-1050
[email protected]
55 Washington St, Unit 552
Brooklyn, NY 11201
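The thread raises two ideas. Leonardo adds extra OAuth scopes in plain Python code. Cham suggests reading the Sheets-backed ("federated") table through a query rather than directly, because export jobs are not supported on such tables. Both can be sketched in Python as below. This is untested, as Cham says, and it does not settle whether the Dataflow workers' service account can reach Drive.

The sketch makes several assumptions:
- It uses `google.auth.default` and the `google-cloud-bigquery` client.
- It uses the current Beam `beam.io.ReadFromBigQuery` transform. SDKs of the thread's era used `beam.io.Read(beam.io.BigQuerySource(query=..., use_standard_sql=True))` instead.
- The project, dataset and table names are hypothetical.

```python
# Sketch only (untested): query a Sheets-backed ("federated") BigQuery table
# instead of reading the table directly, since export jobs are not supported
# on such tables. Project/dataset/table names are hypothetical.

# Scopes of the kind Leonardo adds when running plain Python code.
DRIVE_SCOPES = [
    "https://www.googleapis.com/auth/bigquery",
    "https://www.googleapis.com/auth/drive",
]


def federated_query(project, dataset, table):
    """Build a standard-SQL query over a (hypothetical) Sheets-backed table."""
    return f"SELECT * FROM `{project}.{dataset}.{table}`"


def query_outside_beam(sql):
    """Plain BigQuery client call with the Drive scope requested explicitly."""
    import google.auth
    from google.cloud import bigquery

    # Scopes are applied to service-account credentials here; user
    # credentials may need the Drive scope granted at login time instead.
    credentials, project = google.auth.default(scopes=DRIVE_SCOPES)
    client = bigquery.Client(credentials=credentials, project=project)
    return [dict(row.items()) for row in client.query(sql).result()]


def run_in_beam(sql, argv=None):
    """Read the query results in a Beam pipeline (current Python SDK API).

    ReadFromBigQuery with a query typically needs a GCS --temp_location among
    the pipeline options. Whether the job's service account can read the
    underlying sheet is exactly the open question in the thread.
    """
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    with beam.Pipeline(options=PipelineOptions(argv)) as p:
        (p
         | "QuerySheetTable" >> beam.io.ReadFromBigQuery(
             query=sql, use_standard_sql=True)
         | "Print" >> beam.Map(print))
```

Running `query_outside_beam` first mirrors Cham's suggestion to check the federated table outside Dataflow. If it succeeds but `run_in_beam` fails on Dataflow, the missing piece is the workers' Drive access, not the query itself.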
