I'd need some more info to really understand what's going on (logs, a stack trace, etc.). Or if you have some repro code, that would be great.
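
In case it helps, here's a minimal sketch of the kind of repro that's useful, assuming the Java SDK and a streaming FILE_LOADS write (the project, dataset, table, schema, shard count, and triggering frequency below are placeholders, not details from your pipeline):

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Collections;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;
import org.joda.time.Duration;

public class FileLoadsRepro {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Synthetic unbounded input standing in for the real streaming source.
    p.apply("FakeSource",
            GenerateSequence.from(0).withRate(1000, Duration.standardSeconds(1)))
        .apply("ToTableRow",
            MapElements.into(TypeDescriptor.of(TableRow.class))
                .via((Long i) -> new TableRow().set("id", i)))
        .apply("WriteToBQ",
            BigQueryIO.writeTableRows()
                .to("my-project:my_dataset.my_table") // placeholder destination
                .withSchema(new TableSchema().setFields(Collections.singletonList(
                    new TableFieldSchema().setName("id").setType("INTEGER"))))
                .withMethod(Method.FILE_LOADS) // the file upload path discussed below
                .withTriggeringFrequency(Duration.standardMinutes(5)) // required for streaming FILE_LOADS
                .withNumFileShards(100)
                .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
                .withWriteDisposition(WriteDisposition.WRITE_APPEND));

    p.run();
  }
}

If your actual pipeline differs (e.g. Python SDK, dynamic destinations, or a partitioned destination table), that detail alone would help narrow things down.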
On Fri, Oct 4, 2024 at 12:38 AM hsy...@gmail.com <hsy...@gmail.com> wrote:

> Yeah, I found those failed jobs, but none of them records why it failed,
> and `bq show` gives me "job not found".
>
> On Thu, Oct 3, 2024 at 2:33 PM Ahmed Abualsaud <ahmedabuals...@google.com>
> wrote:
>
>> I'd check your Dataflow worker logs and look for any messages about
>> `beam_bq_job_COPY`.
>>
>> On Fri, Oct 4, 2024 at 12:31 AM hsy...@gmail.com <hsy...@gmail.com>
>> wrote:
>>
>>> And interestingly, in the BigQuery UI I only see beam_bq_job_LOAD, not
>>> beam_bq_job_COPY, but the job id did show up in the logs.
>>>
>>> On Thu, Oct 3, 2024 at 2:28 PM hsy...@gmail.com <hsy...@gmail.com>
>>> wrote:
>>>
>>>> Yes, I figured out the above from reading the source code again. I hope
>>>> the steps can be documented somewhere in Beam.
>>>> But I still cannot find the details for those jobs.
>>>> For example,
>>>> bq show -j --format=prettyjson --project_id=.... beam_bq_job_COPY_
>>>> gives me
>>>> BigQuery error in show operation: Not found: Job project-data
>>>>
>>>> On Thu, Oct 3, 2024 at 2:17 PM Ahmed Abualsaud via user <
>>>> user@beam.apache.org> wrote:
>>>>
>>>>> For small/medium writes, it should load directly to the table.
>>>>>
>>>>> For larger writes (your case), it writes to multiple temp tables, then
>>>>> performs a single copy job [1] that copies their contents to the final
>>>>> table. Afterwards, the sink cleans up all those temp tables.
>>>>> My guess is that your pipeline is failing at the copy step. Note what
>>>>> Reuven said in the other thread, that Dataflow will retry "indefinitely
>>>>> for streaming", so your pipeline will keep running. You should be able
>>>>> to see error messages in your logs, though.
>>>>>
>>>>> As to why it's failing, we'd have to know more about your use case or
>>>>> see a stack trace. With these things, it's best to submit a support
>>>>> ticket so the engineers can investigate. From my experience, though,
>>>>> jobs failing at the copy step are usually caused by trying to copy
>>>>> partitioned columns, which isn't supported by BigQuery (see the copy
>>>>> job limitations [2]).
>>>>>
>>>>> [1] https://cloud.google.com/bigquery/docs/managing-tables#copy-table
>>>>> [2]
>>>>> https://cloud.google.com/bigquery/docs/managing-tables#limitations_on_copying_tables
>>>>>
>>>>> On Thu, Oct 3, 2024 at 11:56 PM hsy...@gmail.com <hsy...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hey guys,
>>>>>>
>>>>>> Any help is appreciated. I'm using the BigQueryIO file upload method
>>>>>> to load data into BQ. I don't see any errors or warnings, but I also
>>>>>> don't see a SINGLE row inserted into the table either.
>>>>>>
>>>>>> The only thing I see is hundreds of load jobs like
>>>>>> beam_bq_job_TEMP_TABLE_LOAD_.....
>>>>>> And hundreds of temp tables created.
>>>>>>
>>>>>> Most of the jobs are done and I can see the data in the temp tables,
>>>>>> but there is not a single row written to the final destination.
>>>>>>
>>>>>> I know there is no way to track row-level errors, but at least the
>>>>>> runner/Beam API should give me some hint about what went wrong in any
>>>>>> of the steps. And there is zero documentation or examples about this
>>>>>> either.
>>>>>>
>>>>>>
>>>>>> Regards,
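
One more note for anyone who hits the same "Not found" from `bq show`: BigQuery job lookups are scoped to a project and a location, so querying the wrong location can report an existing job as missing. Here's a minimal sketch of fetching a job's status and error with the google-cloud-bigquery Java client; the project id, job id, and location are placeholders you'd take from the Dataflow worker logs and the destination dataset:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobId;

public class InspectBeamCopyJob {
  public static void main(String[] args) {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // Placeholders: use the exact job id from the worker logs and the
    // location of the destination dataset (e.g. "US", "EU", "us-central1").
    JobId jobId = JobId.newBuilder()
        .setProject("my-project")
        .setJob("beam_bq_job_COPY_...") // truncated here; copy the full id from the logs
        .setLocation("US")
        .build();

    Job job = bigquery.getJob(jobId);
    if (job == null) {
      System.out.println("Job not found; re-check the project, job id, and location.");
    } else if (job.getStatus().getError() != null) {
      System.out.println("Copy job error: " + job.getStatus().getError());
    } else {
      System.out.println("Copy job state: " + job.getStatus().getState());
    }
  }
}

The same location hint applies on the command line; `bq show -j` accepts a `--location` flag.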