I'd need some more info to really understand what's going on (logs, a
stack trace, etc.). Or if you have some repro code, that would be great.
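
In the meantime, even a stripped-down version of the write would help. Just as
a sketch of the shape I mean (Java SDK assumed; the table, schema, and settings
below are placeholders, not a recommendation):

import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

// rows and schema stand in for whatever your pipeline actually produces.
static void writeToBq(PCollection<TableRow> rows, TableSchema schema) {
  rows.apply("WriteToBQ",
      BigQueryIO.writeTableRows()
          .to("my-project:my_dataset.my_table")
          .withSchema(schema)
          .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
          // triggering frequency only applies to unbounded (streaming) input
          .withTriggeringFrequency(Duration.standardMinutes(5))
          .withNumFileShards(100)
          .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
          .withWriteDisposition(WriteDisposition.WRITE_APPEND));
}

Seeing the equivalent of that from your pipeline (plus the pipeline options)
would narrow things down a lot.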

On Fri, Oct 4, 2024 at 12:38 AM hsy...@gmail.com <hsy...@gmail.com> wrote:

> Yeah, I found those failed jobs, but none of them records why they failed,
> and `bq show` gives me "job not found"
>
> On Thu, Oct 3, 2024 at 2:33 PM Ahmed Abualsaud <ahmedabuals...@google.com>
> wrote:
>
>> I'd check your Dataflow worker logs and look for any messages about
>> `beam_bq_job_COPY`
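>>
>> If it's easier to query from the command line, something along these lines
>> should pull those messages up (the filter is just a sketch; adjust the
>> project, and note some worker logs land in jsonPayload.message rather than
>> textPayload):
>>
>> gcloud logging read \
>>     'resource.type="dataflow_step" AND textPayload:"beam_bq_job_COPY"' \
>>     --project=YOUR_PROJECT --limit=50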
>>
>> On Fri, Oct 4, 2024 at 12:31 AM hsy...@gmail.com <hsy...@gmail.com>
>> wrote:
>>
>>> And interestingly, in the BigQuery UI I only see beam_bq_job_LOAD, not
>>> beam_bq_job_COPY, but the job id did show up in the logs
>>>
>>> On Thu, Oct 3, 2024 at 2:28 PM hsy...@gmail.com <hsy...@gmail.com>
>>> wrote:
>>>
>>>> Yes, I figured the above out from reading the source code again. I hope
>>>> these steps can be documented somewhere in Beam.
>>>> But I still cannot find the details for those jobs. For example,
>>>> bq show -j --format=prettyjson --project_id=.... beam_bq_job_COPY_
>>>> gives me
>>>> BigQuery error in show operation: Not found: Job project-data
>>>>
>>>> On Thu, Oct 3, 2024 at 2:17 PM Ahmed Abualsaud via user <
>>>> user@beam.apache.org> wrote:
>>>>
>>>>> For small/medium writes, it should load directly to the table.
>>>>>
>>>>> For larger writes (your case), it writes to multiple temp tables then
>>>>> performs a single copy job [1] that copies their contents to the final
>>>>> table. Afterwards, the sink will clean up all those temp tables.
>>>>> My guess is your pipeline is failing at the copy step. Note what Reuven
>>>>> said in the other thread: Dataflow will retry "indefinitely for
>>>>> streaming", so your pipeline will keep running anyway. You should still be
>>>>> able to see error messages in your logs though.
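>>>>>
>>>>> You can also pull up the copy job itself with the bq CLI, something like
>>>>> the following (project and location are placeholders; Beam's copy job ids
>>>>> start with beam_bq_job_COPY):
>>>>>
>>>>> bq ls -j -a --max_results=1000 --project_id=my-project
>>>>> bq show -j --location=US --project_id=my-project <copy_job_id>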
>>>>>
>>>>> As to why it's failing, we'd have to know more about your use case or
>>>>> see a stack trace. With these things, it's best to submit a support ticket
>>>>> so the engineers can investigate. From my experience though, jobs failing
>>>>> at the copy step are usually caused by trying to copy partitioned columns,
>>>>> which isn't supported by BigQuery (see the copy job limitations [2]).
>>>>>
>>>>> [1] https://cloud.google.com/bigquery/docs/managing-tables#copy-table
>>>>> [2]
>>>>> https://cloud.google.com/bigquery/docs/managing-tables#limitations_on_copying_tables
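>>>>>
>>>>> One quick sanity check on that front (table name is a placeholder): dump
>>>>> the destination table's metadata and look at the timePartitioning /
>>>>> rangePartitioning fields to see how it's partitioned:
>>>>>
>>>>> bq show --format=prettyjson my-project:my_dataset.my_table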
>>>>>
>>>>> On Thu, Oct 3, 2024 at 11:56 PM hsy...@gmail.com <hsy...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hey guys,
>>>>>>
>>>>>> Any help is appreciated. I'm using the BigQueryIO file loads method to
>>>>>> load data to BQ. I don't see any error or warning, but I also don't see a
>>>>>> SINGLE row inserted into the table either.
>>>>>>
>>>>>> The only thing I see is hundreds of load jobs like
>>>>>> beam_bq_job_TEMP_TABLE_LOAD_.....
>>>>>> and hundreds of temp tables created.
>>>>>>
>>>>>> Most jobs are done and I can see the data in the temp tables, but there
>>>>>> is not a single row written to the final destination?
>>>>>>
>>>>>> I know there is no way to track row-level errors, but at least the
>>>>>> runner/Beam API should give me some hint about what is wrong at any of
>>>>>> the steps? And there is zero documentation or examples about this either.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>>
