Yes, the schema is known at graph construction time, but Beam wraps all the
type information in the array type, so I then have to extract that information
and instantiate the corresponding empty array in Java.
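
For the workaround Ahmed suggested below (converting null array values to
empty lists), here is a minimal sketch of the idea. It is plain Java, not Beam
code: the row is modeled as a hypothetical Map of field name to value, and the
set of repeated field names is assumed to have been collected from the schema
at graph construction time. The point it illustrates is that Java generics are
erased at runtime, so a single empty list can stand in for an array of any
element type, and the element type never needs to be known.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a row: field name -> value. This is NOT the Beam Row API.
final class NullArrayDefaults {

  /**
   * Returns a copy of the row in which every repeated field that is present
   * with a null value holds an empty list instead. Other fields are untouched.
   */
  static Map<String, Object> withEmptyArrays(
      Map<String, Object> row, Set<String> repeatedFields) {
    Map<String, Object> result = new LinkedHashMap<>(row);
    for (String name : repeatedFields) {
      if (result.containsKey(name) && result.get(name) == null) {
        // Generics are erased at runtime, so one empty list fits any element type.
        result.put(name, Collections.emptyList());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("id", 42L);
    row.put("tags", null);    // nullable array field set to null
    row.put("comment", null); // nullable scalar field, stays null

    Map<String, Object> fixed = withEmptyArrays(row, Set.of("tags"));
    System.out.println(fixed); // prints {id=42, tags=[], comment=null}
  }
}
```

In a real pipeline the same substitution would presumably be applied while
building each Row, driven by the field types in the Beam schema rather than
by a hand-written set of names.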

On Tue, Oct 1, 2024 at 8:46 AM hsy...@gmail.com <hsy...@gmail.com> wrote:

> Well, I'm trying to build something as cost-effective as possible. I was
> trying to convert Row to TableRow and use the writeTableRows function, but
> it's too expensive. From the profiler, it seems the Row to TableRow
> conversion is the expensive part. But from the source code I also see it's
> possible to write Beam Rows directly to BigQuery.
>
> Do you guys have any suggestions? I can try to use writeProto, but then I
> don't get the benefit of all the built-in transformations that are designed
> for the Beam Row format.
>
> On Tue, Oct 1, 2024 at 8:13 AM Reuven Lax via user <user@beam.apache.org>
> wrote:
>
>> Can you explain what you are trying to do here? BigQuery requires schema
>> to be known before we write. Beam schemas similarly must be known at graph
>> construction time - though this isn't quite the same as Java compile time.
>>
>> Reuven
>>
>> On Tue, Oct 1, 2024 at 12:44 AM hsy...@gmail.com <hsy...@gmail.com>
>> wrote:
>>
>>> I mean, how do I create an empty list if the element type is unknown at
>>> compile time?
>>>
>>> On Tue, Oct 1, 2024 at 12:42 AM hsy...@gmail.com <hsy...@gmail.com>
>>> wrote:
>>>
>>>> Thanks @Ahmed Abualsaud <ahmedabuals...@google.com>, but how do I get
>>>> around this error for now if I want to use a Beam schema?
>>>>
>>>> On Mon, Sep 30, 2024 at 4:31 PM Ahmed Abualsaud via user <
>>>> user@beam.apache.org> wrote:
>>>>
>>>>> Hey Siyuan,
>>>>>
>>>>> We use the descriptor because it is derived from the BQ table's schema
>>>>> in a previous step [1]. We are essentially checking against the table
>>>>> schema.
>>>>> You're seeing this error because *nullable* and *repeated* modes are
>>>>> mutually exclusive. I think we can reduce friction though by defaulting
>>>>> null values to an empty list, which seems to be in line with GoogleSQL's
>>>>> behavior [2].
>>>>>
>>>>> Opened a PR for this: https://github.com/apache/beam/pull/32604.
>>>>> Hopefully we can get this in for the upcoming Beam version 2.60.0.
>>>>>
>>>>> For now, you can work around this by converting your null array values
>>>>> to empty lists.
>>>>>
>>>>> [1]
>>>>> https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiDynamicDestinationsBeamRow.java#L66-L67
>>>>> [2]
>>>>> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#array_nulls
>>>>>
>>>>> On Mon, Sep 30, 2024 at 6:57 PM hsy...@gmail.com <hsy...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I'm trying to write Beam Rows directly to BigQuery because that goes
>>>>>> through fewer conversions and is more efficient, but I'm hitting a
>>>>>> weird error.
>>>>>> A nullable array field throws
>>>>>>
>>>>>> Caused by: java.lang.IllegalArgumentException: Received null value for non-nullable field
>>>>>>
>>>>>> if I set that field to null.
>>>>>>
>>>>>> Here is the related code I found in Beam:
>>>>>>
>>>>>>
>>>>>> https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BeamRowToStorageApiProto.java#L277
>>>>>>
>>>>>>   private static Object messageValueFromRowValue(
>>>>>>       FieldDescriptor fieldDescriptor, Field beamField, int index, Row row) {
>>>>>>     @Nullable Object value = row.getValue(index);
>>>>>>     if (value == null) {
>>>>>>       if (fieldDescriptor.isOptional()) {
>>>>>>         return null;
>>>>>>       } else {
>>>>>>         throw new IllegalArgumentException(
>>>>>>             "Received null value for non-nullable field " + fieldDescriptor.getName());
>>>>>>       }
>>>>>>     }
>>>>>>     return toProtoValue(fieldDescriptor, beamField.getType(), value);
>>>>>>   }
>>>>>>
>>>>>> On line 277, why not use beamField.isNullable() instead of
>>>>>> fieldDescriptor.isOptional()? If it's using the Beam schema, it should
>>>>>> stick to the nullable setting on the Beam schema field, correct?
>>>>>>
>>>>>> And how do I avoid this?
>>>>>>
>>>>>> Regards,
>>>>>> Siyuan
>>>>>>
>>>>>
