Yes, the schema is known at graph construction time, but all the type information is wrapped in the array type, so I then have to extract that information and instantiate the corresponding empty array in Java.
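
One possible way around the unknown element type: Java generics are erased at runtime, so an empty list needs no element type, and Collections.emptyList() can stand in for a null array of any type. Below is a minimal sketch in plain Java with no Beam dependency. The class name NullArrayDefaults, the method fillNullArrays, and the boolean[] flags are all hypothetical stand-ins for illustration. In a real pipeline the flags would presumably be derived from the Beam schema's field types, and whether Row accepts an untyped empty list for every array field is an assumption worth verifying.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Beam: replaces null values in array-typed
// positions with an empty list before the row is built and written.
final class NullArrayDefaults {

    private NullArrayDefaults() {}

    /**
     * @param isArrayField one flag per field, true where the field is an array
     *                     (assumed to be computed once from the schema)
     * @param values       the row's values, in field order; may contain nulls
     * @return a copy of values where null array fields become empty lists
     */
    static List<Object> fillNullArrays(boolean[] isArrayField, List<Object> values) {
        if (isArrayField.length != values.size()) {
            throw new IllegalArgumentException("flags and values differ in length");
        }
        List<Object> out = new ArrayList<>(values.size());
        for (int i = 0; i < values.size(); i++) {
            Object value = values.get(i);
            if (value == null && isArrayField[i]) {
                // Type erasure: the same empty list works for any element type.
                out.add(Collections.emptyList());
            } else {
                // Non-array nulls are left alone; they may be legitimately nullable.
                out.add(value);
            }
        }
        return out;
    }
}
```

Computing the flags once at graph construction time and reusing them per element keeps the per-row cost to a single pass over the values.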

On Tue, Oct 1, 2024 at 8:46 AM hsy...@gmail.com <hsy...@gmail.com> wrote:

> Well, I'm trying to build something as cost effective as possible. I was
> trying to use row to tablerow and use the writeTableRow function, but it's
> too expensive. From the profiler, it seems row to tablerow is expensive.
> But from the source code I also see it's possible to write Beam rows
> directly to BigQuery.
>
> Do you guys have any suggestions? I can try to use writeProto, but then I
> don't get the benefit of all the built-in transformations that are designed
> for the Beam row format.
>
> On Tue, Oct 1, 2024 at 8:13 AM Reuven Lax via user <user@beam.apache.org>
> wrote:
>
>> Can you explain what you are trying to do here? BigQuery requires schema
>> to be known before we write. Beam schemas similarly must be known at graph
>> construction time - though this isn't quite the same as Java compile time.
>>
>> Reuven
>>
>> On Tue, Oct 1, 2024 at 12:44 AM hsy...@gmail.com <hsy...@gmail.com>
>> wrote:
>>
>>> I mean, how do I create an empty list if the element type is unknown at
>>> compile time?
>>>
>>> On Tue, Oct 1, 2024 at 12:42 AM hsy...@gmail.com <hsy...@gmail.com>
>>> wrote:
>>>
>>>> Thanks @Ahmed Abualsaud <ahmedabuals...@google.com>, but how do I get
>>>> around this error for now if I want to use Beam schemas?
>>>>
>>>> On Mon, Sep 30, 2024 at 4:31 PM Ahmed Abualsaud via user <
>>>> user@beam.apache.org> wrote:
>>>>
>>>>> Hey Siyuan,
>>>>>
>>>>> We use the descriptor because it is derived from the BQ table's schema
>>>>> in a previous step [1]. We are essentially checking against the table
>>>>> schema.
>>>>> You're seeing this error because *nullable* and *repeated* modes are
>>>>> mutually exclusive. I think we can reduce friction though by defaulting
>>>>> null values to an empty list, which seems to be in line with GoogleSQL's
>>>>> behavior [2].
>>>>>
>>>>> Opened a PR for this: https://github.com/apache/beam/pull/32604.
>>>>> Hopefully we can get this in for the upcoming Beam version 2.60.0.
>>>>>
>>>>> For now, you can work around this by converting your null array values
>>>>> to empty lists.
>>>>>
>>>>> [1]
>>>>> https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiDynamicDestinationsBeamRow.java#L66-L67
>>>>> [2]
>>>>> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#array_nulls
>>>>>
>>>>> On Mon, Sep 30, 2024 at 6:57 PM hsy...@gmail.com <hsy...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I'm trying to write Beam rows directly to BigQuery because that goes
>>>>>> through fewer conversions and is more efficient, but there is a weird
>>>>>> error happening.
>>>>>> A nullable array field throws
>>>>>>
>>>>>> Caused by: java.lang.IllegalArgumentException: Received null value
>>>>>> for non-nullable field
>>>>>>
>>>>>> if I set null for that field.
>>>>>>
>>>>>> Here is the related code I found in Beam:
>>>>>>
>>>>>> https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BeamRowToStorageApiProto.java#L277
>>>>>>
>>>>>> private static Object messageValueFromRowValue(
>>>>>>     FieldDescriptor fieldDescriptor, Field beamField, int index, Row row) {
>>>>>>   @Nullable Object value = row.getValue(index);
>>>>>>   if (value == null) {
>>>>>>     if (fieldDescriptor.isOptional()) {
>>>>>>       return null;
>>>>>>     } else {
>>>>>>       throw new IllegalArgumentException(
>>>>>>           "Received null value for non-nullable field " + fieldDescriptor.getName());
>>>>>>     }
>>>>>>   }
>>>>>>   return toProtoValue(fieldDescriptor, beamField.getType(), value);
>>>>>> }
>>>>>>
>>>>>> At line 277, why not use beamField.isNullable() instead of
>>>>>> fieldDescriptor.isOptional()? If it's using the Beam schema, it should
>>>>>> stick to the nullable setting on the Beam schema field, correct?
>>>>>>
>>>>>> And how do I avoid this?
>>>>>>
>>>>>> Regards,
>>>>>> Siyuan