Well, I'm trying to build something as cost-effective as possible. I was converting each Beam Row to a TableRow and writing with writeTableRows, but that is too expensive: the profiler shows that the Row-to-TableRow conversion is the costly part. From the source code I also see that it's possible to write Beam Rows directly to BigQuery.
Do you guys have any suggestions? I can try writeProtos, but then I lose the benefit of all the built-in transforms that are designed for the Beam Row format.

On Tue, Oct 1, 2024 at 8:13 AM Reuven Lax via user <user@beam.apache.org> wrote:

> Can you explain what you are trying to do here? BigQuery requires the schema
> to be known before we write. Beam schemas similarly must be known at graph
> construction time - though this isn't quite the same as Java compile time.
>
> Reuven
>
> On Tue, Oct 1, 2024 at 12:44 AM hsy...@gmail.com <hsy...@gmail.com> wrote:
>
>> I mean, how do I create an empty list if the element type is unknown at
>> compile time?
>>
>> On Tue, Oct 1, 2024 at 12:42 AM hsy...@gmail.com <hsy...@gmail.com>
>> wrote:
>>
>>> Thanks @Ahmed Abualsaud <ahmedabuals...@google.com> but how do I get
>>> around this error for now if I want to use a Beam schema?
>>>
>>> On Mon, Sep 30, 2024 at 4:31 PM Ahmed Abualsaud via user <
>>> user@beam.apache.org> wrote:
>>>
>>>> Hey Siyuan,
>>>>
>>>> We use the descriptor because it is derived from the BQ table's schema
>>>> in a previous step [1]. We are essentially checking against the table
>>>> schema.
>>>> You're seeing this error because *nullable* and *repeated* modes are
>>>> mutually exclusive. I think we can reduce friction though by defaulting
>>>> null values to an empty list, which seems to be in line with GoogleSQL's
>>>> behavior [2].
>>>>
>>>> Opened a PR for this: https://github.com/apache/beam/pull/32604.
>>>> Hopefully we can get this in for the upcoming Beam version 2.60.0.
>>>>
>>>> For now, you can work around this by converting your null array values
>>>> to empty lists.
>>>>
>>>> [1]
>>>> https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiDynamicDestinationsBeamRow.java#L66-L67
>>>> [2]
>>>> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#array_nulls
>>>>
>>>> On Mon, Sep 30, 2024 at 6:57 PM hsy...@gmail.com <hsy...@gmail.com>
>>>> wrote:
>>>>
>>>>> I'm trying to write Beam Rows directly to BigQuery because that goes
>>>>> through fewer conversions and is more efficient, but I'm hitting a
>>>>> weird error.
>>>>> A nullable array field throws
>>>>>
>>>>> Caused by: java.lang.IllegalArgumentException: Received null value for non-nullable field
>>>>>
>>>>> if I set null for that field.
>>>>>
>>>>> Here is the related code I found in Beam:
>>>>>
>>>>> https://github.com/apache/beam/blob/111f4c34ab2efd166de732c32d99ff615abf6064/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BeamRowToStorageApiProto.java#L277
>>>>>
>>>>> private static Object messageValueFromRowValue(
>>>>>     FieldDescriptor fieldDescriptor, Field beamField, int index, Row row) {
>>>>>   @Nullable Object value = row.getValue(index);
>>>>>   if (value == null) {
>>>>>     if (fieldDescriptor.isOptional()) {
>>>>>       return null;
>>>>>     } else {
>>>>>       throw new IllegalArgumentException(
>>>>>           "Received null value for non-nullable field " +
>>>>>               fieldDescriptor.getName());
>>>>>     }
>>>>>   }
>>>>>   return toProtoValue(fieldDescriptor, beamField.getType(), value);
>>>>> }
>>>>>
>>>>> On line 277, why not use beamField.isNullable() instead of
>>>>> fieldDescriptor.isOptional()? If it's using the Beam schema, it should
>>>>> stick to the nullable setting on the Beam schema field, correct?
>>>>>
>>>>> And how do I avoid this?
>>>>>
>>>>> Regards,
>>>>> Siyuan
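
For reference, the workaround suggested in the thread (convert null array values to empty lists before writing) might look roughly like the sketch below. It also touches on the "element type unknown at compile time" question: because Java generics are erased at runtime, Collections.emptyList() does not need to know the element type. Everything here is an assumption for illustration: the row is modeled as a plain Map rather than a Beam Row so the snippet runs without Beam on the classpath, and the class name, method name, and field names are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in: a "row" is a field-name -> value map, not a Beam Row.
class NullArraysToEmpty {

    /**
     * Returns a copy of the row in which every listed array field whose value
     * is null (or absent) is replaced by an empty list. The input is not modified.
     */
    static Map<String, Object> nullArraysToEmpty(
            Map<String, Object> row, Set<String> arrayFields) {
        Map<String, Object> out = new LinkedHashMap<>(row);
        for (String name : arrayFields) {
            if (out.get(name) == null) {
                // Generics are erased at runtime, so the empty list carries no
                // element type; nothing has to be known at compile time.
                out.put(name, Collections.emptyList());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 42L);
        row.put("tags", null); // nullable array field left unset

        Map<String, Object> fixed = nullArraysToEmpty(row, Set.of("tags"));
        System.out.println(fixed);
    }
}
```

With a real Beam Row the same idea would presumably go through something like Row.fromRow(row).withFieldValue(name, Collections.emptyList()).build() in a small DoFn or MapElements step ahead of the BigQuery write, with the array fields discovered from the Beam schema at graph construction time. That part is untested here.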