Re: master on Dataflow with schema aware PCollections stuck

2020-01-13 Thread Alex Van Boxel
It's indeed the first logical type with a Row base type. The UUID is generated from the name of the class; to do that in code (from a string), you first need to create bytes from the string and then a UUID from those bytes. Alex Van Boxel On Mon, Jan 13, 2020 at 10:40 PM Brian Hulette wrote: > I guess
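The string-to-bytes-to-UUID step described above can be sketched with the JDK alone. This is only an illustration, not the code from the PR: the class name `LogicalTypeUuidSketch`, the helper `uuidFromName`, and the identifier string are all hypothetical. The only real API used is `java.util.UUID.nameUUIDFromBytes`, which derives a name-based (version 3) UUID, so the same string always yields the same UUID.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class LogicalTypeUuidSketch {

    /**
     * Hypothetical helper: turns a name (for example a class name) into a
     * deterministic UUID by encoding it as bytes first.
     */
    static UUID uuidFromName(String name) {
        byte[] bytes = name.getBytes(StandardCharsets.UTF_8);
        // nameUUIDFromBytes is a JDK method; it returns a version 3 UUID.
        return UUID.nameUUIDFromBytes(bytes);
    }

    public static void main(String[] args) {
        // Assumed identifier, for illustration only.
        String identifier = "com.example.TimestampLogicalType";
        System.out.println(uuidFromName(identifier));
    }
}
```

Because the UUID depends only on the input string, both the pipeline construction side and the workers would compute the same value from the same name.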

Re: master on Dataflow with schema aware PCollections stuck

2020-01-13 Thread Brian Hulette
I guess these are the first logical types we've defined with a base type of row. It does seem reasonable that a static schema for a logical type could have some fixed id, but it feels odd to have a fixed UUID; it would be nice if we could give the schema some meaningful static identifier. I think

Re: master on Dataflow with schema aware PCollections stuck

2020-01-13 Thread Alex Van Boxel
The fix is in this PR: [BEAM-9113] Fix serialization proto logical types https://github.com/apache/beam/pull/10569. Alternatively, we all agree to *promote* the logical types to top-level logical types (as described in the design document, see ticket): [BEAM-9037] Instant and duration as logical type

Re: master on Dataflow with schema aware PCollections stuck

2020-01-13 Thread Alex Van Boxel
So I think the following happens: 1. The schema tree is initialized at construction time. The tree gets serialized and sent to the workers. 2. The workers deserialize the tree, but as the Timestamp logical type has a *static* schema, the schema will be

Re: master on Dataflow with schema aware PCollections stuck

2020-01-13 Thread Reuven Lax
SchemaCoder today recursively sets UUIDs for all schemas, including logical types, in setSchemaIds. Is it possible that your changes modified that logic somehow? On Mon, Jan 13, 2020 at 9:39 AM Alex Van Boxel wrote: > This is the stacktrace: > > > java.lang.IllegalStateException at >
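The idea of recursively assigning UUIDs to a schema and everything nested inside it can be sketched on a toy tree. This is not Beam's `setSchemaIds` implementation: `ToySchema`, `setIdsRecursively`, and `allIdsSet` are hypothetical names made up for this illustration, and the real `Schema` class is not used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class SchemaIdSketch {

    /** Hypothetical stand-in for a schema node; not Beam's Schema class. */
    static final class ToySchema {
        final String name;
        final List<ToySchema> nested = new ArrayList<>();
        UUID id; // null until an id is assigned

        ToySchema(String name) {
            this.name = name;
        }

        ToySchema with(ToySchema child) {
            nested.add(child);
            return this;
        }
    }

    /** Gives every node that lacks an id a random UUID, depth-first. */
    static void setIdsRecursively(ToySchema schema) {
        if (schema.id == null) {
            schema.id = UUID.randomUUID();
        }
        for (ToySchema child : schema.nested) {
            setIdsRecursively(child);
        }
    }

    /** Returns true only if this node and all nested nodes carry an id. */
    static boolean allIdsSet(ToySchema schema) {
        if (schema.id == null) {
            return false;
        }
        for (ToySchema child : schema.nested) {
            if (!allIdsSet(child)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A row schema that nests a (toy) logical type backed by its own row schema.
        ToySchema root = new ToySchema("row").with(new ToySchema("timestamp"));
        setIdsRecursively(root);
        System.out.println(allIdsSet(root));
    }
}
```

In this toy model, a nested schema that the traversal does not reach keeps a null id, and a later check that requires an id would then fail.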

Re: master on Dataflow with schema aware PCollections stuck

2020-01-13 Thread Alex Van Boxel
This is the stacktrace: java.lang.IllegalStateException at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:491) at org.apache.beam.sdk.coders.RowCoderGenerator.getCoder(RowCoderGenerator.java:380) at

Re: master on Dataflow with schema aware PCollections stuck

2020-01-13 Thread Reuven Lax
I don't think that should be the case. Also, SchemaCoder will automatically set the UUID for such logical types. On Mon, Jan 13, 2020 at 8:24 AM Alex Van Boxel wrote: > OK, I've rechecked everything and eventually found the problem. The > problem is when you use a LogicalType backed by a Row,

Re: master on Dataflow with schema aware PCollections stuck

2020-01-13 Thread Alex Van Boxel
OK, I've rechecked everything and eventually found the problem. The problem is that when you use a LogicalType backed by a Row, the UUID needs to be set to make it work. (This is the case for Proto-based Timestamps.) I'll create a fix. Alex Van Boxel On Mon, Jan 13, 2020 at 8:36 AM

Re: master on Dataflow with schema aware PCollections stuck

2020-01-12 Thread Reuven Lax
Can you elucidate? All BeamSQL pipelines use schemas, and I believe those tests are working just fine on the Dataflow runner. In addition, there are a number of ValidatesRunner schema-aware pipelines that are running regularly on the Dataflow runner. On Sun, Jan 12, 2020 at 1:43 AM Alex Van Boxel

Re: master on Dataflow with schema aware PCollections stuck

2020-01-12 Thread Alex Van Boxel
BTW, this is not a support ticket; I'm wondering whether we are aware of this and whether we're missing schema-aware integration tests as well. Alex Van Boxel On Sun, Jan 12, 2020 at 10:43 AM Alex Van Boxel wrote: > Hey all, > > anyone tried master with a *schema aware pipeline* on Dataflow? I'm > testing

master on Dataflow with schema aware PCollections stuck

2020-01-12 Thread Alex Van Boxel
Hey all, has anyone tried master with a *schema aware pipeline* on Dataflow? I'm testing some PRs to see if they run on Dataflow (they work on the Direct runner), but they fail with: Workflow failed. Causes: The Dataflow job appears to be stuck because no worker activity has been seen in the last 1h. You