Re: SchemaTransformProvider | Java class naming convention

2022-11-15 Thread Chamikara Jayalath via dev
On Tue, Nov 15, 2022 at 12:52 PM Ahmed Abualsaud wrote:

> Schema-aware transforms are not restricted to I/Os. An arbitrary transform
>> can be a Schema-Transform.  Also, designation Read/Write does not map to an
>> arbitrary transform. Probably we should try to make this more generic ?
>>
>
> Agreed, I suggest keeping everything on the left side of the name unique
> to the transform, so that the right side is consistently SchemaTransform
> | SchemaTransformProvider | SchemaTransformConfiguration. What do others
> think?
>

SGTM. I don't think we should enforce class names, though it's good to
have a recommendation.


>
> Also, probably what's more important is the identifier of the
>> SchemaTransformProvider being unique.
>
> FWIW, we came up with a similar generic URN naming scheme for
>> cross-language transforms:
>> https://beam.apache.org/documentation/programming-guide/#1314-defining-a-urn
>
>
> The URN convention in that link looks good, it may be a good idea to
> replace transform with schematransform in the URN in this case to make a
> distinction. ie.
> beam:schematransform:org.apache.beam:kafka_read_with_metadata:v1. I will
> mention this in the other thread when I go over the comments in the
> Supporting SchemaTransforms doc [1].
>

+1 for replacing "transform" with "schematransform" to prevent URN
conflicts (even though these are not exactly in the same category).

Thanks,
Cham



Re: SchemaTransformProvider | Java class naming convention

2022-11-15 Thread Chamikara Jayalath via dev
On Tue, Nov 15, 2022 at 1:38 PM Reuven Lax via dev wrote:

> Out of curiosity, several IOs (including PubSub) already do support
> schemas. Are you planning on modifying those?
>

Schema-aware Transform is an overloaded term. I think this is about the
implementations of the following.
https://docs.google.com/document/d/1B-pxOjIA8Znl99nDRFEQMfr7VG91MZGfki2BPanjjZA/edit




Re: SchemaTransformProvider | Java class naming convention

2022-11-15 Thread Reuven Lax via dev
Out of curiosity, several IOs (including PubSub) already do support
schemas. Are you planning on modifying those?



Re: SchemaTransformProvider | Java class naming convention

2022-11-15 Thread Ahmed Abualsaud via dev
>
> Schema-aware transforms are not restricted to I/Os. An arbitrary transform
> can be a Schema-Transform.  Also, designation Read/Write does not map to an
> arbitrary transform. Probably we should try to make this more generic ?
>

Agreed, I suggest keeping everything on the left side of the name unique to
the transform, so that the right side is consistently SchemaTransform |
SchemaTransformProvider | SchemaTransformConfiguration. What do others
think?

Also, probably what's more important is the identifier of the
> SchemaTransformProvider being unique.

FWIW, we came up with a similar generic URN naming scheme for
> cross-language transforms:
> https://beam.apache.org/documentation/programming-guide/#1314-defining-a-urn


The URN convention in that link looks good, it may be a good idea to
replace transform with schematransform in the URN in this case to make a
distinction. ie.
beam:schematransform:org.apache.beam:kafka_read_with_metadata:v1. I will
mention this in the other thread when I go over the comments in the
Supporting SchemaTransforms doc [1].
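
As a rough illustration of the URN scheme being proposed (plain Java string
handling, not Beam's actual API; the helper names and the validation pattern
here are assumptions made for the sketch):

```java
import java.util.regex.Pattern;

// Sketch of the proposed "beam:schematransform:<org>:<name>:<version>" scheme.
public class SchemaTransformUrn {

    // Assumed shape: lowercase org (dotted), snake_case name, "v" + number.
    private static final Pattern URN_PATTERN =
        Pattern.compile("beam:schematransform:[a-z0-9._]+:[a-z0-9_]+:v[0-9]+");

    static String urnFor(String org, String name, int version) {
        return String.format("beam:schematransform:%s:%s:v%d", org, name, version);
    }

    static boolean isValid(String urn) {
        return URN_PATTERN.matcher(urn).matches();
    }

    public static void main(String[] args) {
        String urn = urnFor("org.apache.beam", "kafka_read_with_metadata", 1);
        // prints beam:schematransform:org.apache.beam:kafka_read_with_metadata:v1
        System.out.println(urn);
        System.out.println(isValid(urn)); // true
    }
}
```

Using "schematransform" instead of "transform" in the second URN segment is
what keeps these identifiers from colliding with the cross-language transform
URNs described in the programming guide.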

[1] Supporting existing connectors with SchemaTrans...




Re: SchemaTransformProvider | Java class naming convention

2022-11-15 Thread John Casey via dev
One distinction here is the difference between the URN for a provider /
transform, and the class name in Java.

We should have a standard for both, but they are distinct



Re: SchemaTransformProvider | Java class naming convention

2022-11-15 Thread Chamikara Jayalath via dev
On Tue, Nov 15, 2022 at 11:50 AM Damon Douglas via dev wrote:

> Hello Everyone,
>
> Do we like the following Java class naming convention for
> SchemaTransformProviders [1]?  The proposal is:
>
> (Read|Write)SchemaTransformProvider
>
>
> *For those new to Beam, even if this is your first day, consider
> yourselves a welcome contributor to this conversation.  Below are
> definitions/references and a suggested learning guide to understand this
> email.*
>
> Explanation
>
> The prefix identifies the Beam I/O [2], and Read or Write identifies a
> read or write PTransform, respectively.
>

Schema-aware transforms are not restricted to I/Os. An arbitrary transform
can be a Schema-Transform.  Also, designation Read/Write does not map to an
arbitrary transform. Probably we should try to make this more generic ?

Also, probably what's more important is the identifier of the
SchemaTransformProvider being unique. Note the class name (the latter is
guaranteed to be unique if we follow the Java package naming guidelines).

FWIW, we came up with a similar generic URN naming scheme for
cross-language transforms:
https://beam.apache.org/documentation/programming-guide/#1314-defining-a-urn

Thanks,
Cham




Re: SchemaTransformProvider | Java class naming convention

2022-11-15 Thread Ahmed Abualsaud via dev
Thank you for the informative email Damon!
I am in favor of setting an intuitive naming convention early on to reduce
confusion when Schema Transforms become more widespread. I like the
proposed name in your email and I think this convention should also apply
to the rest of the classes involved here, ie:

(action)SchemaTransformConfiguration
and
(action)SchemaTransform
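
A small sketch of that convention (plain string composition; the prefix
"BigQueryWrite" is just an example, and the helper names are assumptions, not
anything in Beam):

```java
// Everything left of "SchemaTransform*" identifies the transform; the right
// side is always one of three fixed suffixes.
public class SchemaTransformNaming {

    static String transform(String prefix)     { return prefix + "SchemaTransform"; }
    static String provider(String prefix)      { return prefix + "SchemaTransformProvider"; }
    static String configuration(String prefix) { return prefix + "SchemaTransformConfiguration"; }

    public static void main(String[] args) {
        String prefix = "BigQueryWrite"; // e.g. I/O name + action
        System.out.println(transform(prefix));     // BigQueryWriteSchemaTransform
        System.out.println(provider(prefix));      // BigQueryWriteSchemaTransformProvider
        System.out.println(configuration(prefix)); // BigQueryWriteSchemaTransformConfiguration
    }
}
```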



SchemaTransformProvider | Java class naming convention

2022-11-15 Thread Damon Douglas via dev
Hello Everyone,

Do we like the following Java class naming convention for
SchemaTransformProviders [1]?  The proposal is:

(Read|Write)SchemaTransformProvider


*For those new to Beam, even if this is your first day, consider yourselves
a welcome contributor to this conversation.  Below are
definitions/references and a suggested learning guide to understand this
email.*

Explanation

The prefix identifies the Beam I/O [2], and Read or Write identifies a
read or write PTransform, respectively.

For example, a SchemaTransformProvider [1] for BigQueryIO.Write [7] would
be named:

BigQueryWriteSchemaTransformProvider


And a SchemaTransformProvider for PubsubIO.Read [8] would look like:

PubsubReadSchemaTransformProvider
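
To make the convention concrete, here is a minimal sketch. The interfaces
below are simplified stand-ins written for this email only; the real types
live in org.apache.beam.sdk.schemas.transforms and have different signatures,
and the URN shown is illustrative:

```java
// Stand-in for Beam's SchemaTransform (the real one is configured via Row).
interface SchemaTransform { }

// Stand-in for Beam's SchemaTransformProvider.
abstract class SchemaTransformProvider {
    abstract String identifier();                 // unique URN for this provider
    abstract SchemaTransform from(Object config); // build a transform from a configuration
}

// Following the (IO name)(Read|Write)SchemaTransformProvider convention:
public class PubsubReadSchemaTransformProvider extends SchemaTransformProvider {

    @Override
    String identifier() {
        return "beam:schematransform:org.apache.beam:pubsub_read:v1"; // illustrative URN
    }

    @Override
    SchemaTransform from(Object config) {
        // Real code would interpret config as a Beam Row matching a declared Schema.
        return new SchemaTransform() { };
    }

    public static void main(String[] args) {
        // prints beam:schematransform:org.apache.beam:pubsub_read:v1
        System.out.println(new PubsubReadSchemaTransformProvider().identifier());
    }
}
```

The class name carries the I/O and the action; the identifier() URN is what
actually has to be unique at runtime, as discussed later in the thread.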


Definitions/References

[1] *SchemaTransformProvider*: A way for us to instantiate Beam I/O
transforms using a language-agnostic configuration.
SchemaTransformProvider builds a SchemaTransform [3] from a Beam Row [4] that
functions as the configuration of that SchemaTransform.
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.html

[2] *Beam I/O*: PTransform for reading from or writing to sources and sinks.
https://beam.apache.org/documentation/programming-guide/#pipeline-io

[3] *SchemaTransform*: An interface with a buildTransform method that
returns a PTransform from PCollectionRowTuple [6] to PCollectionRowTuple.
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransform.html

[4] *Row*: A Beam Row is a generic element of data whose properties are
defined by a Schema[5].
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/Row.html

[5] *Schema*: A description of expected field names and their data types.
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/Schema.html

[6] *PCollectionRowTuple*: A grouping of Beam Rows[4] into a single PInput
or POutput tagged by a String name.
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/PCollectionRowTuple.html

[7] *BigQueryIO.Write*: A PTransform for writing Beam elements to a
BigQuery table.
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html

[8] *PubSubIO.Read*: A PTransform for reading from Pub/Sub and emitting
message payloads into a PCollection.
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.Read.html

Suggested Learning/Reading to understand this email

1. https://beam.apache.org/documentation/programming-guide/#overview
2. https://beam.apache.org/documentation/programming-guide/#transforms (Up
to 4.1)
3. https://beam.apache.org/documentation/programming-guide/#pipeline-io
4. https://beam.apache.org/documentation/programming-guide/#schemas


Re: [VOTE] Release 2.43.0, release candidate #2

2022-11-15 Thread Ahmet Altay via dev
+1 (binding). - I validated the python quick starts on direct runner.

Thank you!

On Tue, Nov 15, 2022 at 9:51 AM Jean-Baptiste Onofré wrote:

> +1 (binding)
>
> Regards
> JB
>
> On Sun, Nov 13, 2022 at 3:52 PM Chamikara Jayalath via dev
>  wrote:
> >
> > Hi everyone,
> > Please review and vote on the release candidate #2 for the version
> 2.43.0, as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> >
> > Reviewers are encouraged to test their own use cases with the release
> candidate, and vote +1 if
> > no issues are found.
> >
> > The complete staging area is available for your review, which includes:
> > * GitHub Release notes [1],
> > * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint
> 40C61FBE1761E5DB652A1A780CCD5EB2A718A56E [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "v2.43.0-RC2" [5],
> > * website pull request listing the release [6], the blog post [6], and
> publishing the API reference manual [7].
> > * Java artifacts were built with Gradle 7.5.1 and openjdk version
> 1.8.0_181-google-v7.
> > * Python artifacts are deployed along with the source release to the
> dist.apache.org [2] and PyPI[8].
> > * Go artifacts and documentation are available at pkg.go.dev [9]
> > * Validation sheet with a tab for 2.43.0 release to help with validation
> [10].
> > * Docker images published to Docker Hub [11].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
> >
> > For guidelines on how to try the release in your projects, check out our
> blog post at https://beam.apache.org/blog/validate-beam-release/.
> >
> > Thanks,
> > Cham
> >
> > [1] https://github.com/apache/beam/milestone/5
> > [2] https://dist.apache.org/repos/dist/dev/beam/2.43.0/
> > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> > [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1288/
> > [5] https://github.com/apache/beam/tree/v2.43.0-RC2
> > [6] https://github.com/apache/beam/pull/24044
> > [7] https://github.com/apache/beam-site/pull/636
> > [8] https://pypi.org/project/apache-beam/2.43.0rc2/
> > [9]
> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.43.0-RC2/go/pkg/beam
> > [10]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1310009119
> > [11] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>


Re: [VOTE] Release 2.43.0, release candidate #2

2022-11-15 Thread Pablo Estrada via dev
+1 (binding)
Ran local tests for existing Dataflow templates.

On Tue, Nov 15, 2022 at 8:17 AM Alexey Romanenko 
wrote:

> +1 (binding)
>
> —
> Alexey
>
> On 15 Nov 2022, at 14:37, Ritesh Ghorse via dev 
> wrote:
>
> +1 (non-binding)
>
> Validated Go SDK quickstart on Direct and Dataflow runner. Also validated
> Dataframe wrapper on Portable and Dataflow runner.
>
>
> On Tue, Nov 15, 2022 at 5:17 AM Anand Inguva via dev 
> wrote:
>
>> +1(non-binding)
>>
>> Validated Python wordcount example on Direct and Dataflow runner. Staging
>> of the Python dependencies works as expected now.
>>
>> Thanks,
>> Anand
>>


Re: [VOTE] Release 2.43.0, release candidate #2

2022-11-15 Thread Jean-Baptiste Onofré
+1 (binding)

Regards
JB



Re: [VOTE] Release 2.43.0, release candidate #2

2022-11-15 Thread Alexey Romanenko
+1 (binding)

—
Alexey

> On 15 Nov 2022, at 14:37, Ritesh Ghorse via dev  wrote:
> 
> +1 (non-binding)
> 
> Validated Go SDK quickstart on Direct and Dataflow runner. Also validated 
> Dataframe wrapper on Portable and Dataflow runner.
> 
> 
> On Tue, Nov 15, 2022 at 5:17 AM Anand Inguva via dev wrote:
>> +1(non-binding)
>> 
>> Validated Python wordcount example on Direct and Dataflow runner. Staging of 
>> the Python dependencies works as expected now.
>> 
>> Thanks,
>> Anand
>> 



Re: [VOTE] Release 2.43.0, release candidate #2

2022-11-15 Thread Ritesh Ghorse via dev
+1 (non-binding)

Validated Go SDK quickstart on Direct and Dataflow runner. Also validated
Dataframe wrapper on Portable and Dataflow runner.


On Tue, Nov 15, 2022 at 5:17 AM Anand Inguva via dev 
wrote:

> +1(non-binding)
>
> Validated Python wordcount example on Direct and Dataflow runner. Staging
> of the Python dependencies works as expected now.
>
> Thanks,
> Anand
>


Beam High Priority Issue Report (57)

2022-11-15 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/24163 [Bug]: 
beam_PostCommit_Python_Examples_Spark and beam_PostCommit_Python_Examples_Flink 
failing test custom_ptransform_it_test.CustomPTransformIT
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23745 [Bug]: Samza 
AsyncDoFnRunnerTest.testSimplePipeline is flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/22969 Discrepancy in behavior of 
`DoFn.process()` when `yield` is combined with `return` statement, or vice versa
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22321 
PortableRunnerTestWithExternalEnv.test_pardo_large_input is regularly failing 
on jenkins
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output 
to Failed Inserts PCollection
https://github.com/apache/beam/issues/21561 
ExternalPythonTransformTest.trivialPythonTransform flaky
https://github.com/apache/beam/issues/21480 flake: 
FlinkRunnerTest.testEnsureStdoutStdErrIsRestored
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21462 Flake in 
org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use
https://github.com/apache/beam/issues/21261 
org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
 is flaky
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21113 
testTwoTimersSettingEachOtherWithCreateAsInputBounded flaky
https://github.com/apache/beam/issues/20976 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky
https://github.com/apache/beam/issues/20975 
org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: 
false] is flaky
https://github.com/apache/beam/issues/20974 Python GHA PreCommits flake with 
grpc.FutureTimeoutError on SDK harness startup
https://github.com/apache/beam/issues/20689 Kafka commitOffsetsInFinalize OOM 
on Flink
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit 
empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
https://github.com/apache/beam/issues/19734 
WatchTest.testMultiplePollsWithManyResults flake: Outputs must be in timestamp 
order (sickbayed)
https://github.com/apache/beam/issues/19241 Python Dataflow integration tests 
should export the pipeline Job ID and console output to Jenkins Test Result 
section


P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/23906 [Bug]: Dataflow jpms tests fail on 
the 2.43.0 release branch
https://github.com/apache/beam/issues/23875 [Bug]: beam.Row.__eq__ returns true 
for unequal rows
https://github.com/apache/beam/issues/23855 [FLAKY-WORKFLOW] [10535797]: THIS 
IS A TEST, PLEASE IGNORE #1
https://github.com/apache/beam/issues/23627 [Bug]: Website precommit flaky
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.com/apache/beam/issues/23489 [Bug]: add DebeziumIO to the 
connectors page
https://github.com/apache/beam/issues/23306 [Bug]: BigQueryBatchFileLoads in 
python loses data when using WRITE_TRUNCATE
https://github.com/apache/beam/issues/23286 [Bug]: 
beam_PerformanceTests_InfluxDbIO_IT Flaky > 50 % Fail 
https://github.com/apache/beam/issues/22891 [Bug]: 
beam_PostCommit_XVR_PythonUsingJavaDataflow is flaky
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/22299 [Bug]: JDBCIO Write freeze at 
getConnection() in WriteFn
https://github.com/apache/beam/issues/22115 [Bug]: 
apache_beam.runners.portability.portable_runner_test.PortableRunnerTestWithSubprocesses
 is flaky
https://github.com/apache/beam/issues/22011 [Bug]: 
org.apache.beam.sdk.io.aws2.kinesis.KinesisIOWriteTest.testWriteFailure flaky