Re: Some questions about external tables in BeamSQL

2022-01-13 Thread Steve Niemitz
Thanks for the quick responses! Mine are inline as well.

On Thu, Jan 13, 2022 at 9:01 PM Brian Hulette  wrote:

> I added some responses inline. Also adding dev@ since this is getting
> into SQL internals.
>
> On Thu, Jan 13, 2022 at 10:29 AM Steve Niemitz 
> wrote:
>
>> I've been playing around with CREATE EXTERNAL TABLE (using a custom
>> TableProvider as well) with BeamSQL and really love it. I have a few
>> questions that I've accumulated while using it.
>>
>> - I'm a little confused about the need to define columns in the CREATE
>> EXTERNAL TABLE statement.  If I have a BeamSqlTable implementation that can
>> provide the schema on its own, it seems like the columns supplied to the
>> CREATE statement are ignored.  This is ideal anyways, since it's infeasible
>> for users to provide the entire schema up-front, especially for more
>> complicated sources.  Should the column list be optional here instead?
>>
>
> Our documentation would seem to indicate that defining columns is optional
> - looking at the example for BigQuery here [1] the schema is not provided.
> Those docs must be aspirational though, I just checked and the
> BigQueryTableProvider definitely expects the schema to be defined and uses
> it [2].
>
> I think it would make sense to make the column list optional; that way we
> can actually fulfill our BigQuery documentation.
>

Big +1 to that.


> Note if you're building your own custom TableProvider, you might not need
> to use CREATE EXTERNAL TABLE. You could add an implementation for
> TableProvider.getTable that retrieves the metadata for a given table name
> and returns a Table instance that can build the necessary IOs. This is only
> possible if you can retrieve all the metadata you need to construct the
> IOs though. If you want users to be able to configure it further (one
> example might be specifying the read mode for BigQuery), this won't work.
>
> [1]
> https://beam.apache.org/documentation/dsls/sql/extensions/create-external-table/#bigquery
> [2]
> https://github.com/apache/beam/blob/872455570ae7f3e2e35360bccf93b503ae9fdb5c/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTable.java#L82
>

Maybe I'm looking at the wrong thing?  Both those examples show a column
list, and the BNF (or whatever) syntax implies at least one "table element"
must be present.

But yeah, this is basically what I'm doing right now.  I just return the
"real" schema in BeamSqlTable.getSchema and ignore whatever was passed in.
It seems to work correctly.  Ideally the column list would be optional
here, as you alluded to above.  It'll be clunky explaining to users
something like "just include any random column list, we'll ignore it".
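A minimal sketch of the approach discussed above: resolve a table's schema from the source by name, and ignore any column list supplied in the DDL. `TableMeta` and `CatalogBackedProvider` are hypothetical stand-ins written for illustration; they are not Beam's `TableProvider`, `Table`, or `BeamSqlTable` classes, and an in-memory map stands in for the external metadata service.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins, NOT Beam's TableProvider / Table / BeamSqlTable classes.
// They only model "look up a table's schema by name from the source itself".
final class TableMeta {
  final String name;
  final List<String> columns; // schema as reported by the underlying source

  TableMeta(String name, List<String> columns) {
    this.name = name;
    this.columns = List.copyOf(columns);
  }
}

final class CatalogBackedProvider {
  // In-memory map standing in for an external metadata service.
  private final Map<String, List<String>> catalog = new LinkedHashMap<>();

  void register(String name, List<String> columns) {
    catalog.put(name, List.copyOf(columns));
  }

  /** Resolves a table by name; no CREATE EXTERNAL TABLE statement is needed. */
  Optional<TableMeta> getTable(String name) {
    List<String> columns = catalog.get(name);
    return columns == null ? Optional.empty() : Optional.of(new TableMeta(name, columns));
  }

  /** Mirrors the workaround: any column list supplied in the DDL is ignored. */
  TableMeta fromDdl(String name, List<String> ignoredDeclaredColumns) {
    return getTable(name)
        .orElseThrow(() -> new IllegalArgumentException("unknown table: " + name));
  }
}
```

The design point is that the source, not the DDL, is the single source of truth for the schema, so a user-supplied column list can never drift from the real one.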


>> - It seems like predicate pushdown only works if the schema is "flat" (has
>> no nested rows). I understand the complication in pushing down more
>> complicated nested predicates. However, as long as the table implementation
>> doesn't actually attempt to push them down, wouldn't it be fine to allow
>> them?
>>
>
> Do we have this limitation? I think predicate pushdown will work
> with predicates on nested fields. The table is presented with a list of
> RexNodes representing separable predicates; an individual predicate could
> add a filter on a nested column IIUC.
>
> We may have the limitation that project pushdown won't work on nested rows
> though, since the API just takes a list of field names. It's possible we
> handle this by passing a joined name (e.g. foo.bar.baz), but I bet not. The
> design doc [3] does have a note saying "no nested tables for now".
>
> [3]
> https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit
>
>

BeamIOPushDownRule short-circuits on nested fields [1]. I can also confirm
this because my constructFilter method isn't called when my
schema contains a nested row.

[1]
https://github.com/apache/beam/blob/v2.35.0/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java#L89
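A sketch of the two nested-field behaviors discussed here, under a hypothetical schema model (`SchemaField` and `NestedFields` are not Beam classes). `hasNestedRow` is a check similar in spirit to the short circuit described above (the real rule's exact condition may differ), and `dottedPaths` shows the joined-name flattening (e.g. foo.bar.baz) that project pushdown could hypothetically use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical schema model, NOT Beam's Schema / Schema.Field classes.
final class SchemaField {
  final String name;
  final List<SchemaField> children; // non-empty means this field is a nested row

  SchemaField(String name, SchemaField... children) {
    this.name = name;
    this.children = List.of(children);
  }

  boolean isRow() {
    return !children.isEmpty();
  }
}

final class NestedFields {
  /** True if any top-level field is a nested row (the case where pushdown bails out). */
  static boolean hasNestedRow(List<SchemaField> schema) {
    return schema.stream().anyMatch(SchemaField::isRow);
  }

  /** Flattens a schema into dotted leaf paths, e.g. "foo.bar.baz". */
  static List<String> dottedPaths(List<SchemaField> schema) {
    List<String> out = new ArrayList<>();
    for (SchemaField f : schema) {
      collect(f, "", out);
    }
    return out;
  }

  private static void collect(SchemaField f, String prefix, List<String> out) {
    String path = prefix.isEmpty() ? f.name : prefix + "." + f.name;
    if (!f.isRow()) {
      out.add(path); // leaf: emit the full dotted path
      return;
    }
    for (SchemaField child : f.children) {
      collect(child, path, out); // recurse into the nested row
    }
  }
}
```

A table implementation receiving such dotted names would still need to map each path back to the nested column in its own storage format.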


>
>> - As a follow-up to the above, I'd like to expose a "virtual" field in my
>> schema that represents the partition the data has come from. For example,
>> BigQuery has a similar concept called _PARTITIONTIME. This would be picked
>> up by the predicate pushdown and used to filter the partitions being read.
>> I can't really figure out how I'd construct something similar here, even if
>> pushdown worked in all cases.  For example, for this query:
>>
>> SELECT * from table
>> where _PARTITIONTIME between X and Y
>>
>> I'd want that filter to be pushed down to my IO, but also the
>> _PARTITIONTIME column wouldn't be returned in the select list.  I was
>> hoping to use BigQueryIO as an example of how to do this, but it doesn't
>> seem like it exposes the virtual _PARTITIONTIME column either.
>>
>
> Yeah I think this will be hard to do with our current abstractions. You
> may be able to do it if 

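A sketch of the partition-pruning idea, assuming daily partitions keyed by `LocalDate` and a virtual `_PARTITIONTIME` column (BigQuery's actual `_PARTITIONTIME` is a TIMESTAMP; `PartitionPruning` and its methods are hypothetical, not Beam APIs). It shows the two behaviors wanted for `WHERE _PARTITIONTIME BETWEEN X AND Y`: the bounds select partitions inclusively, as SQL `BETWEEN` does, and the virtual column is hidden from the visible column list.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: pushing `_PARTITIONTIME BETWEEN lo AND hi` into an IO
// as partition pruning. None of these names are Beam APIs.
final class PartitionPruning {
  static final String VIRTUAL_COLUMN = "_PARTITIONTIME";

  /** SQL BETWEEN is inclusive on both ends, so both bounds are kept. */
  static List<LocalDate> prune(List<LocalDate> partitions, LocalDate lo, LocalDate hi) {
    return partitions.stream()
        .filter(p -> !p.isBefore(lo) && !p.isAfter(hi))
        .collect(Collectors.toList());
  }

  /** The virtual column drives pruning but is not returned by SELECT *. */
  static List<String> visibleColumns(List<String> columns) {
    return columns.stream()
        .filter(c -> !c.equals(VIRTUAL_COLUMN))
        .collect(Collectors.toList());
  }
}
```

The open question in the thread is where such a column would live: it has to be visible to the planner for the filter to reach the IO, yet excluded from the projected output.
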
Re: build snapshot jars for local use

2022-01-13 Thread Clément Guillaume
It works, thank you. I was missing the -Ppublishing flag.

On Thu, Jan 13, 2022 at 3:05 PM Brian Hulette  wrote:

> Hi Clément, welcome!
>
> We have the maven-publish plugin [1] installed, which adds a
> "publishToMavenLocal" task that does what you want. I think you can run it
> with "./gradlew publishToMavenLocal -Ppublishing".
>
> [1] https://docs.gradle.org/current/userguide/publishing_maven.html
>
> On Thu, Jan 13, 2022 at 2:25 PM Clément Guillaume 
> wrote:
>
>> Hey all, I'm a new contributor working on BEAM-13468.
>>
>> After building my local version of Beam, I see that all the
>> 2.37.0-SNAPSHOT jars get created.
>> Is there a Gradle task that would copy those jars to my ~/.m2/repository/
>> (similar to mvn install)?
>>
>> Thank you
>>
>> Clément
>>
>


build snapshot jars for local use

2022-01-13 Thread Clément Guillaume
Hey all, I'm a new contributor working on BEAM-13468.

After building my local version of Beam, I see that all the 2.37.0-SNAPSHOT
jars get created.
Is there a Gradle task that would copy those jars to my ~/.m2/repository/
(similar to mvn install)?

Thank you

Clément


Re: [DISCUSS] Migrate Jira to GitHub Issues?

2022-01-13 Thread Aizhamal Nurmamat kyzy
I think I am enthusiastic enough to help with the doc :) I will share the
link soon.

On Thu, Jan 13, 2022 at 10:12 AM Robert Bradshaw 
wrote:

> I don't know if we have consensus, but it seems that some people are
> quite supportive (myself included), and some are ambivalent. The only
> major con I can see is that github doesn't support tagging an issue to
> multiple milestones (but it's unclear how important that is).
>
> I would suggest that someone enthusiastic about this proposal put
> together a doc where we can enumerate the pros and cons and once the
> list seems complete we can bring it back to the list for further
> discussion and/or a vote (if needed, likely not).
>
> On Thu, Jan 13, 2022 at 9:27 AM Alexey Romanenko
>  wrote:
> >
> > I’m not sure that we have a consensus on this. Since this thread
> was initially started to discuss and gather feedback, I think it
> would be great to have a summary of the pros and cons of this migration.
> >
> > —
> > Alexey
> >
> > On 13 Jan 2022, at 00:11, Aizhamal Nurmamat kyzy 
> wrote:
> >
> > Hi all,
> >
> > Is there a consensus to migrate to GitHub?
> >
> > On Wed, Dec 15, 2021 at 9:17 AM Brian Hulette 
> wrote:
> >>
> >>
> >>
> >> On Tue, Dec 14, 2021 at 1:14 PM Kenneth Knowles  wrote:
> >>>
> >>>
> >>>
> >>> On Thu, Dec 9, 2021 at 11:50 PM Jean-Baptiste Onofre 
> wrote:
> 
>  Hi,
> 
>  No problem for me. The only thing I don’t like with GitHub issues is
> the fact that it’s not possible to “assign” several milestones to an issue.
>  When we maintain several active branches/versions, it sucks (one issue
> == one milestone), as we have to create several issues.
> >>>
> >>>
> >>> This is a good point to consider. In Beam we often create multiple
> issues anyhow when we intend to backport/cherrypick a fix: one issue for
> the original fix and one for each targeted cherrypick. This way their
> resolution status can be tracked separately. But it is nice for users to be
> able to go back and edit the original bug report to say which versions are
> affected and which are not.
> >>
> >>
> >> I looked into this a little bit. It looks like milestones don't have to
> represent a release (e.g. they could represent some abstract goal), but
> they are often associated with releases. This seems like a reasonable field
> to map to "Fix Version/s" in jira, but jira does support specifying
> multiple releases. So one issue == one milestone would be a regression.
> >> As Kenn pointed out though we often create a separate jira to track
> backports anyway (even though we could just specify multiple fix versions),
> so I'm not sure this is a significant blocker.
> >>
> >> If we want to use milestones to track abstract goals, I think we'd be
> out of luck. We could just use labels, but the GitHub UI doesn't present a
> nice burndown chart for those. See
> https://github.com/pandas-dev/pandas/milestones vs.
> https://github.com/pandas-dev/pandas/labels. FWIW jira doesn't have great
> functionality here either.
> >>
> >>>
> >>>
> >>> Kenn
> >>>
> 
> 
>  Regards
>  JB
> 
>  > On 10 Dec 2021, at 01:28, Kyle Weaver wrote:
>  >
>  > I’m in favor of switching to Github issues. I can’t think of a
> single thing jira does better.
>  >
>  > Thanks Jarek, this is a really great resource [1]. For another
> reference, the Calcite project is engaged in the same discussion right now
> [2]. I came up with many of the same points independently before I saw
> their thread.
>  >
>  > When evaluating feature parity, we should make a distinction
> between non-structured (text) and structured data. And we don’t need a
> strict mechanical mapping for everything unless we’re planning on
> automatically migrating all existing issues. I don’t see the point in
> automatic migration, though; as Jarek pointed out, we’d end up perpetuating
> a ton of obsolete issues.
>  >
>  >   • We use nested issues and issue relations in jira, but as
> far as I know robots don’t use them and we don’t query them much, so we’re
> not losing anything by moving from an API to plain English descriptions:
> “This issue is blocked by issue #n.” Mentions show up automatically on
> other issues.
>  >   • For component, type, priority, etc., we can use Github
> labels.
>  >   • Version(s) affected is used inconsistently, and as far as I
> know only by humans, so a simple English description is fine. We can follow
> the example of other projects and make the version affected a part of the
> issue template.
>  >   • For fix version, which we use to track which issues we want
> to fix in upcoming releases, as well as automatically generate release
> notes: Github has “milestones,” which can be marked on PRs or issues, or
> both.
>  >   • IMO the automatically generated JIRA release notes
> are not especially useful anyway. They are too detailed for a quick
> summary, and not precise enough to show everything. For

Re: [DISCUSS] propdeps removal and what to do going forward

2022-01-13 Thread Ismaël Mejía
Optional dependencies should not be a major issue.

What matters to validate that we are not breaking users is to compare
the generated POM files with the previous (pre gradle 7 / 2.35.0)
version and see that what was provided is still provided.

In particular, the Hadoop/Spark and Kafka dependencies must be
**provided** as they were. I am not sure about the others, but those three
matter.

Ismaël
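A rough sketch of the check described above: parse the old and new generated POMs with the JDK's DOM parser and list dependencies that were `provided` before but are not `provided` now. `PomScopeDiff` is hypothetical; it treats a missing `<scope>` as Maven's default `compile` scope, and it naively counts every `<dependency>` element, including any under `<dependencyManagement>` or plugin declarations.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class PomScopeDiff {
  /** Maps groupId:artifactId to its scope; Maven's default scope is "compile". */
  static Map<String, String> scopes(String pomXml) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)));
    } catch (Exception e) {
      throw new IllegalArgumentException("could not parse POM", e);
    }
    Map<String, String> out = new LinkedHashMap<>();
    NodeList deps = doc.getElementsByTagName("dependency");
    for (int i = 0; i < deps.getLength(); i++) {
      Element dep = (Element) deps.item(i);
      String key = firstText(dep, "groupId") + ":" + firstText(dep, "artifactId");
      String scope = firstText(dep, "scope");
      out.put(key, scope.isEmpty() ? "compile" : scope);
    }
    return out;
  }

  // Text of the first descendant element with this tag, or "" if there is none.
  private static String firstText(Element parent, String tag) {
    NodeList nodes = parent.getElementsByTagName(tag);
    return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
  }

  /** Dependencies that were "provided" in the old POM but not in the new one. */
  static List<String> lostProvided(String oldPom, String newPom) {
    Map<String, String> before = scopes(oldPom);
    Map<String, String> after = scopes(newPom);
    List<String> regressions = new ArrayList<>();
    for (Map.Entry<String, String> e : before.entrySet()) {
      String now = after.getOrDefault(e.getKey(), "absent");
      if ("provided".equals(e.getValue()) && !"provided".equals(now)) {
        regressions.add(e.getKey() + " (now: " + now + ")");
      }
    }
    return regressions;
  }
}
```

Run against the 2.35.0 POM and a freshly generated one for each module, an empty result would mean everything that was provided is still provided.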

On Wed, Jan 12, 2022 at 10:55 PM Emily Ye  wrote:
>
> We've chatted offline and have a tentative plan for what to do with these 
> dependencies that are currently marked as compileOnly (instead of provided). 
> Please review the list if possible [1].
>
> Two projects we aren't sure about:
>
> :sdks:java:io:hcatalog
>
> library.java.jackson_annotations
> library.java.jackson_core
> library.java.jackson_databind
> library.java.hadoop_common
> org.apache.hive:hive-exec
> org.apache.hive.hcatalog:hive-hcatalog-core
>
> :sdks:java:io:parquet
>
> library.java.hadoop_client
>
>
> Does anyone have experience with either of these IOs? ccing Chamikara
>
> Thank you,
> Emily
>
>
> [1] 
> https://docs.google.com/spreadsheets/d/1UpeQtx1PoAgeSmpKxZC9lv3B9G1c7cryW3iICfRtG1o/edit?usp=sharing
>
> On Tue, Jan 11, 2022 at 6:38 PM Emily Ye  wrote:
>>
>> As the person volunteering to do fixes for this to unblock Beam 2.36.0, I 
>> created a spreadsheet of the projects with dependencies changed from 
>> provided to compile only [1]. I pre-filled with what I think things should 
>> be, but I don't have very much background in java/maven/gradle 
>> configurations so please give input!
>>
>> Some (mainly hadoop/kafka) I left blank, since I'm not sure - do we keep 
>> them provided because it depends on the user's version?
>>
>> [1] 
>> https://docs.google.com/spreadsheets/d/1UpeQtx1PoAgeSmpKxZC9lv3B9G1c7cryW3iICfRtG1o/edit?usp=sharing
>>
>> On Tue, Jan 11, 2022 at 1:17 PM Luke Cwik  wrote:
>>>
>>> I'm not following what you're trying to say, Kenn, since provided in Maven 
>>> requires the user to explicitly add the dependency themselves to have it 
>>> part of their runtime.
>>>
>>> As per 
>>> https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#dependency-scope
>>> "
>>> * provided
>>> This is much like compile, but indicates you expect the JDK or a container 
>>> to provide the dependency at runtime. For example, when building a web 
>>> application for the Java Enterprise Edition, you would set the dependency 
>>> on the Servlet API and related Java EE APIs to scope provided because the 
>>> web container provides those classes. A dependency with this scope is added 
>>> to the classpath used for compilation and test, but not the runtime 
>>> classpath. It is not transitive."
>>>
>>> On Tue, Jan 11, 2022 at 11:54 AM Kenneth Knowles  wrote:

 To clarify: "provided" should have been in the test runtime configuration, 
 but not in the shipped runtime configuration (otherwise dep resolution for 
 users would pull in provided deps, which should not happen)

 On Thu, Dec 30, 2021 at 10:05 AM Luke Cwik  wrote:
>
> During the migration to Gradle 7[1] the propdeps plugin was removed[2] 
> since there wasn't a newer version that was compatible with Gradle 7 and 
> a replacement couldn't be found. All existing usages of "provided" were 
> moved to "compileOnly" and "compileOnly" is being mapped to the 
> "provided" Maven scope in the generated POM files. This has led to two 
> issues:
> 1) provided was also part of the runtime configuration, so we are getting 
> a few class not found exceptions when running tests [3]
> 2) the generated pom.xml will have a bunch of compile time only 
> annotations added as a provided dependency in the generated pom files[4]
>
> #1 can be fixed by adding the dependency to both the "compileOnly" and 
> "runtimeOnly" configurations, or by adding the dependency to the 
> "implementation" configuration
> #2 will make the pom files messier which can lead to confusion for users 
> but shouldn't impact existing uses.
>
> There was a suggestion[4] to completely remove the usage of provided from 
> the generated pom.xml and have all our previously "provided" dependencies 
> declared as "implementation" allowing us to solve both #1 and #2 above.
>
> The largest usage of "provided" in the past was to packages related to 
> the hadoop ecosystem and afterwards it was for packages such as 
> junit/hamcrest/aircompressor in sdks/java/core which aren't required to 
> use the module but can provide additional features if the dependency 
> exists.
>
> What, if anything, should we migrate to the "implementation" configuration, 
> or should we try to recreate what we were doing with the "provided" 
> configuration in the past?
>
> 1: https://issues.apache.org/jira/browse/BEAM-13430
> 2: https://github.com/apache/beam/pull/16308
> 3: https://issues.apache.

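A simplified model of how the Gradle `java` plugin's dependency configurations extend each other (plain Java, not the Gradle API; only the configurations mentioned in this thread are modeled). It illustrates issue #1: a dependency declared only in `compileOnly` is on `compileClasspath` but not on `testRuntimeClasspath`, whereas one also declared in `runtimeOnly`, or declared in `implementation`, reaches the test runtime.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of the Gradle java plugin's configuration inheritance.
// Plain Java, NOT the Gradle API.
final class GradleConfigs {
  // configuration -> configurations it extends from
  static final Map<String, List<String>> EXTENDS = Map.of(
      "compileClasspath", List.of("compileOnly", "implementation"),
      "runtimeClasspath", List.of("runtimeOnly", "implementation"),
      "testImplementation", List.of("implementation"),
      "testRuntimeOnly", List.of("runtimeOnly"),
      "testCompileClasspath", List.of("testCompileOnly", "testImplementation"),
      "testRuntimeClasspath", List.of("testRuntimeOnly", "testImplementation"));

  /** All configurations whose declared dependencies end up in {@code config}. */
  static Set<String> closure(String config) {
    Set<String> seen = new LinkedHashSet<>();
    visit(config, seen);
    return seen;
  }

  private static void visit(String config, Set<String> seen) {
    if (!seen.add(config)) {
      return; // already visited
    }
    for (String parent : EXTENDS.getOrDefault(config, List.of())) {
      visit(parent, seen);
    }
  }

  /** Is a dependency declared in {@code declaredIn} visible on {@code classpath}? */
  static boolean onClasspath(String declaredIn, String classpath) {
    return closure(classpath).contains(declaredIn);
  }
}
```

This is also why mapping `compileOnly` to Maven's `provided` scope is only an approximation: the old propdeps `provided` configuration was visible at test runtime, while `compileOnly` is not.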
OffsetRange | OffsetRangeTracker custom implementation on Comparable Class

2022-01-13 Thread Marco Robles
Hi folks,

Has anyone written a custom implementation of OffsetRange and
OffsetRangeTracker based on a Comparable Java class?

Thanks

-- 

*Marco Robles* *|* WIZELINE

Software Engineer

marco.rob...@wizeline.com


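A minimal sketch of a half-open range over an arbitrary Comparable type, with a `tryClaim` contract loosely modeled on OffsetRangeTracker (claims must strictly increase and start at or after the range start; a claim at or past the end returns false). `ComparableRange` and `ComparableRangeTracker` are hypothetical and do not implement Beam's RestrictionTracker interface.

```java
// Hypothetical sketch, NOT Beam's OffsetRange / OffsetRangeTracker.
final class ComparableRange<T extends Comparable<T>> {
  final T from; // inclusive
  final T to;   // exclusive

  ComparableRange(T from, T to) {
    if (from.compareTo(to) > 0) {
      throw new IllegalArgumentException("from must be <= to");
    }
    this.from = from;
    this.to = to;
  }
}

final class ComparableRangeTracker<T extends Comparable<T>> {
  private final ComparableRange<T> range;
  private T lastAttempted; // null until the first tryClaim
  private T lastClaimed;   // null until a claim succeeds

  ComparableRangeTracker(ComparableRange<T> range) {
    this.range = range;
  }

  /** Claims pos if it lies in [from, to); positions must be strictly increasing. */
  boolean tryClaim(T pos) {
    if (lastAttempted != null && pos.compareTo(lastAttempted) <= 0) {
      throw new IllegalArgumentException("claims must be strictly increasing: " + pos);
    }
    if (pos.compareTo(range.from) < 0) {
      throw new IllegalArgumentException("claim before range start: " + pos);
    }
    lastAttempted = pos;
    if (pos.compareTo(range.to) >= 0) {
      return false; // past the end: the caller should stop processing
    }
    lastClaimed = pos;
    return true;
  }

  T lastClaimed() {
    return lastClaimed;
  }
}
```

Splitting is the harder part: OffsetRangeTracker can compute split points arithmetically on longs, but a bare Comparable provides neither a midpoint nor a successor, so a real tracker would need extra information such as a split-point function. Beam's ByteKeyRangeTracker is an existing tracker over non-numeric (byte-key) positions that may be worth studying.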

Re: [DISCUSS] Migrate Jira to GitHub Issues?

2022-01-13 Thread Alexey Romanenko
I’m not sure that we have a consensus on this. Since this thread was initially 
started to discuss and gather feedback, I think it would be great to 
have a summary of the pros and cons of this migration.

—
Alexey

> On 13 Jan 2022, at 00:11, Aizhamal Nurmamat kyzy  wrote:
> 
> Hi all,
> 
> Is there a consensus to migrate to GitHub?
> 
> On Wed, Dec 15, 2021 at 9:17 AM Brian Hulette wrote:
> 
> 
> On Tue, Dec 14, 2021 at 1:14 PM Kenneth Knowles wrote:
> 
> 
> On Thu, Dec 9, 2021 at 11:50 PM Jean-Baptiste Onofre wrote:
> Hi,
> 
> No problem for me. The only thing I don’t like with GitHub issues is the 
> fact that it’s not possible to “assign” several milestones to an issue.
> When we maintain several active branches/versions, it sucks (one issue == one 
> milestone), as we have to create several issues.
> 
> This is a good point to consider. In Beam we often create multiple issues 
> anyhow when we intend to backport/cherrypick a fix: one issue for the 
> original fix and one for each targeted cherrypick. This way their resolution 
> status can be tracked separately. But it is nice for users to be able to go 
> back and edit the original bug report to say which versions are affected and 
> which are not.
> 
> I looked into this a little bit. It looks like milestones don't have to 
> represent a release (e.g. they could represent some abstract goal), but they 
> are often associated with releases. This seems like a reasonable field to map 
> to "Fix Version/s" in jira, but jira does support specifying multiple 
> releases. So one issue == one milestone would be a regression.
> As Kenn pointed out though we often create a separate jira to track backports 
> anyway (even though we could just specify multiple fix versions), so I'm not 
> sure this is a significant blocker.
> 
> If we want to use milestones to track abstract goals, I think we'd be out of 
> luck. We could just use labels, but the GitHub UI doesn't present a nice 
> burndown chart for those. See https://github.com/pandas-dev/pandas/milestones 
> vs. https://github.com/pandas-dev/pandas/labels. FWIW jira doesn't have great 
> functionality here either.
>  
> 
> Kenn
>  
> 
> Regards
> JB
> 
> On 10 Dec 2021, at 01:28, Kyle Weaver wrote:
> > 
> > I’m in favor of switching to Github issues. I can’t think of a single thing 
> > jira does better.
> > 
> > Thanks Jarek, this is a really great resource [1]. For another reference, 
> > the Calcite project is engaged in the same discussion right now [2]. I came 
> > up with many of the same points independently before I saw their thread.
> > 
> > When evaluating feature parity, we should make a distinction between 
> > non-structured (text) and structured data. And we don’t need a strict 
> > mechanical mapping for everything unless we’re planning on automatically 
> > migrating all existing issues. I don’t see the point in automatic 
> > migration, though; as Jarek pointed out, we’d end up perpetuating a ton of 
> > obsolete issues.
> > 
> >   • We use nested issues and issue relations in jira, but as far as I 
> > know robots don’t use them and we don’t query them much, so we’re not 
> > losing anything by moving from an API to plain English descriptions: “This 
> > issue is blocked by issue #n.” Mentions show up automatically on other 
> > issues.
> >   • For component, type, priority, etc., we can use Github labels.
> >   • Version(s) affected is used inconsistently, and as far as I know 
> > only by humans, so a simple English description is fine. We can follow the 
> > example of other projects and make the version affected a part of the issue 
> > template.
> >   • For fix version, which we use to track which issues we want to fix 
> > in upcoming releases, as well as automatically generate release notes: 
> > Github has “milestones,” which can be marked on PRs or issues, or both.
> >   • IMO the automatically generated JIRA release notes are not 
> > especially useful anyway. They are too detailed for a quick summary, and 
> > not precise enough to show everything. For a readable summary, we use 
> > CHANGES.md to highlight changes we especially want users to know about. For 
> > a complete list of changes, there’s the git commit log, which is the 
> > ultimate source of truth.
> >   • We’d only want to preserve reporter and assignee if we’re planning 
> > on migrating everything automatically, and even then I think it’d be fine 
> > to compile a map of active contributors and drop the rest.
> > 
> > As for the advantages of switching (just the ones off the top of my head):
> >   • As others have mentioned, it’s less burden for new contributors to 
> > create new issues and comment on existing ones.
> >   • Effortless linking between 

P1 issues report (64)

2022-01-13 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake).

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

https://issues.apache.org/jira/browse/BEAM-13616: Update protobuf-java to 
3.19.2 and other vendored dependencies that use protobuf (created 2022-01-08)
https://issues.apache.org/jira/browse/BEAM-13615: Bumping up FnApi 
environment version to 9 in Java, Python SDK (created 2022-01-07)
https://issues.apache.org/jira/browse/BEAM-13611: 
CrossLanguageJdbcIOTest.test_xlang_jdbc_write failing in Python PostCommits 
(created 2022-01-07)
https://issues.apache.org/jira/browse/BEAM-13606: bigtable io doesn't 
handle non-ok row mutations (created 2022-01-07)
https://issues.apache.org/jira/browse/BEAM-13603: Shared object does not 
read from cache when using tag (created 2022-01-05)
https://issues.apache.org/jira/browse/BEAM-13598: Install Java 17 on 
Jenkins VM (created 2022-01-04)
https://issues.apache.org/jira/browse/BEAM-13582: Beam website precommit 
mentions broken links, but passes. (created 2021-12-30)
https://issues.apache.org/jira/browse/BEAM-13579: Cannot run 
python_xlang_kafka_taxi_dataflow validation script on 2.35.0 (created 
2021-12-29)
https://issues.apache.org/jira/browse/BEAM-13522: Spark tests failing 
PerKeyOrderingTest (created 2021-12-22)
https://issues.apache.org/jira/browse/BEAM-13504: Remove 
provided/compileOnly deps not intended for external use (created 2021-12-21)
https://issues.apache.org/jira/browse/BEAM-13503: BulkIO public 
constructor: Missing required property: throwWriteErrors (created 2021-12-21)
https://issues.apache.org/jira/browse/BEAM-13430: Upgrade Gradle version to 
7.3 (created 2021-12-09)
https://issues.apache.org/jira/browse/BEAM-13393: GroupIntoBatchesTest is 
failing (created 2021-12-07)
https://issues.apache.org/jira/browse/BEAM-13367: 
[beam_PostCommit_Python36] [ 
apache_beam.io.gcp.experimental.spannerio_read_it_test] Failure summary 
(created 2021-12-01)
https://issues.apache.org/jira/browse/BEAM-13314: Revise recommendations to 
manage Python pipeline dependencies.  (created 2021-11-24)
https://issues.apache.org/jira/browse/BEAM-13237: 
org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView
 flaky on Dataflow Runner V2 (created 2021-11-12)
https://issues.apache.org/jira/browse/BEAM-13213: OnWindowExpiration does 
not work without other state (created 2021-11-10)
https://issues.apache.org/jira/browse/BEAM-13203: Potential data loss when 
using SnsIO.writeAsync (created 2021-11-08)
https://issues.apache.org/jira/browse/BEAM-13164: Race between member 
variable being accessed due to leaking uninitialized state via 
OutboundObserverFactory (created 2021-11-01)
https://issues.apache.org/jira/browse/BEAM-13087: 
apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally
 'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible 
(created 2021-10-20)
https://issues.apache.org/jira/browse/BEAM-13078: Python DirectRunner does 
not emit data at GC time (created 2021-10-18)
https://issues.apache.org/jira/browse/BEAM-13076: Python AfterAny, AfterAll 
do not follow spec (created 2021-10-18)
https://issues.apache.org/jira/browse/BEAM-13059: Migrate GKE workloads to 
Containerd (created 2021-10-15)
https://issues.apache.org/jira/browse/BEAM-13058: Upgrade Kubernetes APIs 
(created 2021-10-15)
https://issues.apache.org/jira/browse/BEAM-13010: Delete orphaned files 
(created 2021-10-06)
https://issues.apache.org/jira/browse/BEAM-12995: Consumer group with 
random prefix (created 2021-10-04)
https://issues.apache.org/jira/browse/BEAM-12959: Dataflow error in 
CombinePerKey operation (created 2021-09-26)
https://issues.apache.org/jira/browse/BEAM-12867: Either Create or 
DirectRunner fails to produce all elements to the following transform (created 
2021-09-09)
https://issues.apache.org/jira/browse/BEAM-12843: (Broken Pipe induced) 
Bricked Dataflow Pipeline  (created 2021-09-06)
https://issues.apache.org/jira/browse/BEAM-12807: Java creates an incorrect 
pipeline proto when core-construction-java jar is not in the CLASSPATH (created 
2021-08-26)
https://issues.apache.org/jira/browse/BEAM-12792: Beam worker only installs 
--extra_package once (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12632: ElasticsearchIO: Enabling 
both User/Pass auth and SSL overwrites User/Pass (created 2021-07-16)
https://issues.apache.org/jira/browse/BEAM-12621: Update Jenkins VMs to 
modern Ubuntu version (created 2021-07-15)
https://issues.apach

Flaky test issue report (41)

2022-01-13 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake)

These are P1 issues because they have a major negative impact on the community 
and make it hard to determine the quality of the software.

https://issues.apache.org/jira/browse/BEAM-13611: 
CrossLanguageJdbcIOTest.test_xlang_jdbc_write failing in Python PostCommits 
(created 2022-01-07)
https://issues.apache.org/jira/browse/BEAM-13575: Flink 
testParDoRequiresStableInput flaky (created 2021-12-28)
https://issues.apache.org/jira/browse/BEAM-13525: Java VR (Dataflow, V2, 
Streaming) failing: ParDoTest$TimestampTests/OnWindowExpirationTests (created 
2021-12-22)
https://issues.apache.org/jira/browse/BEAM-13519: Java precommit flaky 
(timing out) (created 2021-12-22)
https://issues.apache.org/jira/browse/BEAM-13453: Flake in 
org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use 
(created 2021-12-13)
https://issues.apache.org/jira/browse/BEAM-13401: 
beam_PostCommit_Java_DataflowV2 
org.apache.beam.sdk.io.gcp.pubsublite.ReadWriteIT flaky (created 2021-12-07)
https://issues.apache.org/jira/browse/BEAM-13393: GroupIntoBatchesTest is 
failing (created 2021-12-07)
https://issues.apache.org/jira/browse/BEAM-13367: 
[beam_PostCommit_Python36] [ 
apache_beam.io.gcp.experimental.spannerio_read_it_test] Failure summary 
(created 2021-12-01)
https://issues.apache.org/jira/browse/BEAM-13312: 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
 is flaky in Java Spark ValidatesRunner suite  (created 2021-11-23)
https://issues.apache.org/jira/browse/BEAM-13311: 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
 is flaky in Java ValidatesRunner Flink suite. (created 2021-11-23)
https://issues.apache.org/jira/browse/BEAM-13234: Flake in 
StreamingWordCountIT.test_streaming_wordcount_it (created 2021-11-12)
https://issues.apache.org/jira/browse/BEAM-13025: pubsublite.ReadWriteIT 
flaky in beam_PostCommit_Java_DataflowV2   (created 2021-10-08)
https://issues.apache.org/jira/browse/BEAM-12928: beam_PostCommit_Python36 
- CrossLanguageSpannerIOTest - flakey failing (created 2021-09-21)
https://issues.apache.org/jira/browse/BEAM-12859: 
org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
 is flaky (created 2021-09-08)
https://issues.apache.org/jira/browse/BEAM-12858: 
org.apache.beam.sdk.io.gcp.datastore.RampupThrottlingFnTest.testRampupThrottler 
is flaky (created 2021-09-08)
https://issues.apache.org/jira/browse/BEAM-12809: 
testTwoTimersSettingEachOtherWithCreateAsInputBounded flaky (created 2021-08-26)
https://issues.apache.org/jira/browse/BEAM-12794: 
PortableRunnerTestWithExternalEnv.test_pardo_timers flaky (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12793: 
beam_PostRelease_NightlySnapshot failed (created 2021-08-24)
https://issues.apache.org/jira/browse/BEAM-12766: Already Exists: Dataset 
apache-beam-testing:python_bq_file_loads_NNN (created 2021-08-16)
https://issues.apache.org/jira/browse/BEAM-12515: Python PreCommit flaking 
in PipelineOptionsTest.test_display_data (created 2021-06-18)
https://issues.apache.org/jira/browse/BEAM-12322: Python precommit flaky: 
Failed to read inputs in the data plane (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12320: 
PubsubTableProviderIT.testSQLSelectsArrayAttributes[0] failing in SQL 
PostCommit (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12291: 
org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: 
false] is flaky (created 2021-05-05)
https://issues.apache.org/jira/browse/BEAM-12200: 
SamzaStoreStateInternalsTest is flaky (created 2021-04-20)
https://issues.apache.org/jira/browse/BEAM-12163: Python GHA PreCommits 
flake with grpc.FutureTimeoutError on SDK harness startup (created 2021-04-13)
https://issues.apache.org/jira/browse/BEAM-12061: beam_PostCommit_SQL 
failing on KafkaTableProviderIT.testFakeNested (created 2021-03-27)
https://issues.apache.org/jira/browse/BEAM-11837: Java build flakes: 
"Memory constraints are impeding performance" (created 2021-02-18)
https://issues.apache.org/jira/browse/BEAM-11661: hdfsIntegrationTest 
flake: network not found (py38 postcommit) (created 2021-01-19)
https://issues.apache.org/jira/browse/BEAM-11641: Bigquery Read tests are 
flaky on Flink runner in Python PostCommit suites (created 2021-01-15)
https://issues.apache.org/jira/browse/BEAM-11541: 
testTeardownCalledAfterExceptionInProcessElement flakes on direct runner. 
(created 2020-12-30)
https://issues.apache.org/jira/browse/BEAM-10955: Flink Java Runner test 
flake: Could not find Flink job (Fl