[Beam Summit] Early Bird price for onsite passes ends this Friday

2022-05-12 Thread Mara Ruvalcaba

Beam Summit 2022 Early Bird price ends this Friday!

   Get your tickets for the onsite event before May 13th to obtain the
   Early Bird price and get the chance to win a $100 gift card.

 * The first 50 onsite registrations will get the chance to win a $100
   Amazon gift card.

 * Early bird pricing for in-person passes is $290 USD for a 2-day pass
   and $350 USD for a 3-day pass.

   Apply for a scholarship

If you would like to attend in person but cannot afford a ticket,
please apply for a scholarship. The scholarship covers only the
conference pass, not travel or lodging. Special thanks to our
Diversity and Inclusion sponsor Maven Code for the scholarships.

Register Now

--
Mara Ruvalcaba
COO, SG Software Guru & Nearshore Link
USA: 512 296 2884
MX: 55 5239 5502


[DISCUSS] Next steps for update of Avro dependency in Beam

2022-05-12 Thread Alexey Romanenko
Hi everyone,

Sorry in advance for a long email. 
TL;DR: Let’s discuss the next steps to update Avro dependency in Beam.

I’d like to come back to an old and quite sensitive topic here: the Apache 
Avro version update in Beam. Over time, we have already had several 
discussions on this (for example [1]) but without any concrete resolution in 
the end, iirc.

As we all know, Beam still depends on the quite old Avro version 1.8.2, and 
there have been some attempts to bump it to a more recent one. One of the main 
reasons to bump the Avro version, imho, is that the Avro 1.8.2 dependency 
brings several CVEs [2], whereas the latest Avro 1.11.0 brings only one [3].

At the same time, this update will introduce some incompatible changes that 
Avro has made between versions. This may affect Beam users, and it may 
potentially affect transitive dependencies when Beam is used with other 
projects that use Avro as well:
- Avro completely moved to java.time.* instead of org.joda.time.*. So, we need 
to adjust date/time conversions from/to Beam schema accordingly, since Beam 
schema still uses joda.time. It will also require users to regenerate any 
already generated Java code with avro-compiler, otherwise it won’t compile; 
- Some minor changes in Avro dependencies and user API;
- Something else?
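
To illustrate the first point, here is a minimal sketch of the kind of 
conversion adjustment involved. It is not Beam or Avro code. It assumes only 
what the Avro specification defines: the timestamp-millis logical type is a 
long counting milliseconds since the Unix epoch, and the date logical type is 
an int counting days since 1970-01-01. The class name AvroTimeBridge and its 
methods are hypothetical. On the joda.time side, the same epoch-millisecond 
value is what getMillis() returns, so epoch milliseconds can serve as the 
common currency between the two libraries.

```java
import java.time.Instant;
import java.time.LocalDate;

// Hypothetical helper, for illustration only: maps the raw values behind
// Avro's time-related logical types to java.time objects and back.
final class AvroTimeBridge {
    private AvroTimeBridge() {}

    // timestamp-millis: a long holding milliseconds since the Unix epoch.
    static Instant fromEpochMillis(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }

    static long toEpochMillis(Instant instant) {
        return instant.toEpochMilli();
    }

    // date: an int holding days since 1970-01-01.
    static LocalDate fromEpochDays(int epochDays) {
        return LocalDate.ofEpochDay(epochDays);
    }

    static int toEpochDays(LocalDate date) {
        // toEpochDay() returns a long; fail loudly if it does not fit an int.
        return Math.toIntExact(date.toEpochDay());
    }

    public static void main(String[] args) {
        long millis = 1652313600000L; // midnight UTC on 2022-05-12
        Instant instant = fromEpochMillis(millis);
        LocalDate date = fromEpochDays(19124);
        System.out.println(instant + " round-trips to " + toEpochMillis(instant));
        System.out.println(date + " round-trips to " + toEpochDays(date));
    }
}
```

A real migration would also have to cover time zones, the micros variants of 
the logical types, and the classes generated by avro-compiler, none of which 
this sketch attempts.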

I know that here, on the list, we have people from the Avro community who are 
much more experienced in this than me, so please correct me if I say something 
wrong or not 100% correct. 


In Beam, we have also made several attempts to update Avro, for example [4], 
[5], [6] and others.

To make such an update easier in the future, we also discussed moving the Avro 
dependency out of core Beam [7], and there was an attempt to do that [8], but 
in the end this PR was closed with the resolution that it’s not actually 
needed and that we may just want to test Beam with different Avro versions [9]. 

The latest work on this was a PR to support several versions of Avro in Beam 
(1.8.x and 1.9.x) [10] which still introduces some breaking changes for users, 
iirc.

So, it seems that we are a bit stuck on this topic, though, imho, we need to 
decide how to move forward, mostly because of the CVEs in old Avro versions 
and for the sake of future Avro updates in Beam.

The potential options (as I can see them):

1) Bump the Avro dependency to the latest version (1.11.0), or to a more 
recent one if available
  - Pros: 
    - latest/recent Avro dependency; 
    - potentially easy to update in the future;
  - Cons: 
    - breaking change for users; 
    - potential issues with other projects that use Avro (e.g. Apache Spark).

2) Support different Avro versions in Beam and make the Avro dependency provided 
  - Pros: 
    - user decides which version to use;
    - easy to update in the future;
  - Cons: 
    - breaking change for users; 
    - not certain that it’s possible to implement in practice; 
    - more tests needed to test Beam with different Avro versions.

3) Extract Avro as an extension, as we do for other formats, and update it to 
the latest Avro version, but keep and shade Avro v1.8.2 for “core” needs (this 
still leaves the issue with CVEs)

4) Anything else?


Please share your thoughts on this and correct me if I stated something wrong. 
The goal of this discussion is to finally move forward with the Avro update 
topic.

—
Alexey 


[1] https://lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r 

[2] https://mvnrepository.com/artifact/org.apache.avro/avro/1.8.2 

[3] https://mvnrepository.com/artifact/org.apache.avro/avro/1.11.0 

[4] https://github.com/apache/beam/pull/9779 

[5] https://github.com/apache/beam/pull/17372 

[6] https://github.com/apache/beam/pull/17246 

[7] https://lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on 

[8] https://github.com/apache/beam/pull/12748 

[9] https://lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg 

[10] https://github.com/apache/beam/pull/16271 









P1 issues report (78)

2022-05-12 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake)).

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

https://issues.apache.org/jira/browse/BEAM-14459: Docker Snapshots failing 
to be published since April 14th (created 2022-05-11)
https://issues.apache.org/jira/browse/BEAM-14447: 
BigQueryWriteIntegrationTests.test_big_query_write_insert_errors_reporting 
failing in Python PostCommit (created 2022-05-09)
https://issues.apache.org/jira/browse/BEAM-14434: 
beam_LoadTests_Python_GBK_reiterate_Dataflow_Streaming failure (created 
2022-05-06)
https://issues.apache.org/jira/browse/BEAM-14421: 
--dataflowServiceOptions=use_runner_v2 is broken (created 2022-05-05)
https://issues.apache.org/jira/browse/BEAM-14412: Block release on 
impersonation FR (created 2022-05-04)
https://issues.apache.org/jira/browse/BEAM-14411: TypeCodersTest is never 
executed (created 2022-05-04)
https://issues.apache.org/jira/browse/BEAM-14390: Java license check is 
broken (created 2022-05-02)
https://issues.apache.org/jira/browse/BEAM-14364: 404s in BigQueryIO don't 
get output to Failed Inserts PCollection (created 2022-04-25)
https://issues.apache.org/jira/browse/BEAM-14356: Java PostCommits: 
BigQueryIO.Read needs a GCS temp location (created 2022-04-22)
https://issues.apache.org/jira/browse/BEAM-14298: Can't resolve 
org.pentaho:pentaho-aggdesigner-algorithm:5.1.5-jhyde (created 2022-04-12)
https://issues.apache.org/jira/browse/BEAM-14291: DataflowPipelineResult 
does not raise exception for unsuccessful states. (created 2022-04-11)
https://issues.apache.org/jira/browse/BEAM-14276: 
beam_PostCommit_Java_DataflowV2 failures parent bug (created 2022-04-07)
https://issues.apache.org/jira/browse/BEAM-14275: SpannerWriteIT failing in 
beam PostCommit Java V1 (created 2022-04-07)
https://issues.apache.org/jira/browse/BEAM-14265: Flink should hold the 
watermark at the output timestamp for processing time timers (created 
2022-04-06)
https://issues.apache.org/jira/browse/BEAM-14263: 
beam_PostCommit_Java_DataflowV2, testBigQueryStorageWrite30MProto failing 
consistently (created 2022-04-05)
https://issues.apache.org/jira/browse/BEAM-14253: pubsublite.ReadWriteIT 
failing in beam_PostCommit_Java_DataflowV1 and V2 (created 2022-04-05)
https://issues.apache.org/jira/browse/BEAM-14239: Changing the output 
timestamp of a timer does not clear the previously set timer (created 
2022-04-04)
https://issues.apache.org/jira/browse/BEAM-14174: Flink Tests failure :  
java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.beam.runners.core.construction.SerializablePipelineOptions  (created 
2022-03-24)
https://issues.apache.org/jira/browse/BEAM-14135: BigQuery Storage API 
insert with writeResult retry and write to error table (created 2022-03-20)
https://issues.apache.org/jira/browse/BEAM-13952: Dataflow streaming tests 
failing new AfterSynchronizedProcessingTime test (created 2022-02-15)
https://issues.apache.org/jira/browse/BEAM-13950: PVR_Spark2_Streaming 
perma-red (created 2022-02-15)
https://issues.apache.org/jira/browse/BEAM-13920: Beam x-lang Dataflow 
tests failing due to _InactiveRpcError (created 2022-02-10)
https://issues.apache.org/jira/browse/BEAM-13852: 
KafkaIO.read.withDynamicRead() doesn't pick up new TopicPartitions (created 
2022-02-08)
https://issues.apache.org/jira/browse/BEAM-13850: 
beam_PostCommit_Python_Examples_Dataflow failing (created 2022-02-08)
https://issues.apache.org/jira/browse/BEAM-13822: GBK and CoGBK streaming 
Java load tests failing (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13805: Simplify version override 
for Dev versions of the Go SDK. (created 2022-02-02)
https://issues.apache.org/jira/browse/BEAM-13747: Add integration testing 
for BQ Storage API  write modes (created 2022-01-26)
https://issues.apache.org/jira/browse/BEAM-13715: Kafka commit offset drop 
data on failure for runners that have non-checkpointing shuffle (created 
2022-01-21)
https://issues.apache.org/jira/browse/BEAM-13487: WriteToBigQuery Dynamic 
table destinations returns wrong tableId (created 2021-12-17)
https://issues.apache.org/jira/browse/BEAM-13393: GroupIntoBatchesTest is 
failing (created 2021-12-07)
https://issues.apache.org/jira/browse/BEAM-13164: Race between member 
variable being accessed due to leaking uninitialized state via 
OutboundObserverFactory (created 2021-11-01)
https://issues.apache.org/jira/browse/BEAM-13132: WriteToBigQuery submits a 
duplicate BQ load job if a 503 error code is returned from googleapi (created 
2021-10-27)

Flaky test issue report (57)

2022-05-12 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake)

These are P1 issues because they have a major negative impact on the community 
and make it hard to determine the quality of the software.

https://issues.apache.org/jira/browse/BEAM-14459: Docker Snapshots failing 
to be published since April 14th (created 2022-05-11)
https://issues.apache.org/jira/browse/BEAM-14410: FnRunnerTest with 
non-trivial (order 1000 elements) numpy input flakes in non-cython environment 
(created 2022-05-04)
https://issues.apache.org/jira/browse/BEAM-14407: Jenkins worker sometimes 
crashes while running Python Flink pipeline (created 2022-05-04)
https://issues.apache.org/jira/browse/BEAM-14367: Flaky timeout in github 
Python unit test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer (created 
2022-04-26)
https://issues.apache.org/jira/browse/BEAM-14349: GroupByKeyTest BasicTests 
testLargeKeys100MB flake (on ULR) (created 2022-04-21)
https://issues.apache.org/jira/browse/BEAM-14276: 
beam_PostCommit_Java_DataflowV2 failures parent bug (created 2022-04-07)
https://issues.apache.org/jira/browse/BEAM-14269: 
PulsarIOTest.testReadFromSimpleTopic is very flaky (created 2022-04-06)
https://issues.apache.org/jira/browse/BEAM-14263: 
beam_PostCommit_Java_DataflowV2, testBigQueryStorageWrite30MProto failing 
consistently (created 2022-04-05)
https://issues.apache.org/jira/browse/BEAM-14252: 
beam_PostCommit_Java_DataflowV1 failing with a variety of flakes and errors 
(created 2022-04-05)
https://issues.apache.org/jira/browse/BEAM-14216: Multiple XVR Suites 
having similar flakes simultaneously (created 2022-03-31)
https://issues.apache.org/jira/browse/BEAM-14174: Flink Tests failure :  
java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.beam.runners.core.construction.SerializablePipelineOptions  (created 
2022-03-24)
https://issues.apache.org/jira/browse/BEAM-14172: beam_PreCommit_PythonDocs 
failing (jinja2) (created 2022-03-24)
https://issues.apache.org/jira/browse/BEAM-13952: Dataflow streaming tests 
failing new AfterSynchronizedProcessingTime test (created 2022-02-15)
https://issues.apache.org/jira/browse/BEAM-13859: Test flake: 
test_split_half_sdf (created 2022-02-09)
https://issues.apache.org/jira/browse/BEAM-13850: 
beam_PostCommit_Python_Examples_Dataflow failing (created 2022-02-08)
https://issues.apache.org/jira/browse/BEAM-13822: GBK and CoGBK streaming 
Java load tests failing (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13810: Flaky tests: Gradle build 
daemon disappeared unexpectedly (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13809: beam_PostCommit_XVR_Flink 
flaky: Connection refused (created 2022-02-03)
https://issues.apache.org/jira/browse/BEAM-13797: Flakes: Failed to load 
cache entry (created 2022-02-01)
https://issues.apache.org/jira/browse/BEAM-13708: flake: 
FlinkRunnerTest.testEnsureStdoutStdErrIsRestored (created 2022-01-20)
https://issues.apache.org/jira/browse/BEAM-13575: Flink 
testParDoRequiresStableInput flaky (created 2021-12-28)
https://issues.apache.org/jira/browse/BEAM-13500: NPE in Flink Portable 
ValidatesRunner streaming suite (created 2021-12-21)
https://issues.apache.org/jira/browse/BEAM-13453: Flake in 
org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use 
(created 2021-12-13)
https://issues.apache.org/jira/browse/BEAM-13393: GroupIntoBatchesTest is 
failing (created 2021-12-07)
https://issues.apache.org/jira/browse/BEAM-13367: 
[beam_PostCommit_Python36] [ 
apache_beam.io.gcp.experimental.spannerio_read_it_test] Failure summary 
(created 2021-12-01)
https://issues.apache.org/jira/browse/BEAM-13312: 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
 is flaky in Java Spark ValidatesRunner suite  (created 2021-11-23)
https://issues.apache.org/jira/browse/BEAM-13311: 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
 is flaky in Java ValidatesRunner Flink suite. (created 2021-11-23)
https://issues.apache.org/jira/browse/BEAM-13237: 
org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView
 flaky on Dataflow Runner V2 (created 2021-11-12)
https://issues.apache.org/jira/browse/BEAM-13025: pubsublite.ReadWriteIT 
flaky in beam_PostCommit_Java_DataflowV2   (created 2021-10-08)
https://issues.apache.org/jira/browse/BEAM-12928: beam_PostCommit_Python36 
- CrossLanguageSpannerIOTest - flakey failing (created 2021-09-21)
https://issues.apache.org/jira/browse/BEAM-12859: 

Beam Dependency Check Report (2022-05-12)

2022-05-12 Thread Apache Jenkins Server

High Priority Dependency Updates Of Beam Python SDK:


Dependency Name | Current Version | Latest Version | Release Date Of The Current Used Version | Release Date Of The Latest Release | JIRA Issue
cachetools | 4.2.4 | 5.0.0 | 2021-12-27 | 2021-12-27 | BEAM-9017
chromedriver-binary | 100.0.4896.60.0 | 102.0.5005.27.0 | 2022-05-05 | 2022-05-05 | BEAM-10426
dill | 0.3.1.1 | 0.3.4 | 2019-10-07 | 2021-06-14 | BEAM-11167
google-api-core | 1.31.5 | 2.7.3 | 2021-12-20 | 2022-05-05 | BEAM-12784
google-auth | 1.35.0 | 2.6.6 | 2021-08-23 | 2022-05-05 | BEAM-12785
google-cloud-bigquery | 2.34.3 | 3.1.0 | 2022-04-07 | 2022-05-12 | BEAM-5537
google-cloud-bigtable | 1.7.1 | 2.9.0 | 2022-04-07 | 2022-04-14 | BEAM-8127
google-cloud-core | 1.7.2 | 2.3.0 | 2021-08-23 | 2022-04-14 | BEAM-5538
google-cloud-dataproc | 3.1.1 | 4.0.2 | 2022-02-21 | 2022-04-14 | BEAM-14055
google-cloud-datastore | 1.15.4 | 2.5.1 | 2022-04-07 | 2022-03-14 | BEAM-8443
google-cloud-language | 1.3.1 | 2.4.1 | 2022-04-07 | 2022-03-14 | BEAM-8
google-cloud-recommendations-ai | 0.2.0 | 0.6.1 | 2021-07-05 | 2022-03-14 | BEAM-13273
google-cloud-spanner | 1.19.2 | 3.14.0 | 2022-04-14 | 2022-04-21 | BEAM-10345
google-cloud-videointelligence | 1.16.2 | 2.7.0 | 2022-04-07 | 2022-05-05 | BEAM-11319
google-cloud-vision | 1.0.1 | 2.7.2 | 2022-04-07 | 2022-04-07 | BEAM-9581
grpcio-tools | 1.37.0 | 1.46.1 | 2021-04-12 | 2022-05-12 | BEAM-9582
jupyter-client | 6.1.12 | 7.3.1 | 2021-04-12 | 2022-05-12 | BEAM-12786
mistune | 0.8.4 | 2.0.2 | 2021-12-06 | 2022-01-17 | BEAM-13382
mock | 2.0.0 | 4.0.3 | 2019-05-20 | 2020-12-14 | BEAM-7369
mypy-protobuf | 1.18 | 3.2.0 | 2020-03-24 | 2022-01-24 | BEAM-10346
Pillow | 7.2.0 | 9.1.0 | 2020-10-19 | 2022-04-07 | BEAM-11071
pluggy | 0.13.1 | 1.0.0 | 2021-08-30 | 2021-08-30 | BEAM-12819
PyHamcrest | 1.10.1 | 2.0.3 | 2020-01-20 | 2021-12-13 | BEAM-9155
pymongo | 3.12.3 | 4.1.1 | 2021-12-13 | 2022-04-14 | BEAM-13383
pytest | 4.6.11 | 7.1.2 | 2020-07-08 | 2022-05-05 | BEAM-8606
pytest-timeout | 1.4.2 | 2.1.0 | 2021-10-11 | 2022-01-24 | BEAM-13029
pytest-xdist | 1.34.0 | 2.5.0 | 2020-08-17 | 2021-12-13 | BEAM-10713
tenacity | 5.1.5 | 8.0.1 | 2019-11-11 | 2021-07-19 | BEAM-8607

High Priority Dependency Updates Of Beam Java SDK:


Dependency Name | Current Version | Latest Version | Release Date Of The Current Used Version | Release Date Of The Latest Release | JIRA Issue
com.alibaba:fastjson | 1.2.69 | 2.0.2 | 2020-05-31 | 2022-05-02 | BEAM-8632
com.azure:azure-core | 1.9.0 | 1.28.0 | 2020-10-02 | 2022-05-06 | BEAM-11888
com.azure:azure-identity | 1.0.8 | 1.5.1 | 2020-07-07 | 2022-05-06 | BEAM-11814
com.azure:azure-storage-blob | 12.10.0 | 12.17.0-beta.1 | 2021-01-15 | 2022-05-06 | BEAM-10800
com.azure:azure-storage-common | 12.10.0 | 12.16.0-beta.1 | 2021-01-14 | 2022-05-06 | BEAM-11889
com.datastax.cassandra:cassandra-driver-core | 3.10.2 | 4.0.0 | 2020-08-26 | 2019-03-18 | BEAM-8674
com.datastax.cassandra:cassandra-driver-mapping | 3.10.2 | 3.11.2 | 2020-08-26 | 2022-04-28 | BEAM-8749
com.esotericsoftware:kryo | 4.0.2 | 5.3.0 | 2018-03-20 | 2022-02-11 | BEAM-5809
com.esotericsoftware.kryo:kryo | 2.21 | 2.24.0 | 2013-02-27 | 2014-05-04 | BEAM-5574
com.github.ben-manes.versions:com.github.ben-manes.versions.gradle.plugin | 0.33.0 | 0.42.0 | 2020-09-14 | 2022-02-07 | BEAM-6645
com.github.jbellis:jamm | 0.3.0 | 0.3.3 | 2014-11-19 | 2018-11-16 | BEAM-13622
com.github.jk1.dependency-license-report:com.github.jk1.dependency-license-report.gradle.plugin | 1.16 | 2.1 | 2020-10-26 | 2022-01-24 | BEAM-11120
com.github.spotbugs:spotbugs | 4.0.6 | 4.7.0 | 2020-06-23 | 2022-05-04 | BEAM-7792
com.github.spotbugs:spotbugs-annotations
4.0.6
4.7.0
2020-06-23