Re: Java object serialization error, java.io.InvalidClassException: org.apache.spark.deploy.ApplicationDescription; local class incompatible

2022-08-26 Thread Elliot Metsger
Yep!  Thanks, Robert, for engaging on Slack!  I just had to dig in a bit - I
ended up building the 3.3.0 job server from scratch (the Docker build
environment is very slick, despite a couple of hiccups) before I realized
the image was already available on Docker Hub :facepalm: ...

On Fri, Aug 26, 2022 at 7:50 PM Robert Burke  wrote:

> Woohoo! Glad that was figured out.
>
> On Fri, Aug 26, 2022, 4:40 PM Elliot Metsger  wrote:
>
>>
>> So, it turns out this was a Spark version mismatch between the Beam
>> JobServer and the Spark platform.
>>
>> I'm running both Beam and Spark on Docker; the Spark image [0] provided
>> version 3.3.0 and Scala version 2.12, but I used
>> apache/beam_spark_job_server:2.41.0 [1] which provides Spark 2.4.x
>> libraries, Scala version 2.11.  Instead, I needed to use the
>> apache/beam_spark3_job_server:2.41.0 image [2], which provides Spark 3.3.x.
>>
>> [0] https://hub.docker.com/r/apache/spark/tags
>> [1] https://hub.docker.com/r/apache/beam_spark_job_server/tags
>> [2] https://hub.docker.com/r/apache/beam_spark3_job_server/tags
>>
>> On 2022/08/25 13:48:16 Elliot Metsger wrote:
>> > Howdy folks, super-new to Beam, and attempting to get a simple example
>> > working with Go, using the portable runner and Spark. There seems to be
>> an
>> > incompatibility between Java components, and I’m not quite sure where
>> the
>> > disconnect is, but at the root it seems to be an incompatibility with
>> > object serializations.
>> >
>> > When I submit the job via the go sdk, it errors out on the Spark side
>> with:
>> > 22/08/25 12:45:59 ERROR TransportRequestHandler: Error while
>> > invoking RpcHandler#receive() for one-way message.
>> > java.io.InvalidClassException:
>> > org.apache.spark.deploy.ApplicationDescription; local class
>> incompatible:
>> > stream classdesc serialVersionUID = 6543101073799644159, local class
>> > serialVersionUID = 1574364215946805297
>> > I’m using apache/beam_spark_job_server:2.41.0 and apache/spark:latest.
>> >  (docker-compose[0], hello world wordcount example pipeline[1]).
>> >
>> > Any ideas on where to look?  It looks like the Beam JobService is using
>> > Java 8 (?) and Spark is using Java 11.  I’ve tried downgrading Spark
>> from
>> > 3.3.0 to 3.1.3 (the earliest version for which Docker images are
>> > available), and downgrading to Beam 2.40.0 with no luck.
>> >
>> > This simple repo[2] should demonstrate the issue.  Any pointers would
>> be
>> > appreciated!
>> >
>> > [0]:
>> https://github.com/emetsger/beam-test/blob/develop/docker-compose.yml
>> > [1]:
>> >
>> https://github.com/emetsger/beam-test/blob/develop/debugging_wordcount.go
>> > [2]: https://github.com/emetsger/beam-test
>> >
>>
>


Re: Java object serialization error, java.io.InvalidClassException: org.apache.spark.deploy.ApplicationDescription; local class incompatible

2022-08-26 Thread Robert Burke
Woohoo! Glad that was figured out.

On Fri, Aug 26, 2022, 4:40 PM Elliot Metsger  wrote:

>
> So, it turns out this was a Spark version mismatch between the Beam
> JobServer and the Spark platform.
>
> I'm running both Beam and Spark on Docker; the Spark image [0] provided
> version 3.3.0 and Scala version 2.12, but I used
> apache/beam_spark_job_server:2.41.0 [1] which provides Spark 2.4.x
> libraries, Scala version 2.11.  Instead, I needed to use the
> apache/beam_spark3_job_server:2.41.0 image [2], which provides Spark 3.3.x.
>
> [0] https://hub.docker.com/r/apache/spark/tags
> [1] https://hub.docker.com/r/apache/beam_spark_job_server/tags
> [2] https://hub.docker.com/r/apache/beam_spark3_job_server/tags
>
> On 2022/08/25 13:48:16 Elliot Metsger wrote:
> > Howdy folks, super-new to Beam, and attempting to get a simple example
> > working with Go, using the portable runner and Spark. There seems to be
> an
> > incompatibility between Java components, and I’m not quite sure where the
> > disconnect is, but at the root it seems to be an incompatibility with
> > object serializations.
> >
> > When I submit the job via the go sdk, it errors out on the Spark side
> with:
> > 22/08/25 12:45:59 ERROR TransportRequestHandler: Error while
> > invoking RpcHandler#receive() for one-way message.
> > java.io.InvalidClassException:
> > org.apache.spark.deploy.ApplicationDescription; local class incompatible:
> > stream classdesc serialVersionUID = 6543101073799644159, local class
> > serialVersionUID = 1574364215946805297
> > I’m using apache/beam_spark_job_server:2.41.0 and apache/spark:latest.
> >  (docker-compose[0], hello world wordcount example pipeline[1]).
> >
> > Any ideas on where to look?  It looks like the Beam JobService is using
> > Java 8 (?) and Spark is using Java 11.  I’ve tried downgrading Spark
> from
> > 3.3.0 to 3.1.3 (the earliest version for which Docker images are
> > available), and downgrading to Beam 2.40.0 with no luck.
> >
> > This simple repo[2] should demonstrate the issue.  Any pointers would be
> > appreciated!
> >
> > [0]:
> https://github.com/emetsger/beam-test/blob/develop/docker-compose.yml
> > [1]:
> >
> https://github.com/emetsger/beam-test/blob/develop/debugging_wordcount.go
> > [2]: https://github.com/emetsger/beam-test
> >
>


RE: Java object serialization error, java.io.InvalidClassException: org.apache.spark.deploy.ApplicationDescription; local class incompatible

2022-08-26 Thread Elliot Metsger
So, it turns out this was a Spark version mismatch between the Beam
JobServer and the Spark platform.

I'm running both Beam and Spark on Docker; the Spark image [0] provided
version 3.3.0 and Scala version 2.12, but I used
apache/beam_spark_job_server:2.41.0 [1] which provides Spark 2.4.x
libraries, Scala version 2.11.  Instead, I needed to use the
apache/beam_spark3_job_server:2.41.0 image [2], which provides Spark 3.3.x.

[0] https://hub.docker.com/r/apache/spark/tags
[1] https://hub.docker.com/r/apache/beam_spark_job_server/tags
[2] https://hub.docker.com/r/apache/beam_spark3_job_server/tags
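
For anyone hitting the same mismatch, a minimal docker-compose sketch along
these lines pairs the two images. Service names, ports, and the Spark master
command are assumptions for illustration, not taken from the linked repo:

# Sketch only: the point is matching Spark 3.3.x with the "spark3" flavor
# of the Beam job server image.
version: "3.8"
services:
  spark-master:
    image: apache/spark:3.3.0
    # assumed invocation of the standalone master; adjust to the image's entrypoint
    command: /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master
    ports:
      - "7077:7077"   # standalone master RPC
      - "8080:8080"   # master web UI
  beam-jobserver:
    image: apache/beam_spark3_job_server:2.41.0   # Spark 3.3.x libraries, Scala 2.12
    command: --spark-master-url=spark://spark-master:7077
    ports:
      - "8099:8099"   # job endpoint the SDK submits to

With that pairing, the Go pipeline would be submitted with something like
--runner=universal --endpoint=localhost:8099 --environment_type=LOOPBACK;
double-check those flag names against your Beam Go SDK version.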

On 2022/08/25 13:48:16 Elliot Metsger wrote:
> Howdy folks, super-new to Beam, and attempting to get a simple example
> working with Go, using the portable runner and Spark. There seems to be an
> incompatibility between Java components, and I’m not quite sure where the
> disconnect is, but at the root it seems to be an incompatibility with
> object serializations.
>
> When I submit the job via the go sdk, it errors out on the Spark side
with:
> 22/08/25 12:45:59 ERROR TransportRequestHandler: Error while
> invoking RpcHandler#receive() for one-way message.
> java.io.InvalidClassException:
> org.apache.spark.deploy.ApplicationDescription; local class incompatible:
> stream classdesc serialVersionUID = 6543101073799644159, local class
> serialVersionUID = 1574364215946805297
> I’m using apache/beam_spark_job_server:2.41.0 and apache/spark:latest.
>  (docker-compose[0], hello world wordcount example pipeline[1]).
>
> Any ideas on where to look?  It looks like the Beam JobService is using
> Java 8 (?) and Spark is using Java 11.  I’ve tried downgrading Spark from
> 3.3.0 to 3.1.3 (the earliest version for which Docker images are
> available), and downgrading to Beam 2.40.0 with no luck.
>
> This simple repo[2] should demonstrate the issue.  Any pointers would be
> appreciated!
>
> [0]: https://github.com/emetsger/beam-test/blob/develop/docker-compose.yml
> [1]:
> https://github.com/emetsger/beam-test/blob/develop/debugging_wordcount.go
> [2]: https://github.com/emetsger/beam-test
>


Re: [ANNOUNCE] Apache Beam 2.41.0 Released

2022-08-26 Thread Byron Ellis via dev
Thanks Kiley!

On Fri, Aug 26, 2022 at 9:37 AM Ahmet Altay via dev 
wrote:

> Thank you Kiley!
>
> On Fri, Aug 26, 2022 at 6:38 AM P Singh 
> wrote:
>
>> Hi Team,
>>
>> Looking forward to trying and testing the new version. It's always
>> fascinating.
>>
>> On Fri, 26 Aug 2022 at 03:45, Pablo Estrada via user <
>> u...@beam.apache.org> wrote:
>>
>>> Thank you Kiley!
>>>
>>> On Thu, Aug 25, 2022 at 10:55 AM Kiley Sok  wrote:
>>>
 The Apache Beam team is pleased to announce the release of version
 2.41.0.

 Apache Beam is an open source unified programming model to define and
 execute data processing pipelines, including ETL, batch and stream
 (continuous) processing. See https://beam.apache.org

 You can download the release here:

 https://beam.apache.org/get-started/downloads/

 This release includes bug fixes, features, and improvements detailed on
 the Beam blog: https://beam.apache.org/blog/beam-2.41.0/

 Thanks to everyone who contributed to this release, and we hope you
 enjoy using Beam 2.41.0.

 -- Kiley, on behalf of The Apache Beam team

>>>


Re: [ANNOUNCE] Apache Beam 2.41.0 Released

2022-08-26 Thread Ahmet Altay via dev
Thank you Kiley!

On Fri, Aug 26, 2022 at 6:38 AM P Singh  wrote:

> Hi Team,
>
> Looking forward to trying and testing the new version. It's always
> fascinating.
>
> On Fri, 26 Aug 2022 at 03:45, Pablo Estrada via user 
> wrote:
>
>> Thank you Kiley!
>>
>> On Thu, Aug 25, 2022 at 10:55 AM Kiley Sok  wrote:
>>
>>> The Apache Beam team is pleased to announce the release of version
>>> 2.41.0.
>>>
>>> Apache Beam is an open source unified programming model to define and
>>> execute data processing pipelines, including ETL, batch and stream
>>> (continuous) processing. See https://beam.apache.org
>>>
>>> You can download the release here:
>>>
>>> https://beam.apache.org/get-started/downloads/
>>>
>>> This release includes bug fixes, features, and improvements detailed on
>>> the Beam blog: https://beam.apache.org/blog/beam-2.41.0/
>>>
>>> Thanks to everyone who contributed to this release, and we hope you
>>> enjoy using Beam 2.41.0.
>>>
>>> -- Kiley, on behalf of The Apache Beam team
>>>
>>


Re: [ANNOUNCE] Apache Beam 2.41.0 Released

2022-08-26 Thread P Singh
Hi Team,

Looking forward to trying and testing the new version. It's always
fascinating.

On Fri, 26 Aug 2022 at 03:45, Pablo Estrada via user 
wrote:

> Thank you Kiley!
>
> On Thu, Aug 25, 2022 at 10:55 AM Kiley Sok  wrote:
>
>> The Apache Beam team is pleased to announce the release of version 2.41.0.
>>
>> Apache Beam is an open source unified programming model to define and
>> execute data processing pipelines, including ETL, batch and stream
>> (continuous) processing. See https://beam.apache.org
>>
>> You can download the release here:
>>
>> https://beam.apache.org/get-started/downloads/
>>
>> This release includes bug fixes, features, and improvements detailed on
>> the Beam blog: https://beam.apache.org/blog/beam-2.41.0/
>>
>> Thanks to everyone who contributed to this release, and we hope you enjoy
>> using Beam 2.41.0.
>>
>> -- Kiley, on behalf of The Apache Beam team
>>
>


Re: [DISCUSS] Dependency management in Apache Beam Python SDK

2022-08-26 Thread Jarek Potiuk
Happy to help and I hope we can work together with Valentyn and others to
get the "google clients" approach improved :)

J.


On Fri, Aug 26, 2022 at 3:40 PM Kerry Donny-Clark via dev <
dev@beam.apache.org> wrote:

> Jarek, I really appreciate you sharing your experience and expertise here.
> I think Beam would benefit from adopting some of these practices.
> Kerry
>
> On Fri, Aug 26, 2022, 7:35 AM Jarek Potiuk  wrote:
>
>>
>>> I'm curious Jarek, does Airflow take any dependencies on popular
>>> libraries like pandas, numpy, pyarrow, scipy, etc... which users are likely
>>> to have their own dependency on? I think these dependencies are challenging
>>> in a different way than the client libraries - ideally we would support a
>>> wide version range so as not to require users to upgrade those libraries in
>>> lockstep with Beam. However in some cases our dependency is pretty tight
>>> (e.g. the DataFrame API's dependency on pandas), so we need to make sure to
>>> explicitly test with multiple different versions. Does Airflow have any
>>> similar issues?
>>>
>>
>> Yes we do (all of those I think :) ). Complete set of all our deps can be
>> found here
>> https://github.com/apache/airflow/blob/constraints-main/constraints-3.9.txt
>> (continuously updated and we have different sets for different python
>> versions).
>>
>> We took a rather interesting and unusual approach (more details in my
>> talk) - mainly because Airflow is both an application to install (for
>> users) and library to use (for DAG authors) and both have contradicting
>> expectations (installation stability versus flexibility in
>> upgrading/downgrading dependencies). Our approach is really smart in making
>> sure water and fire play well with each other.
>>
>> Most of those dependencies are coming from optional extras (list of all
>> extras here:
>> https://airflow.apache.org/docs/apache-airflow/stable/extra-packages-ref.html).
>> More often than not the "problematic" dependencies you mention are
>> transitive dependencies through some client libraries we use (for example
>> Apache Beam SDK is a big contributor to those :).
>>
>> Airflow "core" itself has far less dependencies
>> https://github.com/apache/airflow/blob/constraints-main/constraints-no-providers-3.9.txt
>> (175 currently) and we actively made sure that all "pandas" of this world
>> are only optional extra deps.
>>
>> Now - the interesting thing is that we use "constraints" (the links with
>> dependencies that I posted are those constraints) to pin versions of
>> the dependencies that are "golden" - i.e. we test those continuously in our
>> CI and we automatically upgrade the constraints when all the unit and
>> integration tests pass.
>> There is a little bit of complexity and sometimes conflicts to handle (as
>> `pip` has to find the right set of deps that will work for all our optional
>> extras), but eventually we have really one "golden" set of constraints at
>> any moment in time main (or v2-x branch - we have a separate set for each
>> branch) that we are dealing with. And this is the only "set" of dependency
>> versions that Airflow gets tested with. Note - these are *constraints*,
>> not *requirements* - that makes a whole world of difference.
>>
>> Then when we release airflow, we "freeze" the constraints with the
>> version tag. We know they work because all our tests pass with them in CI.
>>
>> Then we communicate to our users (and we use it in our Docker image) that
>> the only "supported" way of installing airflow is with using `pip` and
>> constraints
>> https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html.
>> And we do not support poetry, pipenv - we leave it up to users to handle
>> them (until poetry/pipenv will support constraints - which we are waiting
>> for and there is an issue where I explained  why it is useful). It looks
>> like that `pip install "apache-airflow==2.3.4" --constraint "
>> https://raw.githubusercontent.com/apache/airflow/constraints-2.3.4/constraints-3.9.txt"`
>> (different constraints for different airflow version and Python version you
>> have)
>>
>> Constraints have this nice feature that they are only used during the
>> "pip install" phase and thrown out immediately after the install is
>> complete. They do not create "hard" requirements for airflow. Airflow still
>> has a number of "lower-bound" limits for a number of constraints but we try
>> to avoid putting upper-bounds at all (only in specific cases and
>> documenting them) and our bounds are rather relaxed. This way we achieve
>> two things:
>>
>> 1) when someone does not use constraints and has a problem with broken
>> dependency - we tell them to use constraints - this is what we as a
>> community commit to and support
>> 2) but by using constraints mechanism we do not limit our users if they
>> want to upgrade or downgrade any dependencies. They are free to do it (as
>> long as it fits the - rather relaxed lower/upper bounds of 

Re: [DISCUSS] Dependency management in Apache Beam Python SDK

2022-08-26 Thread Kerry Donny-Clark via dev
Jarek, I really appreciate you sharing your experience and expertise here.
I think Beam would benefit from adopting some of these practices.
Kerry

On Fri, Aug 26, 2022, 7:35 AM Jarek Potiuk  wrote:

>
>> I'm curious Jarek, does Airflow take any dependencies on popular
>> libraries like pandas, numpy, pyarrow, scipy, etc... which users are likely
>> to have their own dependency on? I think these dependencies are challenging
>> in a different way than the client libraries - ideally we would support a
>> wide version range so as not to require users to upgrade those libraries in
>> lockstep with Beam. However in some cases our dependency is pretty tight
>> (e.g. the DataFrame API's dependency on pandas), so we need to make sure to
>> explicitly test with multiple different versions. Does Airflow have any
>> similar issues?
>>
>
> Yes we do (all of those I think :) ). Complete set of all our deps can be
> found here
> https://github.com/apache/airflow/blob/constraints-main/constraints-3.9.txt
> (continuously updated and we have different sets for different python
> versions).
>
> We took a rather interesting and unusual approach (more details in my
> talk) - mainly because Airflow is both an application to install (for
> users) and library to use (for DAG authors) and both have contradicting
> expectations (installation stability versus flexibility in
> upgrading/downgrading dependencies). Our approach is really smart in making
> sure water and fire play well with each other.
>
> Most of those dependencies are coming from optional extras (list of all
> extras here:
> https://airflow.apache.org/docs/apache-airflow/stable/extra-packages-ref.html).
> More often than not the "problematic" dependencies you mention are
> transitive dependencies through some client libraries we use (for example
> Apache Beam SDK is a big contributor to those :).
>
> Airflow "core" itself has far less dependencies
> https://github.com/apache/airflow/blob/constraints-main/constraints-no-providers-3.9.txt
> (175 currently) and we actively made sure that all "pandas" of this world
> are only optional extra deps.
>
> Now - the interesting thing is that we use "constraints" (the links with
> dependencies that I posted are those constraints) to pin versions of
> the dependencies that are "golden" - i.e. we test those continuously in our
> CI and we automatically upgrade the constraints when all the unit and
> integration tests pass.
> There is a little bit of complexity and sometimes conflicts to handle (as
> `pip` has to find the right set of deps that will work for all our optional
> extras), but eventually we have really one "golden" set of constraints at
> any moment in time main (or v2-x branch - we have a separate set for each
> branch) that we are dealing with. And this is the only "set" of dependency
> versions that Airflow gets tested with. Note - these are *constraints*,
> not *requirements* - that makes a whole world of difference.
>
> Then when we release airflow, we "freeze" the constraints with the version
> tag. We know they work because all our tests pass with them in CI.
>
> Then we communicate to our users (and we use it in our Docker image) that
> the only "supported" way of installing airflow is with using `pip` and
> constraints
> https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html.
> And we do not support poetry, pipenv - we leave it up to users to handle
> them (until poetry/pipenv will support constraints - which we are waiting
> for and there is an issue where I explained  why it is useful). It looks
> like that `pip install "apache-airflow==2.3.4" --constraint "
> https://raw.githubusercontent.com/apache/airflow/constraints-2.3.4/constraints-3.9.txt"`
> (different constraints for different airflow version and Python version you
> have)
>
> Constraints have this nice feature that they are only used during the "pip
> install" phase and thrown out immediately after the install is complete.
> They do not create "hard" requirements for airflow. Airflow still has a
> number of "lower-bound" limits for a number of constraints but we try to
> avoid putting upper-bounds at all (only in specific cases and documenting
> them) and our bounds are rather relaxed. This way we achieve two things:
>
> 1) when someone does not use constraints and has a problem with broken
> dependency - we tell them to use constraints - this is what we as a
> community commit to and support
> 2) but by using constraints mechanism we do not limit our users if they
> want to upgrade or downgrade any dependencies. They are free to do it (as
> long as it fits the - rather relaxed lower/upper bounds of Airflow). But
> "with great powers come great responsibilities" - if they want to do that.,
> THEY have to make sure that airflow will work. We make no guarantees there.
> 3) we are not limited by the 3rd-party libraries that come as extras - if
> you do not use those, the limits do not apply
>
> I think this 

Re: [DISCUSS] Dependency management in Apache Beam Python SDK

2022-08-26 Thread Jarek Potiuk
>
> I'm curious Jarek, does Airflow take any dependencies on popular libraries
> like pandas, numpy, pyarrow, scipy, etc... which users are likely to have
> their own dependency on? I think these dependencies are challenging in a
> different way than the client libraries - ideally we would support a wide
> version range so as not to require users to upgrade those libraries in
> lockstep with Beam. However in some cases our dependency is pretty tight
> (e.g. the DataFrame API's dependency on pandas), so we need to make sure to
> explicitly test with multiple different versions. Does Airflow have any
> similar issues?
>

Yes, we do (all of those, I think :) ). The complete set of all our deps can
be found here:
https://github.com/apache/airflow/blob/constraints-main/constraints-3.9.txt
(continuously updated, and we have different sets for different Python
versions).

We took a rather interesting and unusual approach (more details in my talk)
- mainly because Airflow is both an application to install (for users) and
a library to use (for DAG authors), and the two have contradictory
expectations (installation stability versus flexibility in
upgrading/downgrading dependencies). Our approach is really smart in making
sure water and fire play well with each other.

Most of those dependencies come from optional extras (the list of all
extras is here:
https://airflow.apache.org/docs/apache-airflow/stable/extra-packages-ref.html).
More often than not the "problematic" dependencies you mention are
transitive dependencies through some client libraries we use (for example,
the Apache Beam SDK is a big contributor to those :) ).

Airflow "core" itself has far less dependencies
https://github.com/apache/airflow/blob/constraints-main/constraints-no-providers-3.9.txt
(175 currently) and we actively made sure that all "pandas" of this world
are only optional extra deps.
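
As a rough illustration of that split (the extras syntax is standard pip;
treat the specific version and extras named here as illustrative examples
drawn from the extras reference above, not an exact recipe):

# Airflow "core" only - heavy libraries such as pandas are not pulled in
pip install "apache-airflow==2.3.4"

# Opting into extras is what brings in the heavier, potentially conflicting deps
pip install "apache-airflow[pandas,google]==2.3.4"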

Now - the interesting thing is that we use "constraints" (the links with
dependencies that I posted are those constraints) to pin versions of the
dependencies that are "golden" - i.e. we test those continuously in our CI
and we automatically upgrade the constraints when all the unit and
integration tests pass.
There is a little bit of complexity and sometimes conflicts to handle (as
`pip` has to find the right set of deps that will work for all our optional
extras), but eventually we have really one "golden" set of constraints at
any moment in time in main (or the v2-x branch - we have a separate set for
each branch) that we are dealing with. And this is the only set of
dependency versions that Airflow gets tested with. Note - these are
*constraints*, not *requirements* - that makes a whole world of difference.

Then when we release airflow, we "freeze" the constraints with the version
tag. We know they work because all our tests pass with them in CI.

Then we communicate to our users (and we use it in our Docker image) that
the only "supported" way of installing airflow is by using `pip` and
constraints:
https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html.
And we do not support poetry or pipenv - we leave it up to users to handle
them (until poetry/pipenv support constraints - which we are waiting for,
and there is an issue where I explained why it is useful). It looks like
this: `pip install "apache-airflow==2.3.4" --constraint "
https://raw.githubusercontent.com/apache/airflow/constraints-2.3.4/constraints-3.9.txt"`
(different constraints for each airflow version and Python version you
have).

Constraints have this nice feature that they are only used during the "pip
install" phase and thrown out immediately after the install is complete.
They do not create "hard" requirements for airflow. Airflow still has a
number of lower-bound limits for a number of dependencies, but we try to
avoid putting upper bounds at all (only in specific cases, and documenting
them), and our bounds are rather relaxed. This way we achieve three things:

1) when someone does not use constraints and has a problem with a broken
dependency - we tell them to use constraints - this is what we as a
community commit to and support
2) but by using the constraints mechanism we do not limit our users if they
want to upgrade or downgrade any dependencies (see the sketch after this
list). They are free to do it (as long as it fits the rather relaxed
lower/upper bounds of Airflow). But "with great powers come great
responsibilities" - if they want to do that, THEY have to make sure that
airflow will work. We make no guarantees there.
3) we are not limited by the 3rd-party libraries that come as extras - if
you do not use those, the limits do not apply
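
As a concrete sketch of that workflow (the install command is the one from
the docs linked above; the later pandas install and its version are purely
illustrative assumptions):

# Supported install path: Airflow 2.3.4 resolved against the "golden" constraints
# published for Python 3.9. The constraints only steer this one resolution; they
# are not recorded as requirements of the installed package.
pip install "apache-airflow==2.3.4" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.3.4/constraints-3.9.txt"

# Later, a user may move an individual dependency off the golden set (at their
# own risk, per point 2 above) - the earlier constraints file has no lasting effect:
pip install "pandas==1.4.3"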

I think this works really well - but it is rather complex to set up and
maintain - I built a whole complex set of scripts and I have the whole
`breeze` ("It's a breeze to develop airflow" is the theme) development/CI
environment based on docker and docker-compose that allows us to automate
all of that.

J.


Keynote Speaker for a Stream processing workshop at IEEE Big data 2022 Osaka, JP

2022-08-26 Thread Sabri Skhiri
Dear all,

I am not sure I am on the right mailing list for this request; if not,
don't hesitate to let me know what would be the best course of action.

I am the co-chair of the 7th workshop on real-time stream analytics,
colocated with the IEEE Big Data conference (Osaka, JP). I am organizing
this workshop with Albert Bifet and Alessandro Margara (Politecnico di
Milano).

We are looking for keynote speakers for the workshop. In the previous
editions we had speakers from Apache Spark (Databricks), Apache Flink
(Ververica), Apache Kafka (Confluent), and Apache Pulsar (Splunk).

This year we wanted to have a talk on Google Dataflow or on Apache Beam.
Is there anyone interested in giving such a keynote presentation?
It is usually a 45-minute talk with 10 minutes for Q&A.

The event is likely to be in hybrid mode; as a result, we could plan either
an in-person or a remote presentation.

Thank you for your help!

Regards,

Sabri.


Beam High Priority Issue Report (69)

2022-08-26 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/22779 [Bug]: SpannerIO.readChangeStream() 
stops forwarding change records and starts continuously throwing (large number) 
of Operation ongoing errors 
https://github.com/apache/beam/issues/22749 [Bug]: Bytebuddy version update 
causes Invisible parameter type error
https://github.com/apache/beam/issues/22743 [Bug]: Test flake: 
org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImplTest.testInsertWithinRowCountLimits
https://github.com/apache/beam/issues/22440 [Bug]: Python Batch Dataflow 
SideInput LoadTests failing
https://github.com/apache/beam/issues/22321 
PortableRunnerTestWithExternalEnv.test_pardo_large_input is regularly failing 
on jenkins
https://github.com/apache/beam/issues/22303 [Task]: Add tests to Kafka SDF and 
fix known and discovered issues
https://github.com/apache/beam/issues/22299 [Bug]: JDBCIO Write freeze at 
getConnection() in WriteFn
https://github.com/apache/beam/issues/22283 [Bug]: Python Lots of fn runner 
test items cost exactly 5 seconds to run
https://github.com/apache/beam/issues/21794 Dataflow runner creates a new timer 
whenever the output timestamp is change
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output 
to Failed Inserts PCollection
https://github.com/apache/beam/issues/21704 beam_PostCommit_Java_DataflowV2 
failures parent bug
https://github.com/apache/beam/issues/21703 pubsublite.ReadWriteIT failing in 
beam_PostCommit_Java_DataflowV1 and V2
https://github.com/apache/beam/issues/21702 SpannerWriteIT failing in beam 
PostCommit Java V1
https://github.com/apache/beam/issues/21701 beam_PostCommit_Java_DataflowV1 
failing with a variety of flakes and errors
https://github.com/apache/beam/issues/21700 
--dataflowServiceOptions=use_runner_v2 is broken
https://github.com/apache/beam/issues/21696 Flink Tests failure :  
java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.beam.runners.core.construction.SerializablePipelineOptions 
https://github.com/apache/beam/issues/21695 DataflowPipelineResult does not 
raise exception for unsuccessful states.
https://github.com/apache/beam/issues/21694 BigQuery Storage API insert with 
writeResult retry and write to error table
https://github.com/apache/beam/issues/21480 flake: 
FlinkRunnerTest.testEnsureStdoutStdErrIsRestored
https://github.com/apache/beam/issues/21472 Dataflow streaming tests failing 
new AfterSynchronizedProcessingTime test
https://github.com/apache/beam/issues/21471 Flakes: Failed to load cache entry
https://github.com/apache/beam/issues/21470 Test flake: test_split_half_sdf
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21468 
beam_PostCommit_Python_Examples_Dataflow failing
https://github.com/apache/beam/issues/21467 GBK and CoGBK streaming Java load 
tests failing
https://github.com/apache/beam/issues/21465 Kafka commit offset drop data on 
failure for runners that have non-checkpointing shuffle
https://github.com/apache/beam/issues/21463 NPE in Flink Portable 
ValidatesRunner streaming suite
https://github.com/apache/beam/issues/21462 Flake in 
org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use
https://github.com/apache/beam/issues/21271 pubsublite.ReadWriteIT flaky in 
beam_PostCommit_Java_DataflowV2  
https://github.com/apache/beam/issues/21270 
org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView
 flaky on Dataflow Runner V2
https://github.com/apache/beam/issues/21268 Race between member variable being 
accessed due to leaking uninitialized state via OutboundObserverFactory
https://github.com/apache/beam/issues/21267 WriteToBigQuery submits a duplicate 
BQ load job if a 503 error code is returned from googleapi
https://github.com/apache/beam/issues/21266 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
 is flaky in Java ValidatesRunner Flink suite.
https://github.com/apache/beam/issues/21265 
apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally
 'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible
https://github.com/apache/beam/issues/21263 (Broken Pipe induced) Bricked 
Dataflow Pipeline 
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21261 
org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
 is flaky
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time