Fwd: [Question] Python Batch Pipeline Errors

2021-06-03 Thread Shankar Mane
-- Forwarded message -
From: Shankar Mane 
Date: Thu, 3 Jun 2021, 19:22
Subject: [Question] Python Batch Pipeline Errors
To: 


#

*BATCH PIPELINE : *

python3 batch.py \
--input beam-userbase.csv \
--output output/batch \
--runner=SparkRunner \
--spark_submit_uber_jar \
--job_endpoint=localhost:8099 \
--spark_master_url=spark://ip-10-51-3-xxx:7077 \
--spark_rest_url=http://iip-10-51-3-xxx:6066 \
--environment_cache_millis=30 \
--environment_type=DOCKER \
--environment_config="apache/beam_python3.8_sdk:2.29.0"


*OR*

/home/ec2-user/spark-3.1.1-bin-hadoop2.7/bin/spark-submit \
--master spark://ip-10-51-3-xxx:7077 \
/home/ec2-user/apache-beam-py/batch.py \
--runner=PortableRunner \
--spark_submit_uber_jar \
--job_endpoint=localhost:8099 \
--spark_job_server_jar=beam-runners-spark-job-server-2.29.0.jar \
--input beam-userbase.csv \
--output output/batch


#


*EXCEPTIONS: in both of the above cases, I am getting the errors below:*
2021/06/03 13:00:12 Initializing python harness: /opt/apache/beam/boot
--id=1-1 --provision_endpoint=localhost:37997
2021/06/03 13:00:20 Failed to retrieve staged files: failed to retrieve
/tmp/staged in 3 attempts: failed to retrieve chunk for
/tmp/staged/pickled_main_session
caused by:
rpc error: code = Unknown desc = ; failed to retrieve chunk for
/tmp/staged/pickled_main_session
caused by:
rpc error: code = Unknown desc = ; failed to retrieve chunk for
/tmp/staged/pickled_main_session
caused by:
rpc error: code = Unknown desc = ; failed to retrieve chunk for
/tmp/staged/pickled_main_session
caused by:
rpc error: code = Unknown desc =
21/06/03 18:30:22 WARN BlockManager: Putting block rdd_13_1 failed due to
exception
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
java.lang.IllegalStateException: No container running for id
1e052a63b1a31f08c319c39fcea39f227d6a3b768a90d36ffa90f284180e5e73.
21/06/03 18:30:22 WARN BlockManager: Block rdd_13_1 could not be removed as
it was not found on disk or in memory
21/06/03 18:30:22 WARN BlockManager: Putting block rdd_17_1 failed due to
exception
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
java.lang.IllegalStateException: No container running for id
1e052a63b1a31f08c319c39fcea39f227d6a3b768a90d36ffa90f284180e5e73.
21/06/03 18:30:22 WARN BlockManager: Block rdd_17_1 could not be removed as
it was not found on disk or in memory
21/06/03 18:30:22 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
java.lang.IllegalStateException: No container running for id
1e052a63b1a31f08c319c39fcea39f227d6a3b768a90d36ffa90f284180e5e73
at
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2050)
at
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get(LocalCache.java:3952)
at
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
at
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958)
at
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4964)
at
org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:451)
at
org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:436)
at
org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:303)
at
org.apache.beam.runners.fnexecution.control.DefaultExecutableStageContext.getStageBundleFactory(DefaultExecutableStageContext.java:38)
at
org.apache.beam.runners.fnexecution.control.ReferenceCountingExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingExecutableStageContextFactory.java:202)
at
org.apache.beam.runners.spark.translation.SparkExecutableStageFunction.call(SparkExecutableStageFunction.java:135)
at
org.apache.beam.runners.spark.translation.SparkExecutableStageFunction.call(SparkExecutableStageFunction.java:80)
at
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
at
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
at
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:823)
at
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:823)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at 

Re: Allyship workshops for open source contributors

2021-06-03 Thread Ratnakar Malla
+1



From: Austin Bennett 
Sent: Thursday, June 3, 2021 6:20:25 PM
To: u...@beam.apache.org 
Cc: dev 
Subject: Re: Allyship workshops for open source contributors

+1, assuming timing can work.

On Wed, Jun 2, 2021 at 2:07 PM Aizhamal Nurmamat kyzy <aizha...@apache.org> wrote:
If we have a good number of people who express interest in this thread, I will 
set up training for the Airflow community.

I meant Beam ^^' I am organizing it for the Airflow community as well.


Re: Lots of VMs

2021-06-03 Thread Ahmet Altay
Update: I stopped beam-jenkins-clang-format and new-ci-node-test. Other
instances from my first email were already stopped. Thank you for doing
that.

And if you need any one of these instances to run, feel free to start them
again.

Ahmet

On Thu, May 27, 2021 at 4:54 PM Ahmet Altay  wrote:

> Folks, I will stop the VMs I listed in my first email sometime next week.
> Feel free to resume them if you need them.
>
> And if you have any other unused VMs please stop or delete them.
>
> Thank you,
> Ahmet
>
> On Mon, May 10, 2021 at 10:08 AM Ahmet Altay  wrote:
>
>> That is a good suggestion. I can do that sometime after 2 weeks for the
>> VMs I listed. There are close to 200 VMs though, there might be more unused
>> or underused VMs.
>>
>> On Mon, May 10, 2021 at 9:22 AM Brian Hulette 
>> wrote:
>>
>>> Perhaps we should give people a week or two to justify keeping these
>>> online and if we don't hear anything go ahead and shut them down?
>>>
>>> Brian
>>>
>>> On Fri, May 7, 2021 at 5:52 PM Ahmet Altay  wrote:
>>>
 Hello,

 It looks like we have accumulated a bunch of running VMs with very low
 utilization. Some of them have temp, test etc. in their names and probably
 no longer needed. If you started a one time use only VM and no longer need
 it could you stop or delete those?

 A few that could be potentially deleted:
 beam-jenkins-clang-format
 new-ci-node-test
 temporary-jenkins-node-tmp-cleanup
 tmp-jenkins-node
 temporal-beam-jenkins-1
 temporal-beam-jenkins-2
 temporary-jenkins-node-tmp-cleanup

 List is here:
 http://console.cloud.google.com/compute/instances?project=apache-beam-testing=(%22instances%22:(%22s%22:%5B(%22i%22:%22recommendationSortKey%22,%22s%22:%220%22),(%22i%22:%22name%22,%22s%22:%220%22)%5D,%22p%22:0))

 Thank you!
 Ahmet

>>>


Re: [VOTE] Release 2.30.0, release candidate #1

2021-06-03 Thread Robert Bradshaw
+1 (binding)

Verified the signatures are all good and the source tarball matches github.

On Thu, Jun 3, 2021 at 3:38 PM Ahmet Altay  wrote:
>
> +1 (binding) - I ran python quickstart examples on the direct runner.
>
> Thank you for preparing the RC!
>
> Ahmet
>
> On Thu, Jun 3, 2021 at 2:58 PM Chamikara Jayalath  
> wrote:
>>
>> +1 (binding)
>>
>> Tested some Java quickstart validations and multi-language pipelines.
>>
>> Thanks,
>> Cham
>>
>> On Thu, Jun 3, 2021 at 2:03 PM Tomo Suzuki  wrote:
>>>
>>> +1 (non-binding)
>>>
>>> Thank you for the preparation. With the GCP dependencies of my interest, 
>>> the GitHub checks worked.
>>>
>>>
>>>
>>> On Thu, Jun 3, 2021 at 4:55 AM Heejong Lee  wrote:

 Hi everyone,

 Please review and vote on the release candidate #1 for the version 2.30.0, 
 as follows:
 [ ] +1, Approve the release
 [ ] -1, Do not approve the release (please provide specific comments)

 Reviewers are encouraged to test their own use cases with the release 
 candidate, and vote +1 if no issues are found.

 The complete staging area is available for your review, which includes:
 * JIRA release notes [1],
 * the official Apache source release to be deployed to dist.apache.org 
 [2], which is signed with the key with fingerprint 
 DBC03F1CCF4240FBD0F256F054550BE0F4C0A24D [3],
 * all artifacts to be deployed to the Maven Central Repository [4],
 * source code tag "v2.30.0-RC1" [5],
 * website pull request listing the release [6], publishing the API 
 reference manual [7], and the blog post [8].
 * Java artifacts were built with Maven 3.6.3 and OpenJDK 1.8.0_292.
 * Python artifacts are deployed along with the source release to the 
 dist.apache.org [2].
 * Validation sheet with a tab for 2.30.0 release to help with validation 
 [9].
 * Docker images published to Docker Hub [10].
 * Python artifacts are published to pypi as a pre-release version [11].

 The vote will be open for at least 72 hours. It is adopted by majority 
 approval, with at least 3 PMC affirmative votes.

 Thanks,
 Heejong

 [1] 
 https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12349978
 [2] https://dist.apache.org/repos/dist/dev/beam/2.30.0/
 [3] https://dist.apache.org/repos/dist/release/beam/KEYS
 [4] https://repository.apache.org/content/repositories/orgapachebeam-1174/
 [5] https://github.com/apache/beam/tree/v2.30.0-RC1
 [6] https://github.com/apache/beam/pull/14894
 [7] https://github.com/apache/beam-site/pull/613
 [8] https://github.com/apache/beam/pull/14895
 [9] 
 https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=109662250
 [10] https://hub.docker.com/search?q=apache%2Fbeam&type=image
 [11] https://pypi.org/project/apache-beam/2.30.0rc1/
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Tomo


Re: [DISCUSS] Client SDK/Job Server/Worker Pool Lifecycle Management on Kubernetes

2021-06-03 Thread Ke Wu
There is also a talk [1] which introduces dynamic scaling of a stream processing
job at LinkedIn with the Samza runner as well.

[1] https://www.usenix.org/conference/hotcloud20/presentation/singh 
 

> On Jun 3, 2021, at 1:59 PM, Ke Wu  wrote:
> 
> Also, are there any resources I can use to find out more about how horizontal 
> scaling works in Samza?
> 
> It is a configuration [1] passed along with job submission; the Job
> Coordinator, similar to the Job Manager in Flink, then asks the YARN Resource Manager
> to allocate containers, or the Kubernetes API server to allocate Pods.
> 
> configure the job to use the PROCESS environment.
> 
> Agreed that a custom image with a fat jar inside + the PROCESS environment works
> too; we prefer the EXTERNAL environment because it gives us isolation between the
> runner and the SDK worker, where the runner container can run entirely
> based on a framework image.
> 
>  
> 
> [1] 
> https://samza.apache.org/learn/documentation/latest/jobs/configuration-table.html#job-container-count
>  
> 
>  
> 
> On Thu, Jun 3, 2021 at 12:45 PM Kyle Weaver wrote:
> However, not all runners follow the pattern where a predefined number of
> workers are brought up before job submission. For example, for the Samza runner,
> the number of workers needed for a job is determined after the job submission
> happens, in which case the Samza worker Pod, which is similar to a “Task
> Manager Pod” in Flink, is brought up after job submission and the
> runner container in this Pod needs to connect to the worker pool service at a much
> earlier time.
>  
> Makes sense. In that case, the best alternative to worker pools is probably 
> to create a custom Samza/Flink worker container image that includes whatever 
> dependencies necessary to run the Beam user code, and then configure the job 
> to use the PROCESS environment.
> 
> Also, are there any resources I can use to find out more about how horizontal 
> scaling works in Samza?
> 
> On Wed, Jun 2, 2021 at 6:39 PM Ke Wu wrote:
> Very good point. We are actually talking about the same high level approach 
> where Task Manager Pod has two containers inside running, one is task manager 
> container while the other is worker pool service container.
> 
> I believe the disconnect probably lies in how a job is being 
> deployed/started. In the GCP Flink operator example, it is completely true 
> that the likelihood where the worker pool service is not available when the 
> task manager container needs to connect to it is extremely low. It is because 
> the worker pool service is being brought up together when the Flink cluster 
> is being brought up, which is before the job submission even happens.
> 
> However, not all runners follow the pattern where a predefined number of
> workers are brought up before job submission. For example, for the Samza runner,
> the number of workers needed for a job is determined after the job submission
> happens, in which case the Samza worker Pod, which is similar to a “Task
> Manager Pod” in Flink, is brought up after job submission and the
> runner container in this Pod needs to connect to the worker pool service at a much
> earlier time.
> 
> In addition, if I understand correctly, Flink is planning to add support for 
> dynamically adding new task managers after job submission [1], in which case, 
> the task manager container and worker pool service container in the same Task 
> Manager Pod could be started together and the task manager container needs to 
> connect to the worker pool service container sooner. 
> 
> Hope this clarifies things better. Let me know if you have more questions.
> 
> Best,
> Ke
> 
> [1] https://issues.apache.org/jira/browse/FLINK-10407 
>  
> 
>> On Jun 2, 2021, at 4:27 PM, Kyle Weaver wrote:
>> 
>> Therefore, if we bring up the external worker pool container together with
>> the runner container, which is one of the approaches supported by the Flink Runner on
>> K8s
>> 
>> Exactly which approach are you talking about in the doc? I feel like there 
>> could be some misunderstanding here. Here is the configuration I'm talking 
>> about: 
>> https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/examples/beam/without_job_server/beam_flink_cluster.yaml
>>  
>> 
>> 
>> Basically this config is describing a Flink task manager with a Beam worker 
>> pool sidecar. The user invokes it with:
>> 
>> kubectl apply -f examples/beam/without_job_server/beam_flink_cluster.yaml
>> 
>> It doesn't matter which container is started first, the task manager 
>> 

Re: [VOTE] Release 2.30.0, release candidate #1

2021-06-03 Thread Ahmet Altay
+1 (binding) - I ran python quickstart examples on the direct runner.

Thank you for preparing the RC!

Ahmet

On Thu, Jun 3, 2021 at 2:58 PM Chamikara Jayalath 
wrote:

> +1 (binding)
>
> Tested some Java quickstart validations and multi-language pipelines.
>
> Thanks,
> Cham
>
> On Thu, Jun 3, 2021 at 2:03 PM Tomo Suzuki  wrote:
>
>> +1 (non-binding)
>>
>> Thank you for the preparation. With the GCP dependencies of my interest,
>> the GitHub checks worked.
>>
>>
>>
>> On Thu, Jun 3, 2021 at 4:55 AM Heejong Lee  wrote:
>>
>>> Hi everyone,
>>>
>>> Please review and vote on the release candidate #1 for the version
>>> 2.30.0, as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>> Reviewers are encouraged to test their own use cases with the release
>>> candidate, and vote +1 if no issues are found.
>>>
>>> The complete staging area is available for your review, which includes:
>>> * JIRA release notes [1],
>>> * the official Apache source release to be deployed to dist.apache.org
>>> [2], which is signed with the key with fingerprint
>>> DBC03F1CCF4240FBD0F256F054550BE0F4C0A24D [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.30.0-RC1" [5],
>>> * website pull request listing the release [6], publishing the API
>>> reference manual [7], and the blog post [8].
>>> * Java artifacts were built with Maven 3.6.3 and OpenJDK 1.8.0_292.
>>> * Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2].
>>> * Validation sheet with a tab for 2.30.0 release to help with validation
>>> [9].
>>> * Docker images published to Docker Hub [10].
>>> * Python artifacts are published to pypi as a pre-release version [11].
>>>
>>> The vote will be open for at least 72 hours. It is adopted by majority
>>> approval, with at least 3 PMC affirmative votes.
>>>
>>> Thanks,
>>> Heejong
>>>
>>> [1]
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12349978
>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.30.0/
>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1174/
>>> [5] https://github.com/apache/beam/tree/v2.30.0-RC1
>>> [6] https://github.com/apache/beam/pull/14894
>>> [7] https://github.com/apache/beam-site/pull/613
>>> [8] https://github.com/apache/beam/pull/14895
>>> [9]
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=109662250
>>> [10] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>>> [11] https://pypi.org/project/apache-beam/2.30.0rc1/
>>>
>>
>>
>> --
>> Regards,
>> Tomo
>>
>


Re: Beam Website Feedback

2021-06-03 Thread Chamikara Jayalath
Makes sense to add it. Will you be able to send a pull request
?

Thanks,
Cham

On Tue, Jun 1, 2021 at 6:37 PM Matt Hall  wrote:

> Hello,
>
> Would like to suggest that mention of the new storage write api method [1]
> be added under "setting the insertion method" [2].  Additional
> documentation for the new storage write api is linked-to below [3] and from
> the page describing the tabledata.insertall method [4].
>
> [1]
> https://beam.apache.org/releases/javadoc/2.29.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.Method.html
> [2] https://beam.apache.org/documentation/io/built-in/google-bigquery/
> [3] https://cloud.google.com/bigquery/docs/write-api
> [4] https://cloud.google.com/bigquery/streaming-data-into-bigquery
>
> --
> Matt Hall | Technical Solutions Engineer | matthew...@google.com | +1
> 425-655-4100 <(425)%20655-4100>
>
>


Re: [VOTE] Release 2.30.0, release candidate #1

2021-06-03 Thread Chamikara Jayalath
+1 (binding)

Tested some Java quickstart validations and multi-language pipelines.

Thanks,
Cham

On Thu, Jun 3, 2021 at 2:03 PM Tomo Suzuki  wrote:

> +1 (non-binding)
>
> Thank you for the preparation. With the GCP dependencies of my interest,
> the GitHub checks worked.
>
>
>
> On Thu, Jun 3, 2021 at 4:55 AM Heejong Lee  wrote:
>
>> Hi everyone,
>>
>> Please review and vote on the release candidate #1 for the version
>> 2.30.0, as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>> Reviewers are encouraged to test their own use cases with the release
>> candidate, and vote +1 if no issues are found.
>>
>> The complete staging area is available for your review, which includes:
>> * JIRA release notes [1],
>> * the official Apache source release to be deployed to dist.apache.org
>> [2], which is signed with the key with fingerprint
>> DBC03F1CCF4240FBD0F256F054550BE0F4C0A24D [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v2.30.0-RC1" [5],
>> * website pull request listing the release [6], publishing the API
>> reference manual [7], and the blog post [8].
>> * Java artifacts were built with Maven 3.6.3 and OpenJDK 1.8.0_292.
>> * Python artifacts are deployed along with the source release to the
>> dist.apache.org [2].
>> * Validation sheet with a tab for 2.30.0 release to help with validation
>> [9].
>> * Docker images published to Docker Hub [10].
>> * Python artifacts are published to pypi as a pre-release version [11].
>>
>> The vote will be open for at least 72 hours. It is adopted by majority
>> approval, with at least 3 PMC affirmative votes.
>>
>> Thanks,
>> Heejong
>>
>> [1]
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12349978
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.30.0/
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1174/
>> [5] https://github.com/apache/beam/tree/v2.30.0-RC1
>> [6] https://github.com/apache/beam/pull/14894
>> [7] https://github.com/apache/beam-site/pull/613
>> [8] https://github.com/apache/beam/pull/14895
>> [9]
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=109662250
>> [10] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>> [11] https://pypi.org/project/apache-beam/2.30.0rc1/
>>
>
>
> --
> Regards,
> Tomo
>


Re: [Question] Best Practice of Handling Null Key for KafkaRecordCoder

2021-06-03 Thread Chamikara Jayalath
I think for that and for any transform that produces a PCollection of a
type that is not represented by a standard coder, we would have to add a
cross-language builder class that returns a PCollection that can be
supported at the cross-language boundary. For example, it can be a
PCollection<Row> since RowCoder is already a standard coder. I haven't
looked closely into fixing this particular Jira though.

Thanks,
Cham

On Thu, Jun 3, 2021 at 2:31 PM Boyuan Zhang  wrote:

> Considering the problem of populating KafkaRecord metadata (BEAM-12076)
> together, what's the plan there? Are we going to make KafkaRecordCoder as a
> well-known coder as well? The reason why I ask is because it might be a
> good chance to revisit the KafkaRecordCoder implementation.
>
> On Thu, Jun 3, 2021 at 2:17 PM Chamikara Jayalath 
> wrote:
>
>>
>>
>> On Thu, Jun 3, 2021 at 2:06 PM Boyuan Zhang  wrote:
>>
>>> Supporting the x-lang boundary is a good point. So you are suggesting
>>> that:
>>>
>>>1. We make NullableCoder as a standard coder.
>>>2. KafkaIO wraps the keyCoder with NullableCoder directly if it
>>>requires.
>>>
>>> Is that correct?
>>>
>>
>> Yeah.
>>
>>
>>>
>>>
>>> On Wed, Jun 2, 2021 at 6:47 PM Chamikara Jayalath 
>>> wrote:
>>>
 I think we should make NullableCoder a standard coder for Beam [1] and
 use a standard Nullablecoder(KeyCoder) for Kafka keys (where KeyCoder might
 be the standard ByteArrayCoder for example)
 I think we have compatible Java and Python NullableCoder
 implementations already so implementing this should be relatively
 straightforward.

 Non-standard coders may not be supported by runners at the
 cross-language boundary.

 Thanks,
 Cham

 [1]
 https://github.com/apache/beam/blob/d0c3dd72874fada03cc601f30cde022a8dd6aa9c/model/pipeline/src/main/proto/beam_runner_api.proto#L784

 On Wed, Jun 2, 2021 at 6:25 PM Ahmet Altay  wrote:

> /cc folks who commented on the issue: @Robin Qiu  
> @Chamikara
> Jayalath  @Alexey Romanenko
>  @Daniel Collins 
>
> On Tue, Jun 1, 2021 at 2:03 PM Weiwen Xu  wrote:
>
>> Hello,
>>
>> I'm working on [this issue](
>> https://issues.apache.org/jira/browse/BEAM-12008) with Boyuan. She
>> was very helpful in identifying the issue which is that KafkaRecordCoder
>> couldn't handle the case when key is null.
>>
>> We came up with two potential solutions. Yet both have their pros and
>> cons so I'm hoping to gather some suggestions/opinions or ideas of how to
>> handle this issue. For our solutions:
>>
>> 1. directly wrapping the keyCoder with Nullablecoder i.e.
>> NullableCoder.of(keyCoder)
>> cons: backwards compatibility problem
>>
>> 2. writing a completely new class named something like
>> NullableKeyKafkaRecordCoder
>> instead of using KVCoder and encode/decode KVs, we have KeyCoder
>> and ValueCoder as fields and another BooleanCoder to encode/decode T/F for
>> the presence of a null key. If the key is null, KeyCoder will not encode/decode.
>>
>>   - [L63] encode(...){
>>stringCoder.encode(topic, ...);
>>intCoder.encode(partition, ...);
>>longCoder.encode(offset, ...);
>>longCoder.encode(timestamp, ...);
>>intCoder.encode(timestamptype, ...);
>>headerCoder.encode(...)
>>if(Key!=null){
>>   BooleanCoder.encode(false, ...);
>>   KeyCoder.encode(key, ...);
>>}else{
>>   BooleanCoder.encode(true, ...);
>>   // skips KeyCoder when key is null
>>}
>>   ValueCoder.encode(value, ...);
>> }
>>
>>   - [L74] decode(...){
>>   return new KafkaRecord<>(
>>
>> stringCoder.decode(inStream),
>> intCoder.decode(inStream),
>>
>> longCoder.decode(inStream),
>>
>> longCoder.decode(inStream),
>>
>> KafkaTimestampType.forOrdinal(intCoder.decode(inStream)),
>> (Headers)
>> toHeaders(headerCoder.decode(inStream)),
>>
>> BooleanCoder.decode(inStream)? null:KeyCoder.decode(inStream),
>>
>> ValueCoder.decode(inStream)
>> );
>> }
>>
>> Best regards,
>> Weiwen
>>
>


Re: [Question] Best Practice of Handling Null Key for KafkaRecordCoder

2021-06-03 Thread Boyuan Zhang
Considering the problem of populating KafkaRecord metadata (BEAM-12076) together,
what's the plan there? Are we going to make KafkaRecordCoder as a
well-known coder as well? The reason why I ask is because it might be a
good chance to revisit the KafkaRecordCoder implementation.

On Thu, Jun 3, 2021 at 2:17 PM Chamikara Jayalath 
wrote:

>
>
> On Thu, Jun 3, 2021 at 2:06 PM Boyuan Zhang  wrote:
>
>> Supporting the x-lang boundary is a good point. So you are suggesting
>> that:
>>
>>1. We make NullableCoder as a standard coder.
>>2. KafkaIO wraps the keyCoder with NullableCoder directly if it
>>requires.
>>
>> Is that correct?
>>
>
> Yeah.
>
>
>>
>>
>> On Wed, Jun 2, 2021 at 6:47 PM Chamikara Jayalath 
>> wrote:
>>
>>> I think we should make NullableCoder a standard coder for Beam [1] and
>>> use a standard Nullablecoder(KeyCoder) for Kafka keys (where KeyCoder might
>>> be the standard ByteArrayCoder for example)
>>> I think we have compatible Java and Python NullableCoder implementations
>>> already so implementing this should be relatively straightforward.
>>>
>>> Non-standard coders may not be supported by runners at the
>>> cross-language boundary.
>>>
>>> Thanks,
>>> Cham
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/d0c3dd72874fada03cc601f30cde022a8dd6aa9c/model/pipeline/src/main/proto/beam_runner_api.proto#L784
>>>
>>> On Wed, Jun 2, 2021 at 6:25 PM Ahmet Altay  wrote:
>>>
 /cc folks who commented on the issue: @Robin Qiu  
 @Chamikara
 Jayalath  @Alexey Romanenko
  @Daniel Collins 

 On Tue, Jun 1, 2021 at 2:03 PM Weiwen Xu  wrote:

> Hello,
>
> I'm working on [this issue](
> https://issues.apache.org/jira/browse/BEAM-12008) with Boyuan. She
> was very helpful in identifying the issue which is that KafkaRecordCoder
> couldn't handle the case when key is null.
>
> We came up with two potential solutions. Yet both have their pros and
> cons so I'm hoping to gather some suggestions/opinions or ideas of how to
> handle this issue. For our solutions:
>
> 1. directly wrapping the keyCoder with Nullablecoder i.e.
> NullableCoder.of(keyCoder)
> cons: backwards compatibility problem
>
> 2. writing a completely new class named something like
> NullableKeyKafkaRecordCoder
> instead of using KVCoder and encode/decode KVs, we have KeyCoder
> and ValueCoder as fields and another BooleanCoder to encode/decode T/F for
> the presence of a null key. If the key is null, KeyCoder will not encode/decode.
>
>   - [L63] encode(...){
>stringCoder.encode(topic, ...);
>intCoder.encode(partition, ...);
>longCoder.encode(offset, ...);
>longCoder.encode(timestamp, ...);
>intCoder.encode(timestamptype, ...);
>headerCoder.encode(...)
>if(Key!=null){
>   BooleanCoder.encode(false, ...);
>   KeyCoder.encode(key, ...);
>}else{
>   BooleanCoder.encode(true, ...);
>   // skips KeyCoder when key is null
>}
>   ValueCoder.encode(value, ...);
> }
>
>   - [L74] decode(...){
>   return new KafkaRecord<>(
>
> stringCoder.decode(inStream),
> intCoder.decode(inStream),
> longCoder.decode(inStream),
> longCoder.decode(inStream),
>
> KafkaTimestampType.forOrdinal(intCoder.decode(inStream)),
> (Headers)
> toHeaders(headerCoder.decode(inStream)),
>
> BooleanCoder.decode(inStream)? null:KeyCoder.decode(inStream),
> ValueCoder.decode(inStream)
> );
> }
>
> Best regards,
> Weiwen
>



Re: KafkaIO SSL issue

2021-06-03 Thread Brian Hulette
Oh I guess you are running the KafkaToPubsub example pipeline [1]? The code
you copied is here [2].
Based on that code I think Kafka is in control of creating the InputStream,
since you're just passing a file path in through the config object. So
either Kafka is creating a bad InputStream (seems unlikely), or there's
something wrong with /tmp/kafka.truststore.jks. Maybe it was cleaned up
while Kafka is reading it, or the file is somehow corrupt?

Looking at the code you copied, I suppose it's possible you're not writing
the full file to the local disk. The javadoc for transferFrom [3] states:

> Fewer than the requested number of bytes will be transferred if the
source channel has fewer than count bytes remaining, ** or if the source
channel is non-blocking and has fewer than count bytes immediately
available in its input buffer. **

Is it possible sometimes you're hitting this second case and the whole file
isn't being read? I don't know if readerChannel is blocking or not. You
might check by adding a log statement that prints the number of bytes that
are transferred to see if that correlates with the failure.

Someone else on this list may have advice on a more robust way to copy a
file from remote storage.

Brian

[1]
https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/
[2]
https://github.com/apache/beam/blob/f881a412fe85c64b1caf075160a6c0312995ea45/examples/java/src/main/java/org/apache/beam/examples/complete/kafkatopubsub/kafka/consumer/SslConsumerFactoryFn.java#L128
[3]
https://docs.oracle.com/javase/8/docs/api/java/nio/channels/FileChannel.html#transferFrom-java.nio.channels.ReadableByteChannel-long-long-
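
One possible way to harden the copy against partial transfers, sketched below; it assumes the reader channel returned by FileSystems.open blocks, and it is not the code the example currently ships:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChannelCopySketch {
  // Read/write loop instead of a single transferFrom call: read() only
  // returns -1 at end-of-stream, so the loop cannot stop early the way a
  // short transferFrom can. Returns the number of bytes written.
  static long copyFully(ReadableByteChannel in, Path out) throws IOException {
    try (FileChannel dst = FileChannel.open(out, StandardOpenOption.CREATE,
        StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
      ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
      long written = 0;
      while (in.read(buffer) != -1) {
        buffer.flip();
        while (buffer.hasRemaining()) {
          written += dst.write(buffer);
        }
        buffer.clear();
      }
      return written;
    }
  }
}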

On Wed, Jun 2, 2021 at 9:00 AM Ilya Kozyrev 
wrote:

> Hi Brian,
>
>
>
> We’re using a consumerFactoryFn that reads certs from GCP and copies them
> to the local FS on each Dataflow worker.
>
> Exception raised after consumerFactoryFn when Kafka tries to read certs
> from local fs using KeyStore.load(InputStream is, String pass).
>
>
>
> This is the code we use in consumerFactoryFn to read from GCP and write to
> the local fs:
>
> try (ReadableByteChannel readerChannel =
>     FileSystems.open(FileSystems.matchSingleFileSpec(gcsFilePath).resourceId())) {
>   try (FileChannel writeChannel =
>       FileChannel.open(Paths.get(outputFilePath), options)) {
>     writeChannel.transferFrom(readerChannel, 0, Long.MAX_VALUE);
>   }
> }
>
>
>
> Thank you,
>
> Ilya
>
>
>
> *From: *Brian Hulette 
> *Reply to: *"dev@beam.apache.org" 
> *Date: *Wednesday, 26 May 2021, 21:32
> *To: *dev 
> *Cc: *Artur Khanin 
> *Subject: *Re: KafkaIO SSL issue
>
>
>
> I came across this relevant StackOverflow question:
> https://stackoverflow.com/questions/7399154/pkcs12-derinputstream-getlength-exception
>
> They say the error is from a call to `KeyStore.load(InputStream is,
> String pass);` (consistent with your stacktrace), and can occur whenever
> there's an issue with the InputStream passed to it. Who created the
> InputStream used in this case, is it Kafka code, Beam code, or your
> consumerFactoryFn?
>
>
>
> Brian
>
>
>
> On Mon, May 24, 2021 at 4:01 AM Ilya Kozyrev 
> wrote:
>
> Hi community,
>
>
> We have an issue with KafkaIO in the case of using a secure connection
> SASL SSL to the Confluent Kafka 5.5.1. When we trying to configure the
> Kafka consumer using consumerFactoryFn, we have an irregular issue related
> to certificate reads from the file system. Irregular means, that different
> Dataflow jobs with the same parameters and certs might be failed and
> succeeded. Store cert types for Keystore and Truststore are specified
> explicitly in consumer config. In our case, it's JKS for both certs.
>
>
> *Stacktrase*:
>
> Caused by: org.apache.kafka.common.KafkaException: Failed to load SSL
> keystore /tmp/kafka.truststore.jks of type JKS
>   at
> org.apache.kafka.common.security.ssl.SslEngineBuilder$SecurityStore.load(SslEngineBuilder.java:289)
>   at
> org.apache.kafka.common.security.ssl.SslEngineBuilder.createSSLContext(SslEngineBuilder.java:153)
>   ... 23 more
> Caused by: java.security.cert.CertificateException: Unable to initialize,
> java.io.IOException: DerInputStream.getLength(): lengthTag=65, too big.
>   at sun.security.x509.X509CertImpl.<init>(X509CertImpl.java:198)
>   at
> sun.security.provider.X509Factory.engineGenerateCertificate(X509Factory.java:102)
>   at
> java.security.cert.CertificateFactory.generateCertificate(CertificateFactory.java:339)
>   at
> sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:755)
>   at
> sun.security.provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:56)
>   at
> sun.security.provider.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:224)
>   at
> sun.security.provider.JavaKeyStore$DualFormatJKS.engineLoad(JavaKeyStore.java:70)
>   at java.security.KeyStore.load(KeyStore.java:1445)
>   at
> 

Re: [Question] Best Practice of Handling Null Key for KafkaRecordCoder

2021-06-03 Thread Chamikara Jayalath
On Thu, Jun 3, 2021 at 2:06 PM Boyuan Zhang  wrote:

> Supporting the x-lang boundary is a good point. So you are suggesting that:
>
>1. We make NullableCoder as a standard coder.
>2. KafkaIO wraps the keyCoder with NullabeCoder directly if it
>requires.
>
> Is that correct?
>

Yeah.


>
>
> On Wed, Jun 2, 2021 at 6:47 PM Chamikara Jayalath 
> wrote:
>
>> I think we should make NullableCoder a standard coder for Beam [1] and
>> use a standard Nullablecoder(KeyCoder) for Kafka keys (where KeyCoder might
>> be the standard ByteArrayCoder for example)
>> I think we have compatible Java and Python NullableCoder implementations
>> already so implementing this should be relatively straightforward.
>>
>> Non-standard coders may not be supported by runners at the cross-language
>> boundary.
>>
>> Thanks,
>> Cham
>>
>> [1]
>> https://github.com/apache/beam/blob/d0c3dd72874fada03cc601f30cde022a8dd6aa9c/model/pipeline/src/main/proto/beam_runner_api.proto#L784
>>
>> On Wed, Jun 2, 2021 at 6:25 PM Ahmet Altay  wrote:
>>
>>> /cc folks who commented on the issue: @Robin Qiu  
>>> @Chamikara
>>> Jayalath  @Alexey Romanenko
>>>  @Daniel Collins 
>>>
>>> On Tue, Jun 1, 2021 at 2:03 PM Weiwen Xu  wrote:
>>>
 Hello,

 I'm working on [this issue](
 https://issues.apache.org/jira/browse/BEAM-12008) with Boyuan. She was
 very helpful in identifying the issue which is that KafkaRecordCoder
 couldn't handle the case when key is null.

 We came up with two potential solutions. Yet both have their pros and
 cons so I'm hoping to gather some suggestions/opinions or ideas of how to
 handle this issue. For our solutions:

 1. directly wrapping the keyCoder with Nullablecoder i.e.
 NullableCoder.of(keyCoder)
 cons: backwards compatibility problem

 2. writing a completely new class named something like
 NullableKeyKafkaRecordCoder
 instead of using KVCoder and encode/decode KVs, we have KeyCoder
 and ValueCoder as fields and another BooleanCoder to encode/decode T/F for
 the presence of a null key. If the key is null, KeyCoder will not encode/decode.

   - [L63] encode(...){
stringCoder.encode(topic, ...);
intCoder.encode(partition, ...);
longCoder.encode(offset, ...);
longCoder.encode(timestamp, ...);
intCoder.encode(timestamptype, ...);
headerCoder.encode(...)
if(Key!=null){
   BooleanCoder.encode(false, ...);
   KeyCoder.encode(key, ...);
}else{
   BooleanCoder.encode(true, ...);
   // skips KeyCoder when key is null
}
   ValueCoder.encode(value, ...);
 }

   - [L74] decode(...){
   return new KafkaRecord<>(

 stringCoder.decode(inStream),
 intCoder.decode(inStream),
 longCoder.decode(inStream),
 longCoder.decode(inStream),

 KafkaTimestampType.forOrdinal(intCoder.decode(inStream)),
 (Headers)
 toHeaders(headerCoder.decode(inStream)),

 BooleanCoder.decode(inStream)? null:KeyCoder.decode(inStream),
 ValueCoder.decode(inStream)
 );
 }

 Best regards,
 Weiwen

>>>


Re: [Question] Best Practice of Handling Null Key for KafkaRecordCoder

2021-06-03 Thread Boyuan Zhang
Supporting the x-lang boundary is a good point. So you are suggesting that:

   1. We make NullableCoder as a standard coder.
   2. KafkaIO wraps the keyCoder with NullableCoder directly if it requires.

Is that correct?
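
A rough sketch of what (2) could look like on the Java side (a hypothetical helper, assuming the byte-array key/value coders KafkaIO typically infers; not the actual KafkaIO change):

import org.apache.beam.sdk.coders.ByteArrayCoder;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.NullableCoder;

public class NullableKeySketch {
  // Wrap only the key side: NullableCoder writes a one-byte present/absent
  // marker before the wrapped encoding, so a null key never reaches the
  // inner key coder and costs a single byte on the wire.
  static <K, V> KvCoder<K, V> nullableKeyKvCoder(Coder<K> keyCoder, Coder<V> valueCoder) {
    return KvCoder.of(NullableCoder.of(keyCoder), valueCoder);
  }

  public static void main(String[] args) {
    KvCoder<byte[], byte[]> coder =
        nullableKeyKvCoder(ByteArrayCoder.of(), ByteArrayCoder.of());
    System.out.println(coder);
  }
}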


On Wed, Jun 2, 2021 at 6:47 PM Chamikara Jayalath 
wrote:

> I think we should make NullableCoder a standard coder for Beam [1] and use
> a standard Nullablecoder(KeyCoder) for Kafka keys (where KeyCoder might be
> the standard ByteArrayCoder for example)
> I think we have compatible Java and Python NullableCoder implementations
> already so implementing this should be relatively straightforward.
>
> Non-standard coders may not be supported by runners at the cross-language
> boundary.
>
> Thanks,
> Cham
>
> [1]
> https://github.com/apache/beam/blob/d0c3dd72874fada03cc601f30cde022a8dd6aa9c/model/pipeline/src/main/proto/beam_runner_api.proto#L784
>
> On Wed, Jun 2, 2021 at 6:25 PM Ahmet Altay  wrote:
>
>> /cc folks who commented on the issue: @Robin Qiu  
>> @Chamikara
>> Jayalath  @Alexey Romanenko
>>  @Daniel Collins 
>>
>> On Tue, Jun 1, 2021 at 2:03 PM Weiwen Xu  wrote:
>>
>>> Hello,
>>>
>>> I'm working on [this issue](
>>> https://issues.apache.org/jira/browse/BEAM-12008) with Boyuan. She was
>>> very helpful in identifying the issue which is that KafkaRecordCoder
>>> couldn't handle the case when key is null.
>>>
>>> We came up with two potential solutions. Yet both have their pros and
>>> cons so I'm hoping to gather some suggestions/opinions or ideas of how to
>>> handle this issue. For our solutions:
>>>
>>> 1. directly wrapping the keyCoder with Nullablecoder i.e.
>>> NullableCoder.of(keyCoder)
>>> cons: backwards compatibility problem
>>>
>>> 2. writing a completely new class named something like
>>> NullableKeyKafkaRecordCoder
>>> instead of using KVCoder and encode/decode KVs, we have KeyCoder and
>>> ValueCoder as fields and another BooleanCoder to encode/decode T/F for
>>> the presence of a null key. If the key is null, KeyCoder will not encode/decode.
>>>
>>>   - [L63] encode(...){
>>>stringCoder.encode(topic, ...);
>>>intCoder.encode(partition, ...);
>>>longCoder.encode(offset, ...);
>>>longCoder.encode(timestamp, ...);
>>>intCoder.encode(timestamptype, ...);
>>>headerCoder.encode(...)
>>>if(Key!=null){
>>>   BooleanCoder.encode(false, ...);
>>>   KeyCoder.encode(key, ...);
>>>}else{
>>>   BooleanCoder.encode(true, ...);
>>>   // skips KeyCoder when key is null
>>>}
>>>   ValueCoder.encode(value, ...);
>>> }
>>>
>>>   - [L74] decode(...){
>>>   return new KafkaRecord<>(
>>> stringCoder.decode(inStream),
>>> intCoder.decode(inStream),
>>> longCoder.decode(inStream),
>>> longCoder.decode(inStream),
>>>
>>> KafkaTimestampType.forOrdinal(intCoder.decode(inStream)),
>>> (Headers)
>>> toHeaders(headerCoder.decode(inStream)),
>>>
>>> BooleanCoder.decode(inStream)? null:KeyCoder.decode(inStream),
>>> ValueCoder.decode(inStream)
>>> );
>>> }
>>>
>>> Best regards,
>>> Weiwen
>>>
>>


Re: [VOTE] Release 2.30.0, release candidate #1

2021-06-03 Thread Tomo Suzuki
+1 (non-binding)

Thank you for the preparation. With the GCP dependencies of my interest,
the GitHub checks worked.



On Thu, Jun 3, 2021 at 4:55 AM Heejong Lee  wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #1 for the version 2.30.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> Reviewers are encouraged to test their own use cases with the release
> candidate, and vote +1 if no issues are found.
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint
> DBC03F1CCF4240FBD0F256F054550BE0F4C0A24D [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.30.0-RC1" [5],
> * website pull request listing the release [6], publishing the API
> reference manual [7], and the blog post [8].
> * Java artifacts were built with Maven 3.6.3 and OpenJDK 1.8.0_292.
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
> * Validation sheet with a tab for 2.30.0 release to help with validation
> [9].
> * Docker images published to Docker Hub [10].
> * Python artifacts are published to pypi as a pre-release version [11].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Heejong
>
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12349978
> [2] https://dist.apache.org/repos/dist/dev/beam/2.30.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1174/
> [5] https://github.com/apache/beam/tree/v2.30.0-RC1
> [6] https://github.com/apache/beam/pull/14894
> [7] https://github.com/apache/beam-site/pull/613
> [8] https://github.com/apache/beam/pull/14895
> [9]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=109662250
> [10] https://hub.docker.com/search?q=apache%2Fbeam&type=image
> [11] https://pypi.org/project/apache-beam/2.30.0rc1/
>


-- 
Regards,
Tomo


Re: [DISCUSS] Client SDK/Job Server/Worker Pool Lifecycle Management on Kubernetes

2021-06-03 Thread Kyle Weaver
>
> However, not all runners follow the pattern where a predefined number of
> workers are brought up before job submission. For example, for the Samza
> runner, the number of workers needed for a job is determined after the job
> submission happens, in which case the Samza worker Pod, which is
> similar to a “Task Manager Pod” in Flink, is brought up after job
> submission and the runner container in this Pod needs to connect to the worker
> pool service at a much earlier time.


Makes sense. In that case, the best alternative to worker pools is probably
to create a custom Samza/Flink worker container image that includes
whatever dependencies necessary to run the Beam user code, and then
configure the job to use the PROCESS environment.
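
For reference, a minimal sketch of selecting the PROCESS environment from the Java SDK; the boot path is only an assumption about where the custom worker image places the SDK harness boot binary:

import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.PortablePipelineOptions;

class ProcessEnvironmentSketch {
  public static void main(String[] args) {
    PortablePipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(PortablePipelineOptions.class);
    // PROCESS: the runner launches the SDK harness as a local process inside
    // the (custom) worker image instead of a Docker container or an external
    // worker pool sidecar.
    options.setDefaultEnvironmentType("PROCESS");
    options.setDefaultEnvironmentConfig("{\"command\": \"/opt/apache/beam/boot\"}");
  }
}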

Also, are there any resources I can use to find out more about how
horizontal scaling works in Samza?

On Wed, Jun 2, 2021 at 6:39 PM Ke Wu  wrote:

> Very good point. We are actually talking about the same high level
> approach where Task Manager Pod has two containers inside running, one is
> task manager container while the other is worker pool service container.
>
> I believe the disconnect probably lies in how a job is being
> deployed/started. In the GCP Flink operator example, it is completely true
> that the likelihood where the worker pool service is not available when the
> task manager container needs to connect to it is extremely low. It is
> because the worker pool service is being brought up together when the Flink
> cluster is being brought up, which is before the job submission even
> happens.
>
> However, not all runners follow the pattern where a predefined number of
> workers are brought up before job submission, for example, for Samza
> runner, the number of workers needed for a job is determined after the job
> submission happens, in which case, in the Samza worker Pod, which is
> similar to “Task Manager Pod” in Flink, is brought up together after job
> submission and the runner container in this POD need to connect to worker
> pool service at much earlier time.
>
> In addition, if I understand correctly, Flink is planning to add support
> for dynamically adding new task managers after job submission [1], in which
> case, the task manager container and worker pool service container in the
> same Task Manager Pod could be started together and the task manager
> container needs to connect to the worker pool service container sooner.
>
> Hope this clarifies things better. Let me know if you have more questions.
>
> Best,
> Ke
>
> [1] https://issues.apache.org/jira/browse/FLINK-10407
>
> On Jun 2, 2021, at 4:27 PM, Kyle Weaver  wrote:
>
> Therefore, if we bring up the external worker pool container together with
>> the runner container, which is one of the approaches supported by the Flink Runner
>> on K8s
>
>
> Exactly which approach are you talking about in the doc? I feel like there
> could be some misunderstanding here. Here is the configuration I'm talking
> about:
> https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/examples/beam/without_job_server/beam_flink_cluster.yaml
>
> Basically this config is describing a Flink task manager with a Beam
> worker pool sidecar. The user invokes it with:
>
> kubectl apply -f examples/beam/without_job_server/beam_flink_cluster.yaml
>
> It doesn't matter which container is started first, the task manager
> container or the worker pool sidecar, because no communication between the
> two containers is necessary at this time.
>
> The instructions are to start the cluster first and wait until it is ready
> to submit a job, e.g.:
>
> kubectl apply -f examples/beam/without_job_server/beam_wordcount_py.yaml
>
> The task manager only sends the worker pool requests once it's running a
> job. So for things to go wrong in the way you describe:
>
> 1. The user submits a job, then starts a Flink cluster -- reversing the
> order of steps in the instructions.
> 2. The worker pool sidecar takes way longer to start up than the task
> manager container for some reason.
> 3. The Flink cluster accepts and starts running the job before the worker
> pool sidecar is ready -- I'm not familiar enough with k8s lifecycle
> management or the Flink operator implementation to be sure if this is even
> possible.
>
> I've never seen this happen. But, of course there are plenty of unforeseen
> ways things can go wrong. So I'm not opposed to improving our error
> handling here more generally.
>
>
>


Flaky test issue report (40)

2021-06-03 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake)

These are P1 issues because they have a major negative impact on the community 
and make it hard to determine the quality of the software.

https://issues.apache.org/jira/browse/BEAM-12322: 
FnApiRunnerTestWithGrpcAndMultiWorkers flaky (py precommit) (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12309: 
PubSubIntegrationTest.test_streaming_data_only flake (created 2021-05-07)
https://issues.apache.org/jira/browse/BEAM-12307: 
PubSubBigQueryIT.test_file_loads flake (created 2021-05-07)
https://issues.apache.org/jira/browse/BEAM-12303: Flake in 
PubSubIntegrationTest.test_streaming_with_attributes (created 2021-05-06)
https://issues.apache.org/jira/browse/BEAM-12293: 
FlinkSavepointTest.testSavepointRestoreLegacy flakes due to 
FlinkJobNotFoundException (created 2021-05-05)
https://issues.apache.org/jira/browse/BEAM-12291: 
org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: 
false] is flaky (created 2021-05-05)
https://issues.apache.org/jira/browse/BEAM-12200: 
SamzaStoreStateInternalsTest is flaky (created 2021-04-20)
https://issues.apache.org/jira/browse/BEAM-12163: Python GHA PreCommits 
flake with grpc.FutureTimeoutError on SDK harness startup (created 2021-04-13)
https://issues.apache.org/jira/browse/BEAM-12061: beam_PostCommit_SQL 
failing on KafkaTableProviderIT.testFakeNested (created 2021-03-27)
https://issues.apache.org/jira/browse/BEAM-12019: 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky (created 2021-03-18)
https://issues.apache.org/jira/browse/BEAM-11792: Python precommit failed 
(flaked?) installing package  (created 2021-02-10)
https://issues.apache.org/jira/browse/BEAM-11666: 
apache_beam.runners.interactive.recording_manager_test.RecordingManagerTest.test_basic_execution
 is flaky (created 2021-01-20)
https://issues.apache.org/jira/browse/BEAM-11662: elasticsearch tests 
failing (created 2021-01-19)
https://issues.apache.org/jira/browse/BEAM-11661: hdfsIntegrationTest 
flake: network not found (py38 postcommit) (created 2021-01-19)
https://issues.apache.org/jira/browse/BEAM-11645: beam_PostCommit_XVR_Flink 
failing (created 2021-01-15)
https://issues.apache.org/jira/browse/BEAM-11541: 
testTeardownCalledAfterExceptionInProcessElement flakes on direct runner. 
(created 2020-12-30)
https://issues.apache.org/jira/browse/BEAM-11540: Linter sometimes flakes 
on apache_beam.dataframe.frames_test (created 2020-12-30)
https://issues.apache.org/jira/browse/BEAM-10995: Java + Universal Local 
Runner: WindowingTest.testWindowPreservation fails (created 2020-09-30)
https://issues.apache.org/jira/browse/BEAM-10987: 
stager_test.py::StagerTest::test_with_main_session flaky on windows py3.6,3.7 
(created 2020-09-29)
https://issues.apache.org/jira/browse/BEAM-10968: flaky test: 
org.apache.beam.sdk.metrics.MetricsTest$AttemptedMetricTests.testAttemptedDistributionMetrics
 (created 2020-09-25)
https://issues.apache.org/jira/browse/BEAM-10955: Flink Java Runner test 
flake: Could not find Flink job  (created 2020-09-23)
https://issues.apache.org/jira/browse/BEAM-10866: 
PortableRunnerTestWithSubprocesses.test_register_finalizations flaky on macOS 
(created 2020-09-09)
https://issues.apache.org/jira/browse/BEAM-10504: Failure / flake in 
ElasticSearchIOTest > testWriteFullAddressing and testWriteWithIndexFn (created 
2020-07-15)
https://issues.apache.org/jira/browse/BEAM-10501: 
CheckGrafanaStalenessAlerts and PingGrafanaHttpApi fail with Connection refused 
(created 2020-07-15)
https://issues.apache.org/jira/browse/BEAM-10485: Failure / flake: 
ElasticsearchIOTest > testWriteWithIndexFn (created 2020-07-14)
https://issues.apache.org/jira/browse/BEAM-9649: 
beam_python_mongoio_load_test started failing due to mismatched results 
(created 2020-03-31)
https://issues.apache.org/jira/browse/BEAM-9392: TestStream tests are all 
flaky (created 2020-02-27)
https://issues.apache.org/jira/browse/BEAM-9232: 
BigQueryWriteIntegrationTests is flaky coercing to Unicode (created 2020-01-31)
https://issues.apache.org/jira/browse/BEAM-9119: 
apache_beam.runners.portability.fn_api_runner_test.FnApiRunnerTest[...].test_large_elements
 is flaky (created 2020-01-14)
https://issues.apache.org/jira/browse/BEAM-8101: Flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful for 
Direct, Spark, Flink (created 2019-08-27)
https://issues.apache.org/jira/browse/BEAM-8035: 
[beam_PreCommit_Java_Phrase] [WatchTest.testMultiplePollsWithManyResults]  
Flake: Outputs must be in timestamp order (created 2019-08-22)
https://issues.apache.org/jira/browse/BEAM-7992: Unhandled type_constraint 
in 

P1 issues report (42)

2021-06-03 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake).

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

https://issues.apache.org/jira/browse/BEAM-12443: No Information of failed 
Query in JdbcIO (created 2021-06-02)
https://issues.apache.org/jira/browse/BEAM-12440: Wrong version string for 
legacy Dataflow Java container (created 2021-06-01)
https://issues.apache.org/jira/browse/BEAM-12436: 
[beam_PostCommit_Go_VR_flink| beam_PostCommit_Go_VR_spark] 
[:sdks:go:test:flinkValidatesRunner] Failure summary (created 2021-06-01)
https://issues.apache.org/jira/browse/BEAM-12422: Vendored gRPC 1.36.0 is 
using a log4j version with security issues (created 2021-05-28)
https://issues.apache.org/jira/browse/BEAM-12416: Python Kafka transforms 
are failing due to "No Runner was specified" (created 2021-05-27)
https://issues.apache.org/jira/browse/BEAM-12396: 
beam_PostCommit_XVR_Direct failed (flaked?) (created 2021-05-24)
https://issues.apache.org/jira/browse/BEAM-12389: 
beam_PostCommit_XVR_Dataflow flaky: Expand method not found (created 2021-05-21)
https://issues.apache.org/jira/browse/BEAM-12387: beam_PostCommit_Python* 
timing out (created 2021-05-21)
https://issues.apache.org/jira/browse/BEAM-12386: 
beam_PostCommit_Py_VR_Dataflow(_V2) failing metrics tests (created 2021-05-21)
https://issues.apache.org/jira/browse/BEAM-12380: Go SDK Kafka IO Transform 
implemented via XLang (created 2021-05-21)
https://issues.apache.org/jira/browse/BEAM-12374: Spark postcommit failing 
ResumeFromCheckpointStreamingTest (created 2021-05-20)
https://issues.apache.org/jira/browse/BEAM-12337: Replace invalid UW 
container name for Java SDK (created 2021-05-14)
https://issues.apache.org/jira/browse/BEAM-12320: 
PubsubTableProviderIT.testSQLSelectsArrayAttributes[0] failing in SQL 
PostCommit (created 2021-05-10)
https://issues.apache.org/jira/browse/BEAM-12310: 
beam_PostCommit_Java_DataflowV2 failing (created 2021-05-07)
https://issues.apache.org/jira/browse/BEAM-12279: Implement 
destination-dependent sharding in FileIO.writeDynamic (created 2021-05-04)
https://issues.apache.org/jira/browse/BEAM-12256: 
PubsubIO.readAvroGenericRecord creates SchemaCoder that fails to decode some 
Avro logical types (created 2021-04-29)
https://issues.apache.org/jira/browse/BEAM-12231: 
beam_PostRelease_NightlySnapshot failing (created 2021-04-27)
https://issues.apache.org/jira/browse/BEAM-11959: Python Beam SDK Harness 
hangs when installing pip packages (created 2021-03-11)
https://issues.apache.org/jira/browse/BEAM-11906: No trigger early 
repeatedly for session windows (created 2021-03-01)
https://issues.apache.org/jira/browse/BEAM-11875: XmlIO.Read does not 
handle XML encoding per spec (created 2021-02-26)
https://issues.apache.org/jira/browse/BEAM-11828: JmsIO is not 
acknowledging messages correctly (created 2021-02-17)
https://issues.apache.org/jira/browse/BEAM-11755: Cross-language 
consistency (RequiresStableInputs) is quietly broken (at least on portable 
flink runner) (created 2021-02-05)
https://issues.apache.org/jira/browse/BEAM-11578: `dataflow_metrics` 
(python) fails with TypeError (when int overflowing?) (created 2021-01-06)
https://issues.apache.org/jira/browse/BEAM-11434: Expose Spanner 
admin/batch clients in Spanner Accessor (created 2020-12-10)
https://issues.apache.org/jira/browse/BEAM-11148: Kafka 
commitOffsetsInFinalize OOM on Flink (created 2020-10-28)
https://issues.apache.org/jira/browse/BEAM-11017: Timer with dataflow 
runner can be set multiple times (dataflow runner) (created 2020-10-05)
https://issues.apache.org/jira/browse/BEAM-10670: Make non-portable 
Splittable DoFn the only option when executing Java "Read" transforms (created 
2020-08-10)
https://issues.apache.org/jira/browse/BEAM-10617: python 
CombineGlobally().with_fanout() cause duplicate combine results for sliding 
windows (created 2020-07-31)
https://issues.apache.org/jira/browse/BEAM-10569: SpannerIO tests don't 
actually assert anything. (created 2020-07-23)
https://issues.apache.org/jira/browse/BEAM-10288: Quickstart documents are 
out of date (created 2020-06-19)
https://issues.apache.org/jira/browse/BEAM-10244: Populate requirements 
cache fails on poetry-based packages (created 2020-06-11)
https://issues.apache.org/jira/browse/BEAM-10100: FileIO writeDynamic with 
AvroIO.sink not writing all data (created 2020-05-27)
https://issues.apache.org/jira/browse/BEAM-9564: Remove insecure ssl 
options from MongoDBIO (created 2020-03-20)
https://issues.apache.org/jira/browse/BEAM-9455: Environment-sensitive 
provisioning for 

[no subject]

2021-06-03 Thread Raphael Sanamyan
Hi, community,

I would like to start work on this task, BEAM-10396. I hope nobody minds?
Also, if anyone has any details or developments on this task, I would be glad 
if you could share them.

Thank you,
Raphael.




Re: RenameFields behaves differently in DirectRunner

2021-06-03 Thread Reuven Lax
Correct.

On Thu, Jun 3, 2021 at 9:51 AM Kenneth Knowles  wrote:

> I still don't quite grok the details of how this succeeds or fails in
> different situations. The invalid row succeeds in serialization because the
> coder is not sensitive to the way in which it is invalid?
>
> Kenn
>
> On Wed, Jun 2, 2021 at 2:54 PM Brian Hulette  wrote:
>
>> > One thing that's been on the back burner for a long time is making
>> CoderProperties into a CoderTester like Guava's EqualityTester.
>>
>> Reuven's point still applies here though. This issue is not due to a bug
>> in SchemaCoder, it's a problem with the Row we gave SchemaCoder to encode.
>> I'm assuming a CoderTester would require manually generating inputs right?
>> These input Rows represent an illegal state that we wouldn't test with.
>> (That being said I like the idea of a CoderTester in general)
>>
>> Brian
>>
>> On Wed, Jun 2, 2021 at 12:11 PM Kenneth Knowles  wrote:
>>
>>> Mutability checking might catch that.
>>>
>>> I meant to suggest not putting the check in the pipeline, but offering a
>>> testing discipline that will catch such issues. One thing that's been on
>>> the back burner for a long time is making CoderProperties into a
>>> CoderTester like Guava's EqualityTester. Then it can run through all the
>>> properties without a user setting up test suites. Downside is that the test
>>> failure signal gets aggregated.
>>>
>>> Kenn
>>>
>>> On Wed, Jun 2, 2021 at 12:09 PM Brian Hulette 
>>> wrote:
>>>
 Could the DirectRunner just do an equality check whenever it does an
 encode/decode? It sounds like it's already effectively performing
 a CoderProperties.coderDecodeEncodeEqual for every element, just omitting
 the equality check.

 On Wed, Jun 2, 2021 at 12:04 PM Reuven Lax  wrote:

> There is no bug in the Coder itself, so that wouldn't catch it. We
> could insert CoderProperties.coderDecodeEncodeEqual in a subsequent ParDo,
> but if the Direct runner already does an encode/decode before that ParDo,
> then that would have fixed the problem before we could see it.
>
> On Wed, Jun 2, 2021 at 11:53 AM Kenneth Knowles 
> wrote:
>
>> Would it be caught by CoderProperties?
>>
>> Kenn
>>
>> On Wed, Jun 2, 2021 at 8:16 AM Reuven Lax  wrote:
>>
>>> I don't think this bug is schema specific - we created a Java object
>>> that is inconsistent with its encoded form, which could happen to any
>>> transform.
>>>
>>> This does seem to be a gap in DirectRunner testing though. It also
>>> makes it hard to test using PAssert, as I believe that puts everything 
>>> in a
>>> side input, forcing an encoding/decoding.
>>>
>>> On Wed, Jun 2, 2021 at 8:12 AM Brian Hulette 
>>> wrote:
>>>
 +dev 

 > I bet the DirectRunner is encoding and decoding in between, which
 fixes the object.

 Do we need better testing of schema-aware (and potentially other
 built-in) transforms in the face of fusion to root out issues like 
 this?

 Brian

 On Wed, Jun 2, 2021 at 5:13 AM Matthew Ouyang <
 matthew.ouy...@gmail.com> wrote:

> I have some other work-related things I need to do this week, so I
> will likely report back on this over the weekend.  Thank you for the
> explanation.  It makes perfect sense now.
>
> On Tue, Jun 1, 2021 at 11:18 PM Reuven Lax 
> wrote:
>
>> Some more context - the problem is that RenameFields outputs (in
>> this case) Java Row objects that are inconsistent with the actual 
>> schema.
>> For example if you have the following schema:
>>
>> Row {
>>field1: Row {
>>   field2: string
>> }
>> }
>>
>> And rename field1.field2 -> renamed, you'll get the following
>> schema
>>
>> Row {
>>   field1: Row {
>>  renamed: string
>>}
>> }
>>
>> However the Java object for the _nested_ row will return the old
>> schema if getSchema() is called on it. This is because we only 
>> update the
>> schema on the top-level row.
>>
>> I think this explains why your test works in the direct runner.
>> If the row ever goes through an encode/decode path, it will come back
>> correct. The original incorrect Java objects are no longer around, 
>> and new
>> (consistent) objects are constructed from the raw data and the 
>> PCollection
>> schema. Dataflow tends to fuse ParDos together, so the following 
>> ParDo will
>> see the incorrect Row object. I bet the DirectRunner is encoding and
>> decoding in between, which fixes the object.
>>
>> You can validate this theory by forcing a shuffle after RenameFields using
>> Reshuffle. It should fix the issue. If it does, let me know and I'll work
>> on a fix to RenameFields.
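
A minimal sketch of the workaround described above, assuming the Java SDK's
schema transforms (the schema and field names follow the example in the
message, and Reshuffle.viaRandomKey() is used only to force an encode/decode
boundary):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.transforms.RenameFields;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Reshuffle;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class RenameFieldsWithReshuffle {
  public static void main(String[] args) {
    // Nested schema from the example: Row { field1: Row { field2: string } }
    Schema nested = Schema.builder().addStringField("field2").build();
    Schema schema = Schema.builder().addRowField("field1", nested).build();
    Row row =
        Row.withSchema(schema)
            .addValue(Row.withSchema(nested).addValue("hello").build())
            .build();

    Pipeline p = Pipeline.create();
    PCollection<Row> renamed =
        p.apply(Create.of(row).withRowSchema(schema))
            .apply(RenameFields.<Row>create().rename("field1.field2", "renamed"))
            // The reshuffle forces the rows through an encode/decode boundary,
            // so downstream ParDos see Rows rebuilt from the PCollection schema.
            .apply(Reshuffle.viaRandomKey());
    p.run().waitUntilFinish();
  }
}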

Re: RenameFields behaves differently in DirectRunner

2021-06-03 Thread Kenneth Knowles
I still don't quite grok the details of how this succeeds or fails in
different situations. The invalid row succeeds in serialization because the
coder is not sensitive to the way in which it is invalid?

Kenn

On Wed, Jun 2, 2021 at 2:54 PM Brian Hulette  wrote:

> > One thing that's been on the back burner for a long time is making
> CoderProperties into a CoderTester like Guava's EqualityTester.
>
> Reuven's point still applies here though. This issue is not due to a bug
> in SchemaCoder, it's a problem with the Row we gave SchemaCoder to encode.
> I'm assuming a CoderTester would require manually generating inputs right?
> These input Rows represent an illegal state that we wouldn't test with.
> (That being said I like the idea of a CoderTester in general)
>
> Brian
>
> On Wed, Jun 2, 2021 at 12:11 PM Kenneth Knowles  wrote:
>
>> Mutability checking might catch that.
>>
>> I meant to suggest not putting the check in the pipeline, but offering a
>> testing discipline that will catch such issues. One thing that's been on
>> the back burner for a long time is making CoderProperties into a
>> CoderTester like Guava's EqualityTester. Then it can run through all the
>> properties without a user setting up test suites. Downside is that the test
>> failure signal gets aggregated.
>>
>> Kenn
>>
>> On Wed, Jun 2, 2021 at 12:09 PM Brian Hulette 
>> wrote:
>>
>>> Could the DirectRunner just do an equality check whenever it does an
>>> encode/decode? It sounds like it's already effectively performing
>>> a CoderProperties.coderDecodeEncodeEqual for every element, just omitting
>>> the equality check.
>>>
>>> On Wed, Jun 2, 2021 at 12:04 PM Reuven Lax  wrote:
>>>
 There is no bug in the Coder itself, so that wouldn't catch it. We
 could insert CoderProperties.coderDecodeEncodeEqual in a subsequent ParDo,
 but if the Direct runner already does an encode/decode before that ParDo,
 then that would have fixed the problem before we could see it.

 On Wed, Jun 2, 2021 at 11:53 AM Kenneth Knowles 
 wrote:

> Would it be caught by CoderProperties?
>
> Kenn
>
> On Wed, Jun 2, 2021 at 8:16 AM Reuven Lax  wrote:
>
>> I don't think this bug is schema specific - we created a Java object
>> that is inconsistent with its encoded form, which could happen to any
>> transform.
>>
>> This does seem to be a gap in DirectRunner testing though. It also
>> makes it hard to test using PAssert, as I believe that puts everything 
>> in a
>> side input, forcing an encoding/decoding.
>>
>> On Wed, Jun 2, 2021 at 8:12 AM Brian Hulette 
>> wrote:
>>
>>> +dev 
>>>
>>> > I bet the DirectRunner is encoding and decoding in between, which
>>> fixes the object.
>>>
>>> Do we need better testing of schema-aware (and potentially other
>>> built-in) transforms in the face of fusion to root out issues like this?
>>>
>>> Brian
>>>
>>> On Wed, Jun 2, 2021 at 5:13 AM Matthew Ouyang <
>>> matthew.ouy...@gmail.com> wrote:
>>>
 I have some other work-related things I need to do this week, so I
 will likely report back on this over the weekend.  Thank you for the
 explanation.  It makes perfect sense now.

 On Tue, Jun 1, 2021 at 11:18 PM Reuven Lax 
 wrote:

> Some more context - the problem is that RenameFields outputs (in
> this case) Java Row objects that are inconsistent with the actual 
> schema.
> For example if you have the following schema:
>
> Row {
>field1: Row {
>   field2: string
> }
> }
>
> And rename field1.field2 -> renamed, you'll get the following
> schema
>
> Row {
>   field1: Row {
>  renamed: string
>}
> }
>
> However the Java object for the _nested_ row will return the old
> schema if getSchema() is called on it. This is because we only update 
> the
> schema on the top-level row.
>
> I think this explains why your test works in the direct runner. If
> the row ever goes through an encode/decode path, it will come back 
> correct.
> The original incorrect Java objects are no longer around, and new
> (consistent) objects are constructed from the raw data and the 
> PCollection
> schema. Dataflow tends to fuse ParDos together, so the following 
> ParDo will
> see the incorrect Row object. I bet the DirectRunner is encoding and
> decoding in between, which fixes the object.
>
> You can validate this theory by forcing a shuffle after
> RenameFields using Reshuffle. It should fix the issue. If it does, let me
> know and I'll work on a fix to RenameFields.
>
> On Tue, Jun 1, 2021 at 
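
For reference, a minimal sketch of the CoderProperties check discussed in this
thread, using an arbitrary coder and value purely for illustration:

import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.testing.CoderProperties;

public class CoderRoundTrip {
  public static void main(String[] args) throws Exception {
    // Encodes the value with the given coder, decodes it back, and asserts
    // that the decoded value equals the original input.
    CoderProperties.coderDecodeEncodeEqual(StringUtf8Coder.of(), "example");
  }
}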

Re: Modifying serializable classes

2021-06-03 Thread Kenneth Knowles
In general, all changes to transforms and coders are expected to allow
users to update to the latest Beam version. Different runners serialize at
different points, so we typically just assume the encoding and transform
layout must remain the same.

I am pretty confident that in the case of ResourceIds they definitely are
serialized in places that would break users if the encoding changed.

Kenn

On Thu, Jun 3, 2021 at 8:26 AM Matt Rudary  wrote:

> My general question is what responsibility we have to maintain forward and
> backward compatibility for serialization of objects in the SDK. My specific
> question is about org.apache.beam.sdk.io.aws.s3.S3ResourceId – how can I
> tell whether ResourceIds are serialized anywhere that would require stable
> serialization across Beam SDK updates?
>
>
>
> Thanks
>
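
A minimal, hypothetical sketch of the kind of serialized-form check this
implies, using plain Java serialization; the class, value, and golden constant
are placeholders for illustration, not Beam's actual tests or the real layout
of S3ResourceId:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

public class SerializedFormCheck {

  // Java-serialize a value and return its Base64 form for comparison.
  static String serializeToBase64(Serializable value) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    }
    return Base64.getEncoder().encodeToString(bytes.toByteArray());
  }

  public static void main(String[] args) throws IOException {
    // "golden" would be captured once from the previous SDK release and
    // checked in; here it is only a placeholder.
    String golden = "<captured-from-previous-release>";
    String current = serializeToBase64("s3://bucket/key"); // stand-in value
    if (!current.equals(golden)) {
      System.err.println("Serialized form changed; in-flight pipeline updates may break.");
    }
  }
}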


Beam Website Feedback

2021-06-03 Thread Nickolai Morozov
Hi! This is feedback for the
https://beam.apache.org/documentation/pipelines/test-your-pipeline/ page.

Thank you for that tutorial. It was pretty helpful for a simple test.
Unfortunately, I didn't find a tutorial on how to test streaming
pipelines, and windowing in particular.


Nick Morozov, Python developer.
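
Until such a tutorial exists, a minimal sketch of the usual approach in the
Java SDK, pairing TestStream with PAssert (the element values, timestamps, and
window size here are made up for illustration):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestStream;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TimestampedValue;
import org.joda.time.Duration;
import org.joda.time.Instant;

public class WindowedCountTest {
  public static void main(String[] args) {
    // Deterministic event stream: one element per 10-second window, with
    // explicit watermark advancement between them.
    TestStream<String> events =
        TestStream.create(StringUtf8Coder.of())
            .addElements(TimestampedValue.of("a", new Instant(0)))
            .advanceWatermarkTo(new Instant(10_000))
            .addElements(TimestampedValue.of("b", new Instant(10_000)))
            .advanceWatermarkToInfinity();

    Pipeline p = Pipeline.create();
    PCollection<KV<String, Long>> counts =
        p.apply(events)
            .apply(Window.<String>into(FixedWindows.of(Duration.standardSeconds(10))))
            .apply(Count.perElement());

    // Each element lands in its own fixed window, so each count is 1.
    PAssert.that(counts).containsInAnyOrder(KV.of("a", 1L), KV.of("b", 1L));
    p.run().waitUntilFinish();
  }
}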


Modifying serializable classes

2021-06-03 Thread Matt Rudary
My general question is what responsibility we have to maintain forward and 
backward compatibility for serialization of objects in the SDK. My specific 
question is about org.apache.beam.sdk.io.aws.s3.S3ResourceId - how can I tell 
whether ResourceIds are serialized anywhere that would require stable 
serialization across Beam SDK updates?

Thanks


Re: portable runner - spark streaming support

2021-06-03 Thread Alexey Romanenko
Hi,

AFAIK, the portable Spark runner supports streaming mode [1] (with some
exceptions, like side inputs and state/timers), and Beam provides a portable
Kafka connector for the Python SDK.
So it should work.

—
Alexey 

[1] https://issues.apache.org/jira/browse/BEAM-7587

> On 2 Jun 2021, at 17:21, Moshe Hoadley  wrote:
> 
> Hi
> I would like to run a Beam pipeline written with the Python SDK on Spark.
> I need to read from Kafka, so I need streaming functionality.
>  
> Does the spark portable runner support streaming? If not, is there a roadmap 
> for it?
>  
> Thanks
> Moshe
> This email and the information contained herein is proprietary and 
> confidential and subject to the Amdocs Email Terms of Service, which you may 
> review at https://www.amdocs.com/about/email-terms-of-service
> 


[VOTE] Release 2.30.0, release candidate #1

2021-06-03 Thread Heejong Lee
Hi everyone,

Please review and vote on the release candidate #1 for the version 2.30.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

Reviewers are encouraged to test their own use cases with the release
candidate, and vote +1 if no issues are found.

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint
DBC03F1CCF4240FBD0F256F054550BE0F4C0A24D [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.30.0-RC1" [5],
* website pull request listing the release [6], publishing the API
reference manual [7], and the blog post [8].
* Java artifacts were built with Maven 3.6.3 and OpenJDK 1.8.0_292.
* Python artifacts are deployed along with the source release to the
dist.apache.org [2].
* Validation sheet with a tab for 2.30.0 release to help with validation
[9].
* Docker images published to Docker Hub [10].
* Python artifacts are published to pypi as a pre-release version [11].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Heejong

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12349978
[2] https://dist.apache.org/repos/dist/dev/beam/2.30.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1174/
[5] https://github.com/apache/beam/tree/v2.30.0-RC1
[6] https://github.com/apache/beam/pull/14894
[7] https://github.com/apache/beam-site/pull/613
[8] https://github.com/apache/beam/pull/14895
[9]
https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=109662250
[10] https://hub.docker.com/search?q=apache%2Fbeam&type=image
[11] https://pypi.org/project/apache-beam/2.30.0rc1/