Re: [VOTE] Release 2.52.0, release candidate #2

2023-11-09 Thread Yi Hu via dev
+1 (non-binding) Tested on Java IO load tests (
https://github.com/bvolpato/DataflowTemplates/tree/56d18a31c1c95e58543d7a1656bd83d7e859b482/it)
BigQueryIO, TextIO, BigtableIO, SpannerIO on Dataflow legacy runner and
runner v2

While it was announced there will be an RC3, the RC2 validation for IO
benchmark was still ongoing. I decided to continue with validation as
suggested. Will also run a few pipelines with RC3.

Regards,
Yi

On Wed, Nov 8, 2023 at 10:25 AM Danny McCormick via dev 
wrote:

> Hey everyone, @Ritesh Ghorse  pointed out to me
> that the docker containers were not pushed for RC2, just for RC1. On closer
> inspection, I've realized that I accidentally built the RC from the RC1 tag
> (https://github.com/apache/beam/tree/v2.52.0-RC1) instead of the RC2 tag (
> https://github.com/apache/beam/tree/v2.52.0-RC2), so it is also missing
> an important cherry pick fix to the Datastore IO (
> https://github.com/apache/beam/commit/0fdf404873636d24be50ae8360a08e4dddfae679
> ).
>
> I'm going to move to RC3 and should have that out later today. You're
> still welcome to do more validation on RC2, especially if you're not using
> the Datastore IO. Sorry for the mixup!
>
> Thanks,
> Danny
>
> On Wed, Nov 8, 2023 at 9:27 AM Svetak Sundhar via dev 
> wrote:
>
>> Thanks, Danny!
>>
>> @all: Reminder that if there's anything you think that is worth
>> documenting while RC testing, please feel free to add it here
>> 
>> .
>>
>> We can then use it to update
>> https://github.com/apache/beam/blob/master/contributor-docs/release-guide.md#vote-and-validate-the-release-candidate
>> .
>>
>> Thanks,
>>
>>
>> Svetak Sundhar
>>
>>   Data Engineer
>> s vetaksund...@google.com
>>
>>
>>
>> On Wed, Nov 8, 2023 at 9:04 AM Jean-Baptiste Onofré 
>> wrote:
>>
>>> +1 (binding)
>>>
>>> Regards
>>> JB
>>>
>>> On Wed, Nov 8, 2023 at 12:24 AM Danny McCormick via dev
>>>  wrote:
>>> >
>>> > Hi everyone,
>>> > Please review and vote on the release candidate #2 for the version
>>> 2.52.0, as follows:
>>> > [ ] +1, Approve the release
>>> > [ ] -1, Do not approve the release (please provide specific comments)
>>> >
>>> >
>>> > Reviewers are encouraged to test their own use cases with the release
>>> candidate, and vote +1 if no issues are found. Only PMC member votes will
>>> count towards the final vote, but votes from all community members is
>>> encouraged and helpful for finding regressions; you can either test your
>>> own use cases or use cases from the validation sheet [10].
>>> >
>>> > The complete staging area is available for your review, which includes:
>>> >
>>> > GitHub Release notes [1]
>>> > the official Apache source release to be deployed to dist.apache.org
>>> [2], which is signed with the key with fingerprint D20316F712213422 [3]
>>> > all artifacts to be deployed to the Maven Central Repository [4]
>>> > source code tag "v2.52.0-RC1" [5]
>>> > website pull request listing the release [6], the blog post [6], and
>>> publishing the API reference manual [7]
>>> > Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2] and PyPI[8].
>>> > Go artifacts and documentation are available at pkg.go.dev [9]
>>> > Validation sheet with a tab for 2.52.0 release to help with validation
>>> [10]
>>> > Docker images published to Docker Hub [11]
>>> > PR to run tests against release branch [12]
>>> >
>>> >
>>> > The vote will be open for at least 72 hours. It is adopted by majority
>>> approval, with at least 3 PMC affirmative votes.
>>> >
>>> > For guidelines on how to try the release in your projects, check out
>>> our blog post at https://beam.apache.org/blog/validate-beam-release/.
>>> >
>>> > Thanks,
>>> > Danny
>>> >
>>> > [1] https://github.com/apache/beam/milestone/16
>>> > [2] https://dist.apache.org/repos/dist/dev/beam/2.52.0/
>>> > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> > [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1360/
>>> > [5] https://github.com/apache/beam/tree/v2.52.0-RC2
>>> > [6] https://github.com/apache/beam/pull/29331
>>> > [7] https://github.com/apache/beam-site/pull/652
>>> > [8] https://pypi.org/project/apache-beam/2.52.0rc2/
>>> > [9]
>>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.52.0-RC2/go/pkg/beam
>>> > [10]
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1387982510
>>> > [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>>> > [12] https://github.com/apache/beam/pull/29319
>>>
>>


Re: [External Sender] Re: [Question] Error handling for IO Write Functions

2023-11-09 Thread Robert Bradshaw via dev
+1

Specifically, p.run().waitUntilFinish() would throw an exception if there
were errors during pipeline execution.

On Wed, Nov 8, 2023 at 8:05 AM John Casey via dev 
wrote:

> Yep, thats a common misunderstanding with beam.
>
> The code that is actually executed in the try block is just for pipeline
> construction, and no data is processed at this point in time.
>
> Once the pipeline is constructed, the various pardos are serialized, and
> sent to the runners, where they are actually executed.
>
> In this case, if there was an exception in the pardo that converts rows to
> avro, you would see the "Exception when converting Beam Row to Avro Record"
> log in whatever logs your runner provides you, and the exception would
> propagate up to your runner.
>
> In this case, your log log.info("Finished writing Parquet file to path
> {}", writePath); is inaccurate, it will log when the pipeline is
> constructed, not when the parquet write completes
>
> On Wed, Nov 8, 2023 at 10:51 AM Ramya Prasad via dev 
> wrote:
>
>> Hey John,
>>
>> Yes that's how my code is set up, I have the FileIO.write() in its own
>> try-catch block. I took a second look at where exactly the code is failing,
>> and it's actually in a ParDo function which is happening before I call
>> FileIO.write(). But even within that, I've tried adding a try-catch but the
>> error isn't stopping the actual application run in a Spark cluster. In the
>> cluster, I see that the exception is being thrown from my ParDo, but then
>> immediately after that, I see the line* INFO ApplicationMaster: Final
>> app status: SUCCEEDED, exitCode: 0. *This is roughly what my code setup
>> looks like:
>>
>> @Slf4j
>> public class ParquetWriteActionStrategy {
>>
>> public void executeWriteAction(Pipeline p) throws Exception {
>>
>> try {
>>
>> // transform PCollection from type Row to GenericRecords
>> PCollection records = p.apply("transform 
>> PCollection from type Row to GenericRecords",
>> ParDo.of(new DoFn() {
>> @ProcessElement
>> public void processElement(@Element Row row, 
>> OutputReceiver out) {
>> try {
>> 
>> } catch (Exception e) {
>> log.error("Exception when converting Beam 
>> Row to Avro Record: {}", e.getMessage());
>> throw e;
>> }
>>
>> }
>> })).setCoder(AvroCoder.of(avroSchema));
>> records.apply("Writing Parquet Output File", 
>> FileIO.
>> write()
>> .via()
>> .to(writePath)
>> .withSuffix(".parquet"));
>>
>> log.info("Finished writing Parquet file to path {}", writePath);
>> } catch (Exception e) {
>> log.error("Error in Parquet Write Action. {}", e.getMessage());
>> throw e;
>> }
>>
>> }
>>
>>
>> On Wed, Nov 8, 2023 at 9:16 AM John Casey via dev 
>> wrote:
>>
>>> There are 2 execution times when using Beam. The first execution is
>>> local, when a pipeline is constructed, and the second is remote on the
>>> runner, processing data.
>>>
>>> Based on what you said, it sounds like you are wrapping pipeline
>>> construction in a try-catch, and constructing FileIO isn't failing.
>>>
>>> e.g.
>>>
>>> try {
>>>
>>> FileIO.write().someOtherconfigs()
>>>
>>> } catch ...
>>>
>>> this will catch any exceptions in constructing fileio, but the running
>>> pipeline won't propagate exceptions through this exception block.
>>>
>>> On Tue, Nov 7, 2023 at 5:21 PM Robert Bradshaw via dev <
>>> dev@beam.apache.org> wrote:
>>>
 File write failures should be throwing exceptions that will
 terminate the pipeline on failure. (Generally a distributed runner will
 make multiple attempts before abandoning the entire pipeline of course.)

 Are you seeing files failing to be written but no exceptions being
 thrown? If so, this is definitely a bug that we want to resolve.


 On Tue, Nov 7, 2023 at 11:17 AM Ramya Prasad via dev <
 dev@beam.apache.org> wrote:

> Hello,
>
> I am a developer using Apache Beam in my Java application, and I need
> some help on how to handle exceptions when writing a file to S3. I have
> tried wrapping my code within a try-catch block, but no exception is being
> thrown within the try block. I'm assuming that FileIO doesn't throw any
> exceptions upon failure. Is there a way in which I can either terminate 
> the
> program on failure or at least be made aware of if any of my write
> operations fail?
>
> Thanks and sincerely,
> Ramya
> --
>
> The information contained in this e-mail may be confidential and/or
> proprietary to Capital 

Re: [VOTE] Release 2.52.0, release candidate #3

2023-11-09 Thread Ritesh Ghorse via dev
+1 (non-binding)

Validated Python SDK quickstart batch and streaming.

Thanks!

On Thu, Nov 9, 2023 at 9:25 AM Jan Lukavský  wrote:

> +1 (binding)
>
> Validated Java SDK with Flink runner on own use cases.
>  Jan
>
> On 11/9/23 03:31, Danny McCormick via dev wrote:
>
> Hi everyone,
> Please review and vote on the release candidate #3 for the version 2.52.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> Reviewers are encouraged to test their own use cases with the release
> candidate, and vote +1 if no issues are found. Only PMC member votes will
> count towards the final vote, but votes from all community members is
> encouraged and helpful for finding regressions; you can either test your
> own use cases or use cases from the validation sheet [10].
>
> The complete staging area is available for your review, which includes:
>
>- GitHub Release notes [1]
>- the official Apache source release to be deployed to dist.apache.org [2],
>which is signed with the key with fingerprint D20316F712213422 [3]
>- all artifacts to be deployed to the Maven Central Repository [4]
>- source code tag "v2.52.0-RC3" [5]
>- website pull request listing the release [6], the blog post [6], and
>publishing the API reference manual [7]
>- Python artifacts are deployed along with the source release to the
>dist.apache.org [2] and PyPI[8].
>- Go artifacts and documentation are available at pkg.go.dev [9]
>- Validation sheet with a tab for 2.52.0 release to help with
>validation [10]
>- Docker images published to Docker Hub [11]
>- PR to run tests against release branch [12]
>
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> For guidelines on how to try the release in your projects, check out our
> blog post at https://beam.apache.org/blog/validate-beam-release/.
>
> Thanks,
> Danny
>
> [1] https://github.com/apache/beam/milestone/16
> [2] https://dist.apache.org/repos/dist/dev/beam/2.52.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1361/
> [5] https://github.com/apache/beam/tree/v2.52.0-RC3
> [6] https://github.com/apache/beam/pull/29331
> [7] https://github.com/apache/beam-site/pull/653
> [8] https://pypi.org/project/apache-beam/2.52.0rc2/
> [9]
> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.52.0-RC3/go/pkg/beam
> [10]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1387982510
> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
> [12] https://github.com/apache/beam/pull/29319
>
>


Re: [VOTE] Release 2.52.0, release candidate #3

2023-11-09 Thread Jan Lukavský

+1 (binding)

Validated Java SDK with Flink runner on own use cases.

 Jan

On 11/9/23 03:31, Danny McCormick via dev wrote:

Hi everyone,
Please review and vote on the release candidate #3 for the version 
2.52.0, as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


Reviewers are encouraged to test their own use cases with the release 
candidate, and vote +1 if no issues are found. Only PMC member votes 
will count towards the final vote, but votes from all community 
members is encouraged and helpful for finding regressions; you can 
either test your own use cases or use cases from the validation sheet 
[10].


The complete staging area is available for your review, which includes:

  * GitHub Release notes [1]
  * the official Apache source release to be deployed to
dist.apache.org  [2], which is signed
with the key with fingerprint D20316F712213422 [3]
  * all artifacts to be deployed to the Maven Central Repository [4]
  * source code tag "v2.52.0-RC3" [5]
  * website pull request listing the release [6], the blog post [6],
and publishing the API reference manual [7]
  * Python artifacts are deployed along with the source release to the
dist.apache.org  [2] and PyPI[8].
  * Go artifacts and documentation are available at pkg.go.dev
 [9]
  * Validation sheet with a tab for 2.52.0 release to help with
validation [10]
  * Docker images published to Docker Hub [11]
  * PR to run tests against release branch [12]


The vote will be open for at least 72 hours. It is adopted by majority 
approval, with at least 3 PMC affirmative votes.


For guidelines on how to try the release in your projects, check out 
our blog post at https://beam.apache.org/blog/validate-beam-release/.


Thanks,
Danny

[1] https://github.com/apache/beam/milestone/16
[2] https://dist.apache.org/repos/dist/dev/beam/2.52.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1361/
[5] https://github.com/apache/beam/tree/v2.52.0-RC3
[6] https://github.com/apache/beam/pull/29331
[7] https://github.com/apache/beam-site/pull/653
[8] https://pypi.org/project/apache-beam/2.52.0rc2/
[9] 
https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.52.0-RC3/go/pkg/beam
[10] 
https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1387982510
[11] https://hub.docker.com/search?q=apache%2Fbeam=image 


[12] https://github.com/apache/beam/pull/29319

Beam High Priority Issue Report (45)

2023-11-09 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/29099 [Bug]: FnAPI Java SDK Harness 
doesn't update user counters in OnTimer callback functions
https://github.com/apache/beam/issues/29076 [Failing Test]: Python ARM 
PostCommit failing after #28385
https://github.com/apache/beam/issues/29022 [Failing Test]: Python Github 
actions tests are failing due to update of pip 
https://github.com/apache/beam/issues/28760 [Bug]: EFO Kinesis IO reader 
provided by apache beam does not pick the event time for watermarking
https://github.com/apache/beam/issues/28715 [Bug]: Python WriteToBigtable get 
stuck for large jobs due to client dead lock
https://github.com/apache/beam/issues/28383 [Failing Test]: 
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorkerTest.testMaxThreadMetric
https://github.com/apache/beam/issues/28339 Fix failing 
"beam_PostCommit_XVR_GoUsingJava_Dataflow" job
https://github.com/apache/beam/issues/28326 Bug: 
apache_beam.io.gcp.pubsublite.ReadFromPubSubLite not working
https://github.com/apache/beam/issues/28142 [Bug]: [Go SDK] Memory seems to be 
leaking on 2.49.0 with Dataflow
https://github.com/apache/beam/issues/27892 [Bug]: ignoreUnknownValues not 
working when using CreateDisposition.CREATE_IF_NEEDED 
https://github.com/apache/beam/issues/27648 [Bug]: Python SDFs (e.g. 
PeriodicImpulse) running in Flink and polling using tracker.defer_remainder 
have checkpoint size growing indefinitely 
https://github.com/apache/beam/issues/27616 [Bug]: Unable to use 
applyRowMutations() in bigquery IO apache beam java
https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with 
inequality filters
https://github.com/apache/beam/issues/27314 [Failing Test]: 
bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when 
using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26981 [Bug]: Getting an error related to 
SchemaCoder after upgrading to 2.48
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested 
ROW (described below)
https://github.com/apache/beam/issues/26343 [Bug]: 
apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is 
flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not 
propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create 
exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: 
HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError 
ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit 
test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial 
(order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table 
destinations returns wrong tableId
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) 
failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21121