Re: [PR] Correct per-entry HashMap overhead in WindmillStateCache [beam]
dmitryor commented on PR #30672: URL: https://github.com/apache/beam/pull/30672#issuecomment-2006005444 R: @scwhittle -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Correct per-entry HashMap overhead in WindmillStateCache [beam]
dmitryor opened a new pull request, #30672: URL: https://github.com/apache/beam/pull/30672 Existing code incorrectly assumes a per-entry HashMap overhead is 16 bytes. In reality it's 32 bytes [per this article](https://appsintheopen.com/posts/52-the-memory-overhead-of-java-ojects), confirmed by profiling: 1.22GB for 40.96M objects ~ 32 bytes/object ![313862428-9054f0e9-f0c0-4774-85b8-84fa3dd397b3](https://github.com/apache/beam/assets/34167644/0538a3a5-fb68-443e-b334-4d4342c195e4) ![313862425-15f2123b-a839-4339-aa7a-4710d04c9552](https://github.com/apache/beam/assets/34167644/2352394c-6f8e-4e3a-977a-884fd6ff2d2a) Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #` instead. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier). To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md) GitHub Actions Tests Status (on master branch) [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule) [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule) [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule) [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule) See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Correct per-entry HashMap overhead in WindmillStateCache [beam]
github-actions[bot] commented on PR #30672: URL: https://github.com/apache/beam/pull/30672#issuecomment-2006009059 Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Correct per-entry HashMap overhead in WindmillStateCache [beam]
scwhittle commented on PR #30672: URL: https://github.com/apache/beam/pull/30672#issuecomment-2006346607 > Task :runners:google-cloud-dataflow-java:worker:test org.apache.beam.runners.dataflow.worker.windmill.state.WindmillStateCacheTest > testBasic FAILED java.lang.AssertionError at WindmillStateCacheTest.java:171 org.apache.beam.runners.dataflow.worker.windmill.state.WindmillStateCacheTest > testStaleWorkItem FAILED java.lang.AssertionError at WindmillStateCacheTest.java:257 org.apache.beam.runners.dataflow.worker.windmill.state.WindmillStateCacheTest > testInvalidation FAILED java.lang.AssertionError at WindmillStateCacheTest.java:215 some test failures and spotless needs to be fixed ./gradlew :runners:google-cloud-dataflow-java:worker:spotlessApply -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] add pythonpath to cloudcoverage tox environment [beam]
volatilemolotov opened a new pull request, #30673: URL: https://github.com/apache/beam/pull/30673 Adds PYTHONPATH environment variable to cloudcoverage tox environment set to toxinidir (sdks/python). This allows pytest to load all the modules and run the correct coverage report Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #` instead. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier). To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md) GitHub Actions Tests Status (on master branch) [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule) [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule) [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule) [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule) See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Failing Test]: PostCommit Java SingleStoreIO IT failing [beam]
AdalbertMemSQL commented on issue #30564: URL: https://github.com/apache/beam/issues/30564#issuecomment-2006636094 Hey @Abacn Is it possible to somehow retrieve full workload logs? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Feature Request]: Refresh side input from BigQuery [beam]
BostjanBozic commented on issue #26196: URL: https://github.com/apache/beam/issues/26196#issuecomment-2006766022 Just a question - if you use global window (since Pub/Sub would be unbounded source), would `PeriodicImpulse` come into play or would you need to use `GenerateSequence` for that? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] add flag for direct path that reads from system properties [beam]
scwhittle commented on code in PR #30588: URL: https://github.com/apache/beam/pull/30588#discussion_r1530507529 ## runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowStreamingPipelineOptions.java: ## @@ -211,6 +211,14 @@ public interface DataflowStreamingPipelineOptions extends PipelineOptions { void setWindmillServiceStreamMaxBackoffMillis(int value); + @Description( + "If true, Dataflow streaming pipeline will be running in direct path mode." + + " VMs must have IPv6 enabled for this to work.") + @Default.Boolean(false) + boolean getIsWindmillServiceDirectPathEnabled(); + + void setIsWindmillServiceDirectPathEnabled(boolean isWindmillServiceDirectPathEnabled); Review Comment: I would remove the `is`, otherwise specifying the flag looks odd IMO `--isWindmillServiceDirectPathEnabled=true` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump github.com/aws/aws-sdk-go-v2/credentials from 1.17.4 to 1.17.8 in /sdks [beam]
lostluck commented on PR #30669: URL: https://github.com/apache/beam/pull/30669#issuecomment-2007677567 LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump github.com/aws/aws-sdk-go-v2/credentials from 1.17.4 to 1.17.8 in /sdks [beam]
lostluck merged PR #30669: URL: https://github.com/apache/beam/pull/30669 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Add BigTableIO Stress test [beam]
Abacn commented on code in PR #30630: URL: https://github.com/apache/beam/pull/30630#discussion_r1530769586 ## it/google-cloud-platform/src/main/java/org/apache/beam/it/gcp/IOStressTestBase.java: ## @@ -0,0 +1,158 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.it.gcp; + +import com.google.cloud.Timestamp; +import java.io.IOException; +import java.io.Serializable; +import java.text.ParseException; +import java.time.Duration; +import java.util.ArrayList; +import java.util.Collection; +import java.util.List; +import java.util.Map; +import java.util.UUID; +import org.apache.beam.it.common.PipelineLauncher; +import org.apache.beam.sdk.testutils.NamedTestResult; +import org.apache.beam.sdk.testutils.metrics.IOITMetrics; +import org.apache.beam.sdk.testutils.publishing.InfluxDBSettings; +import org.apache.beam.sdk.transforms.DoFn; +import org.joda.time.Instant; + +/** Base class for IO Stress tests. */ +public class IOStressTestBase extends IOLoadTestBase { + /** + * The load will initiate at 1x, progressively increase to 2x and 4x, then decrease to 2x and + * eventually return to 1x. + */ + protected static final int[] DEFAULT_LOAD_INCREASE_ARRAY = {1, 2, 2, 4, 2, 1}; + + protected static final int DEFAULT_ROWS_PER_SECOND = 1000; + + /** + * Generates and returns a list of LoadPeriod instances representing periods of load increase + * based on the specified load increase array and total duration in minutes. + * + * @param minutesTotal The total duration in minutes for which the load periods are generated. + * @return A list of LoadPeriod instances defining periods of load increase. + */ + protected List getLoadPeriods(int minutesTotal, int[] loadIncreaseArray) { + +List loadPeriods = new ArrayList<>(); +long periodDurationMillis = +Duration.ofMinutes(minutesTotal / loadIncreaseArray.length).toMillis(); +long startTimeMillis = 0; + +for (int loadIncreaseMultiplier : loadIncreaseArray) { + long endTimeMillis = startTimeMillis + periodDurationMillis; + loadPeriods.add(new LoadPeriod(loadIncreaseMultiplier, startTimeMillis, endTimeMillis)); + + startTimeMillis = endTimeMillis; +} +return loadPeriods; + } + + /** + * Represents a period of time with associated load increase properties for stress testing + * scenarios. + */ + protected static class LoadPeriod implements Serializable { +private final int loadIncreaseMultiplier; +private final long periodStartMillis; +private final long periodEndMillis; + +public LoadPeriod(int loadIncreaseMultiplier, long periodStartMillis, long periodEndMin) { + this.loadIncreaseMultiplier = loadIncreaseMultiplier; + this.periodStartMillis = periodStartMillis; + this.periodEndMillis = periodEndMin; +} + +public int getLoadIncreaseMultiplier() { + return loadIncreaseMultiplier; +} + +public long getPeriodStartMillis() { + return periodStartMillis; +} + +public long getPeriodEndMillis() { + return periodEndMillis; +} + } + + /** + * Custom Apache Beam DoFn designed for use in stress testing scenarios. It introduces a dynamic + * load increase over time, multiplying the input elements based on the elapsed time since the + * start of processing. This class aims to simulate various load levels during stress testing. + */ + protected static class MultiplierDoFn extends DoFn { +private final int startMultiplier; +private final long startTimesMillis; +private final List loadPeriods; + +public MultiplierDoFn(int startMultiplier, List loadPeriods) { + this.startMultiplier = startMultiplier; + this.startTimesMillis = Instant.now().getMillis(); + this.loadPeriods = loadPeriods; +} + +@DoFn.ProcessElement +public void processElement( +@Element T element, OutputReceiver outputReceiver, @DoFn.Timestamp Instant timestamp) { + + int multiplier = this.startMultiplier; + long elapsedTimeMillis = timestamp.getMillis() - startTimesMillis; + + for (LoadPeriod loadPeriod : loadPeriods) { +if (elapsedTimeMillis >= loadPeriod.getPeriodStartMillis() +
Re: [PR] Add config validation to kafka read schema transform [beam]
ahmedabu98 commented on code in PR #30625: URL: https://github.com/apache/beam/pull/30625#discussion_r1530772951 ## sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaReadSchemaTransformConfiguration.java: ## @@ -59,20 +63,25 @@ public void validate() { final String confluentSchemaRegSubject = this.getConfluentSchemaRegistrySubject(); if (confluentSchemaRegUrl != null) { - assert confluentSchemaRegSubject != null - : "To read from Kafka, a schema must be provided directly or though Confluent " - + "Schema Registry. Make sure you are providing one of these parameters."; + checkArgument( + confluentSchemaRegSubject != null, + "To read from Kafka, a schema must be provided directly or though Confluent " + + "Schema Registry. Make sure you are providing one of these parameters."); } else if (dataFormat != null && dataFormat.equals("RAW")) { - assert inputSchema == null : "To read from Kafka in RAW format, you can't provide a schema."; + checkArgument( + inputSchema == null, "To read from Kafka in RAW format, you can't provide a schema."); } else if (dataFormat != null && dataFormat.equals("JSON")) { - assert inputSchema != null : "To read from Kafka in JSON format, you must provide a schema."; + checkArgument( + inputSchema != null, "To read from Kafka in JSON format, you must provide a schema."); } else if (dataFormat != null && dataFormat.equals("PROTO")) { - assert messageName != null - : "To read from Kafka in PROTO format, messageName must be provided."; - assert fileDescriptorPath != null || inputSchema != null - : "To read from Kafka in PROTO format, fileDescriptorPath or schema must be provided."; + checkArgument( + messageName != null, "To read from Kafka in PROTO format, messageName must be provided."); + checkArgument( + fileDescriptorPath != null || inputSchema != null, + "To read from Kafka in PROTO format, fileDescriptorPath or schema must be provided."); } else { - assert inputSchema != null : "To read from Kafka in AVRO format, you must provide a schema."; + checkArgument( + inputSchema != null, "To read from Kafka in AVRO format, you must provide a schema."); Review Comment: nit: for many of these, checkNotNull() would be more appropriate -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] Performance Regression or Improvement: cogbk_python_batch_load_test_2GB_of_100B_records_with_a_multiple_key:runtime [beam]
liferoad closed issue #30659: Performance Regression or Improvement: cogbk_python_batch_load_test_2GB_of_100B_records_with_a_multiple_key:runtime URL: https://github.com/apache/beam/issues/30659 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] Performance Regression or Improvement: gbk_python_batch_load_test_fanout_4_times_with_2GB_10byte_records_total:runtime [beam]
liferoad closed issue #30658: Performance Regression or Improvement: gbk_python_batch_load_test_fanout_4_times_with_2GB_10byte_records_total:runtime URL: https://github.com/apache/beam/issues/30658 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Use 50 workers for 2GB 10 bytes combine test [beam]
Abacn commented on PR #30655: URL: https://github.com/apache/beam/pull/30655#issuecomment-2007343641 Can we change the cron string to sth like '50 5 * * *' so the risk of interfering with other workflow is minimum https://github.com/apache/beam/blob/50f33cd786dc63463688315a1c73b1cf4ef18807/.github/workflows/beam_LoadTests_Python_Combine_Dataflow_Streaming.yml#L20 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Add config validation to kafka read schema transform [beam]
Polber commented on code in PR #30625: URL: https://github.com/apache/beam/pull/30625#discussion_r1530776385 ## sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaReadSchemaTransformConfiguration.java: ## @@ -59,20 +63,25 @@ public void validate() { final String confluentSchemaRegSubject = this.getConfluentSchemaRegistrySubject(); if (confluentSchemaRegUrl != null) { - assert confluentSchemaRegSubject != null - : "To read from Kafka, a schema must be provided directly or though Confluent " - + "Schema Registry. Make sure you are providing one of these parameters."; + checkArgument( + confluentSchemaRegSubject != null, + "To read from Kafka, a schema must be provided directly or though Confluent " + + "Schema Registry. Make sure you are providing one of these parameters."); } else if (dataFormat != null && dataFormat.equals("RAW")) { - assert inputSchema == null : "To read from Kafka in RAW format, you can't provide a schema."; + checkArgument( + inputSchema == null, "To read from Kafka in RAW format, you can't provide a schema."); } else if (dataFormat != null && dataFormat.equals("JSON")) { - assert inputSchema != null : "To read from Kafka in JSON format, you must provide a schema."; + checkArgument( + inputSchema != null, "To read from Kafka in JSON format, you must provide a schema."); } else if (dataFormat != null && dataFormat.equals("PROTO")) { - assert messageName != null - : "To read from Kafka in PROTO format, messageName must be provided."; - assert fileDescriptorPath != null || inputSchema != null - : "To read from Kafka in PROTO format, fileDescriptorPath or schema must be provided."; + checkArgument( + messageName != null, "To read from Kafka in PROTO format, messageName must be provided."); + checkArgument( + fileDescriptorPath != null || inputSchema != null, + "To read from Kafka in PROTO format, fileDescriptorPath or schema must be provided."); } else { - assert inputSchema != null : "To read from Kafka in AVRO format, you must provide a schema."; + checkArgument( + inputSchema != null, "To read from Kafka in AVRO format, you must provide a schema."); Review Comment: Good point. Let me refactor that and push before emrging -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Revert "Disable remote gradle cache until it is cleaned (#30584)" [beam]
Abacn opened a new pull request, #30674: URL: https://github.com/apache/beam/pull/30674 This reverts commit a3ea9ef706cf798fc1f6b026dcdf7171434e74d8. **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #` instead. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier). To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md) GitHub Actions Tests Status (on master branch) [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule) [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule) [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule) [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule) See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Revert "Disable remote gradle cache until it is cleaned (#30584)" [beam]
Abacn commented on PR #30674: URL: https://github.com/apache/beam/pull/30674#issuecomment-2007649109 Cache removed : https://issues.apache.org/jira/projects/INFRA/issues/INFRA-25595?filter=allopenissues -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] add a way for channels to be closed manually [beam]
scwhittle merged PR #30425: URL: https://github.com/apache/beam/pull/30425 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Add config validation to kafka read schema transform [beam]
github-actions[bot] commented on PR #30625: URL: https://github.com/apache/beam/pull/30625#issuecomment-2007582202 Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`: R: @damondouglas for label java. R: @ahmedabu98 for label io. Available commands: - `stop reviewer notifications` - opt out of the automated review tooling - `remind me after tests pass` - tag the comment author after tests pass - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers) The PR bot will only process comments in the main thread (not review comments). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump github.com/google/uuid from 1.5.0 to 1.6.0 in /sdks [beam]
dependabot[bot] commented on PR #30254: URL: https://github.com/apache/beam/pull/30254#issuecomment-2007685637 Looks like github.com/google/uuid is up-to-date now, so this is no longer needed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump github.com/google/uuid from 1.5.0 to 1.6.0 in /sdks [beam]
dependabot[bot] closed pull request #30254: Bump github.com/google/uuid from 1.5.0 to 1.6.0 in /sdks URL: https://github.com/apache/beam/pull/30254 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump github.com/testcontainers/testcontainers-go from 0.26.0 to 0.29.1 in /sdks [beam]
lostluck commented on PR #30557: URL: https://github.com/apache/beam/pull/30557#issuecomment-2007686397 @dependabot rebase -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Python] Check feature store existence at construction time [beam]
damccorm commented on PR #30668: URL: https://github.com/apache/beam/pull/30668#issuecomment-2007237814 3.10 failure looks like an unrelated GameStatsIT failure. So this should be safe to move forward -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Use 50 workers for 2GB 10 bytes combine test [beam]
Abacn merged PR #30655: URL: https://github.com/apache/beam/pull/30655 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Bug]: beam_LoadTests_Python_Combine_Dataflow_Streaming failing [beam]
Abacn closed issue #23904: [Bug]: beam_LoadTests_Python_Combine_Dataflow_Streaming failing URL: https://github.com/apache/beam/issues/23904 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Use 50 workers for 2GB 10 bytes combine test [beam]
Abacn commented on PR #30655: URL: https://github.com/apache/beam/pull/30655#issuecomment-2007640750 merge now and let's see how it goes -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] update confluent dependency version to 7.6.0 [beam]
Abacn merged PR #30638: URL: https://github.com/apache/beam/pull/30638 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Task]: Update confluent libraries versions from 5.3.2 to more recent in kafka io extension [beam]
Abacn closed issue #30610: [Task]: Update confluent libraries versions from 5.3.2 to more recent in kafka io extension URL: https://github.com/apache/beam/issues/30610 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Support ValueProvider for _CustomBigQueryStorageSource [beam]
Abacn merged PR #30662: URL: https://github.com/apache/beam/pull/30662 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Use autovalue's @Memoized in ExponentialBuckets [beam]
JayajP opened a new pull request, #30676: URL: https://github.com/apache/beam/pull/30676 Based on [autovalue's documentation](https://github.com/google/auto/blob/main/value/userguide/howto.md#-memoize-cache-derived-properties). For `ExponentialBuckets`, the `Base`, `InvLog2GrowthFactor`, and `RangeTo` are all derived from the input arguments `NumBuckets` and `Scale`. Currently these three values are computed for every instance of ExponentialHistogram, and they are used in the `hashCode` and `equalsTo` method for this class. With this change, the derived values will only be computed if an instance needs them, but derived values will only be computed once. Additionally, the derived values will not be used in the `equalsTo` or `hashCode` method making these methods more efficient. This PR also memoizes the `hashCode` of `ExponentialBuckets`, which is helpful because we compute this every time we record a histogram value. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #` instead. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier). To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md) GitHub Actions Tests Status (on master branch) [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule) [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule) [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule) [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule) See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] Seeking guidance on new runner implementation [beam]
moradology opened a new issue, #30675: URL: https://github.com/apache/beam/issues/30675 OK, so I'm very interested in making the batch subset of apache beam play nicely with EMR-Serverless. Unfortunately, this is difficult to pull off with the portable runner - perhaps impossible even - as there is an assumption so far as I can tell that the spark master UI be available to take work from the beam's job runner. To that end, I've begun adapting roughly the strategy found in the dask runner in the python SDK to build up pyspark RDDs that are submitted directly via whatever `SparkSession` pyspark finds at runtime. So far, so good. I even have a (partial) implementation of support for side inputs! Unfortunately, here, I am running into some difficulties and would love to get some feedback on whatever it is that I might be missing. As runner authors will surely be aware, it is necessary to distinguish between `AsIter` and `AsSingleton` `AsSideInput` instances. Fair enough, but by the time I am traversing `AppliedPTransform` instances to evaluate, that information appears to be gone. Perhaps lost in some of the serialization/deserialization that occurs during `Transform` application! Here's what I'm seeing when I print out some context about a given `AppliedPTransform` [at this point in the runner](https://github.com/moradology/beam-pyspark-runner/blob/real_traversal_infrastructure/beam_pyspark_runner/pyspark_runner.py#L125) (so far, I've only run some visitors over the AST to collect some context that I use later in planning out execution): ``` 'write test/Write/WriteImpl/WriteBundles': {'input_producer_labels': ['write ' 'test/Write/WriteImpl/WindowInto(WindowIntoFn)'], 'input_producers': [AppliedPTransform(write test/Write/WriteImpl/WindowInto(WindowIntoFn), WindowInto)], 'inputs': (,), 'outputs': dict_values([]), 'parent': 'write ' 'test/Write/WriteImpl', 'side_inputs': (,), 'type': 'ParDo', 'xform_side_inputs': []}} ``` Note that I have an `_UnpickledSideInput`. This type does not include the `AsIter` and `AsSingleton` context that appears to be absolutely necessary to decide how results of a side-input should be passed to a consumer (whether the whole list or else just its head). What am I missing here? If I drop a debugger in beam's source for `core.ParDo`, I can see this information. It just appears to be lost later on. Example: `` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] Performance Regression or Improvement: cogbk_python_batch_load_test_2GB_of_100B_records_with_a_single_key:runtime [beam]
liferoad closed issue #30652: Performance Regression or Improvement: cogbk_python_batch_load_test_2GB_of_100B_records_with_a_single_key:runtime URL: https://github.com/apache/beam/issues/30652 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] Performance Regression or Improvement: gbk_python_batch_load_test_2gb_of_100B_records:runtime [beam]
liferoad closed issue #30651: Performance Regression or Improvement: gbk_python_batch_load_test_2gb_of_100B_records:runtime URL: https://github.com/apache/beam/issues/30651 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] Performance Regression or Improvement: gbk_python_batch_load_test_2gb_of_10B_records:runtime [beam]
liferoad closed issue #30650: Performance Regression or Improvement: gbk_python_batch_load_test_2gb_of_10B_records:runtime URL: https://github.com/apache/beam/issues/30650 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] Performance Regression or Improvement: combine_python_batch_2gb_10_byte_records:runtime [beam]
liferoad closed issue #30649: Performance Regression or Improvement: combine_python_batch_2gb_10_byte_records:runtime URL: https://github.com/apache/beam/issues/30649 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] Performance Regression or Improvement: gbk_python_batch_load_test_fanout_8_times_with_2GB_10byte_records_total:runtime [beam]
liferoad closed issue #30640: Performance Regression or Improvement: gbk_python_batch_load_test_fanout_8_times_with_2GB_10byte_records_total:runtime URL: https://github.com/apache/beam/issues/30640 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Task]: Check if feature store exists in VertexAIFeatureStoreEnrichmentHandler [beam]
riteshghorse closed issue #30541: [Task]: Check if feature store exists in VertexAIFeatureStoreEnrichmentHandler URL: https://github.com/apache/beam/issues/30541 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Task]: Check if feature store exists in VertexAIFeatureStoreEnrichmentHandler [beam]
riteshghorse commented on issue #30541: URL: https://github.com/apache/beam/issues/30541#issuecomment-2007266896 Fixed by #30668 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Update Python Dependencies from BRANCH weekly_update_python_dependencies_1710635460 [beam]
Abacn merged PR #30656: URL: https://github.com/apache/beam/pull/30656 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump github.com/google/uuid from 1.5.0 to 1.6.0 in /sdks [beam]
lostluck commented on PR #30254: URL: https://github.com/apache/beam/pull/30254#issuecomment-2007684387 @dependabot rebase -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Python] Check feature store existence at construction time [beam]
damccorm commented on PR #30668: URL: https://github.com/apache/beam/pull/30668#issuecomment-2007222555 Oh, looks like postcommits are failing though, could you take a look please? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Python] Check feature store existence at construction time [beam]
damccorm commented on PR #30668: URL: https://github.com/apache/beam/pull/30668#issuecomment-2007225474 3.11 failure looks like an unrelated pubsub test -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Python] Check feature store existence at construction time [beam]
riteshghorse merged PR #30668: URL: https://github.com/apache/beam/pull/30668 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Python] Check feature store existence at construction time [beam]
riteshghorse commented on PR #30668: URL: https://github.com/apache/beam/pull/30668#issuecomment-2007260795 Yeah look like it, It is passing on master - https://github.com/apache/beam/actions/runs/8338077957/job/22817805661. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Tune maximum thread count for streaming dataflow worker executor dynamically. [beam]
scwhittle commented on code in PR #30439: URL: https://github.com/apache/beam/pull/30439#discussion_r1530536071 ## runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BoundedQueueExecutor.java: ## @@ -59,8 +72,8 @@ public BoundedQueueExecutor( @Override protected void beforeExecute(Thread t, Runnable r) { super.beforeExecute(t, r); -synchronized (this) { - if (activeCount.getAndIncrement() >= maximumPoolSize - 1) { +synchronized (BoundedQueueExecutor.this) { + if (activeCount++ >= maximumThreadCount - 1 && startTimeMaxActiveThreadsUsed == 0) { Review Comment: nit: how about ++activeCount >= maximumThreadCount seems easier to read ## runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BoundedQueueExecutor.java: ## @@ -69,8 +82,8 @@ protected void beforeExecute(Thread t, Runnable r) { @Override protected void afterExecute(Runnable r, Throwable t) { super.afterExecute(r, t); -synchronized (this) { - if (activeCount.getAndDecrement() == maximumPoolSize) { +synchronized (BoundedQueueExecutor.this) { + if (activeCount-- <= maximumThreadCount && startTimeMaxActiveThreadsUsed > 0) { Review Comment: ditto, --activeCount < maximumThreadCount seems simpler ## runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/BoundedQueueExecutorTest.java: ## @@ -0,0 +1,232 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.dataflow.worker.util; + +import static org.hamcrest.Matchers.greaterThan; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertThat; +import static org.junit.Assert.assertTrue; + +import java.util.concurrent.CountDownLatch; +import java.util.concurrent.TimeUnit; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.util.concurrent.ThreadFactoryBuilder; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.Timeout; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Unit tests for {@link org.apache.beam.runners.dataflow.worker.util.BoundedQueueExecutor}. */ +@RunWith(JUnit4.class) +// TODO(https://github.com/apache/beam/issues/21230): Remove when new version of errorprone is +// released (2.11.0) +@SuppressWarnings("unused") +public class BoundedQueueExecutorTest { + @Rule public transient Timeout globalTimeout = Timeout.seconds(300); + private static final long MAXIMUM_BYTES_OUTSTANDING = 1000; + private static final int DEFAULT_MAX_THREADS = 2; + private static final int DEFAULT_THREAD_EXPIRATION_SEC = 60; + + private BoundedQueueExecutor executor; + + private Runnable createSleepProcessWorkFn(CountDownLatch start, CountDownLatch stop) { +Runnable runnable = +() -> { + start.countDown(); + try { +stop.await(); + } catch (Exception e) { +throw new RuntimeException(e); + } +}; +return runnable; + } + + @Before + public void setUp() { +this.executor = +new BoundedQueueExecutor( +DEFAULT_MAX_THREADS, +DEFAULT_THREAD_EXPIRATION_SEC, +TimeUnit.SECONDS, +DEFAULT_MAX_THREADS + 100, +MAXIMUM_BYTES_OUTSTANDING, +new ThreadFactoryBuilder() +.setNameFormat("DataflowWorkUnits-%d") +.setDaemon(true) +.build()); + } + + @Test + public void testScheduleWork() throws Exception { +CountDownLatch processStart1 = new CountDownLatch(1); +CountDownLatch processStop1 = new CountDownLatch(1); +CountDownLatch processStart2 = new CountDownLatch(1); +CountDownLatch processStop2 = new CountDownLatch(1); +CountDownLatch processStart3 = new CountDownLatch(1); +CountDownLatch processStop3 = new CountDownLatch(1); +Runnable m1 =
Re: [PR] Use 50 workers for 2GB 10 bytes combine test [beam]
liferoad commented on PR #30655: URL: https://github.com/apache/beam/pull/30655#issuecomment-2007621658 > Can we change the cron string to sth like '50 5 * * *' so the risk of interfering with other workflow is minimum > > https://github.com/apache/beam/blob/50f33cd786dc63463688315a1c73b1cf4ef18807/.github/workflows/beam_LoadTests_Python_Combine_Dataflow_Streaming.yml#L20 Good idea. Done. Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Python] Upgrade mypy version to 1.7.1 [beam]
riteshghorse closed pull request #29707: [Python] Upgrade mypy version to 1.7.1 URL: https://github.com/apache/beam/pull/29707 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Revert "Disable remote gradle cache until it is cleaned (#30584)" [beam]
github-actions[bot] commented on PR #30674: URL: https://github.com/apache/beam/pull/30674#issuecomment-2007765764 Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment `assign set of reviewers` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Minimize scope of expensive lock [beam]
damccorm opened a new pull request, #30679: URL: https://github.com/apache/beam/pull/30679 We've seen this section hang, eating up the lock when shutdown takes a long time. This should minimize the size of the critical section to a single copy of `self.last_access_times.items()` so that it still avoids #27501 (which the lock was introduced for) without hanging too long. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #` instead. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier). To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md) GitHub Actions Tests Status (on master branch) [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule) [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule) [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule) [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule) See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Deduplicate common environments. [beam]
github-actions[bot] commented on PR #30681: URL: https://github.com/apache/beam/pull/30681#issuecomment-2008172678 Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Deduplicate common environments. [beam]
robertwb commented on PR #30681: URL: https://github.com/apache/beam/pull/30681#issuecomment-2008171114 R: @chamikaramj -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Tune maximum thread count for streaming dataflow worker executor dynamically. [beam]
MelodyShen commented on code in PR #30439: URL: https://github.com/apache/beam/pull/30439#discussion_r1531166753 ## runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/BoundedQueueExecutorTest.java: ## @@ -0,0 +1,232 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.dataflow.worker.util; + +import static org.hamcrest.Matchers.greaterThan; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertThat; +import static org.junit.Assert.assertTrue; + +import java.util.concurrent.CountDownLatch; +import java.util.concurrent.TimeUnit; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.util.concurrent.ThreadFactoryBuilder; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.Timeout; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Unit tests for {@link org.apache.beam.runners.dataflow.worker.util.BoundedQueueExecutor}. */ +@RunWith(JUnit4.class) +// TODO(https://github.com/apache/beam/issues/21230): Remove when new version of errorprone is +// released (2.11.0) +@SuppressWarnings("unused") +public class BoundedQueueExecutorTest { + @Rule public transient Timeout globalTimeout = Timeout.seconds(300); + private static final long MAXIMUM_BYTES_OUTSTANDING = 1000; + private static final int DEFAULT_MAX_THREADS = 2; + private static final int DEFAULT_THREAD_EXPIRATION_SEC = 60; + + private BoundedQueueExecutor executor; + + private Runnable createSleepProcessWorkFn(CountDownLatch start, CountDownLatch stop) { +Runnable runnable = +() -> { + start.countDown(); + try { +stop.await(); + } catch (Exception e) { +throw new RuntimeException(e); + } +}; +return runnable; + } + + @Before + public void setUp() { +this.executor = +new BoundedQueueExecutor( +DEFAULT_MAX_THREADS, +DEFAULT_THREAD_EXPIRATION_SEC, +TimeUnit.SECONDS, +DEFAULT_MAX_THREADS + 100, +MAXIMUM_BYTES_OUTSTANDING, +new ThreadFactoryBuilder() +.setNameFormat("DataflowWorkUnits-%d") +.setDaemon(true) +.build()); + } + + @Test + public void testScheduleWork() throws Exception { +CountDownLatch processStart1 = new CountDownLatch(1); +CountDownLatch processStop1 = new CountDownLatch(1); +CountDownLatch processStart2 = new CountDownLatch(1); +CountDownLatch processStop2 = new CountDownLatch(1); +CountDownLatch processStart3 = new CountDownLatch(1); +CountDownLatch processStop3 = new CountDownLatch(1); +Runnable m1 = createSleepProcessWorkFn(processStart1, processStop1); +Runnable m2 = createSleepProcessWorkFn(processStart2, processStop2); +Runnable m3 = createSleepProcessWorkFn(processStart3, processStop3); + +executor.execute(m1, 1); +assertTrue(processStart1.await(1000, TimeUnit.MILLISECONDS)); +executor.execute(m2, 1); +assertTrue(processStart2.await(1000, TimeUnit.MILLISECONDS)); +// m1 and m2 have started and all threads are occupied so m3 will be queued and not executed. +executor.execute(m3, 1); +assertFalse(processStart3.await(1000, TimeUnit.MILLISECONDS)); + +// Stop m1 so there is an available thread for m3 to run. +processStop1.countDown(); +assertTrue(processStart3.await(1000, TimeUnit.MILLISECONDS)); +// m3 started. +processStop2.countDown(); +processStop3.countDown(); +executor.shutdown(); + } + + @Test + public void testOverrideMaximumThreadCount() throws Exception { +CountDownLatch processStart1 = new CountDownLatch(1); +CountDownLatch processStart2 = new CountDownLatch(1); +CountDownLatch processStart3 = new CountDownLatch(1); +CountDownLatch stop = new CountDownLatch(1); +Runnable m1 = createSleepProcessWorkFn(processStart1, stop); +Runnable m2 = createSleepProcessWorkFn(processStart2, stop); +Runnable m3 = createSleepProcessWorkFn(processStart3, stop); + +// Initial
Re: [PR] Add Redistribute transform to model, Java SDK, and most active runners [beam]
kennknowles commented on PR #30545: URL: https://github.com/apache/beam/pull/30545#issuecomment-2007801662 The problem in the portable runner was that it was not registered as a "native" transform. I don't know the details, but this influences the fuser. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] The PostCommit Python ValidatesRunner Spark job is flaky [beam]
Abacn commented on issue #30645: URL: https://github.com/apache/beam/issues/30645#issuecomment-2007928104 This is due to https://github.com/apache/beam/pull/30587#issuecomment-2004812901 CC: @robertwb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] replace clock.milliseconds with stopwatch [beam]
github-actions[bot] commented on PR #30678: URL: https://github.com/apache/beam/pull/30678#issuecomment-2007928098 Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment `assign set of reviewers` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] The PostCommit XVR Samza job is flaky [beam]
Abacn closed issue #30601: The PostCommit XVR Samza job is flaky URL: https://github.com/apache/beam/issues/30601 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Tune maximum thread count for streaming dataflow worker executor dynamically. [beam]
MelodyShen commented on code in PR #30439: URL: https://github.com/apache/beam/pull/30439#discussion_r1531096133 ## runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/BoundedQueueExecutorTest.java: ## @@ -0,0 +1,232 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.dataflow.worker.util; + +import static org.hamcrest.Matchers.greaterThan; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertThat; +import static org.junit.Assert.assertTrue; + +import java.util.concurrent.CountDownLatch; +import java.util.concurrent.TimeUnit; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.util.concurrent.ThreadFactoryBuilder; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.Timeout; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Unit tests for {@link org.apache.beam.runners.dataflow.worker.util.BoundedQueueExecutor}. */ +@RunWith(JUnit4.class) +// TODO(https://github.com/apache/beam/issues/21230): Remove when new version of errorprone is +// released (2.11.0) +@SuppressWarnings("unused") +public class BoundedQueueExecutorTest { + @Rule public transient Timeout globalTimeout = Timeout.seconds(300); + private static final long MAXIMUM_BYTES_OUTSTANDING = 1000; + private static final int DEFAULT_MAX_THREADS = 2; + private static final int DEFAULT_THREAD_EXPIRATION_SEC = 60; + + private BoundedQueueExecutor executor; + + private Runnable createSleepProcessWorkFn(CountDownLatch start, CountDownLatch stop) { +Runnable runnable = +() -> { + start.countDown(); + try { +stop.await(); + } catch (Exception e) { +throw new RuntimeException(e); + } +}; +return runnable; + } + + @Before + public void setUp() { +this.executor = +new BoundedQueueExecutor( +DEFAULT_MAX_THREADS, +DEFAULT_THREAD_EXPIRATION_SEC, +TimeUnit.SECONDS, +DEFAULT_MAX_THREADS + 100, +MAXIMUM_BYTES_OUTSTANDING, +new ThreadFactoryBuilder() +.setNameFormat("DataflowWorkUnits-%d") +.setDaemon(true) +.build()); + } + + @Test + public void testScheduleWork() throws Exception { +CountDownLatch processStart1 = new CountDownLatch(1); +CountDownLatch processStop1 = new CountDownLatch(1); +CountDownLatch processStart2 = new CountDownLatch(1); +CountDownLatch processStop2 = new CountDownLatch(1); +CountDownLatch processStart3 = new CountDownLatch(1); +CountDownLatch processStop3 = new CountDownLatch(1); +Runnable m1 = createSleepProcessWorkFn(processStart1, processStop1); +Runnable m2 = createSleepProcessWorkFn(processStart2, processStop2); +Runnable m3 = createSleepProcessWorkFn(processStart3, processStop3); + +executor.execute(m1, 1); +assertTrue(processStart1.await(1000, TimeUnit.MILLISECONDS)); +executor.execute(m2, 1); +assertTrue(processStart2.await(1000, TimeUnit.MILLISECONDS)); +// m1 and m2 have started and all threads are occupied so m3 will be queued and not executed. +executor.execute(m3, 1); +assertFalse(processStart3.await(1000, TimeUnit.MILLISECONDS)); + +// Stop m1 so there is an available thread for m3 to run. +processStop1.countDown(); +assertTrue(processStart3.await(1000, TimeUnit.MILLISECONDS)); +// m3 started. +processStop2.countDown(); +processStop3.countDown(); +executor.shutdown(); + } + + @Test + public void testOverrideMaximumThreadCount() throws Exception { +CountDownLatch processStart1 = new CountDownLatch(1); +CountDownLatch processStart2 = new CountDownLatch(1); +CountDownLatch processStart3 = new CountDownLatch(1); +CountDownLatch stop = new CountDownLatch(1); +Runnable m1 = createSleepProcessWorkFn(processStart1, stop); +Runnable m2 = createSleepProcessWorkFn(processStart2, stop); +Runnable m3 = createSleepProcessWorkFn(processStart3, stop); + +// Initial
[PR] Deduplicate common environments. [beam]
robertwb opened a new pull request, #30681: URL: https://github.com/apache/beam/pull/30681 This can be especially useful for those pipelines with many cross-language transforms. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #` instead. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier). To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md) GitHub Actions Tests Status (on master branch) [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule) [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule) [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule) [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule) See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Use autovalue's @Memoized in ExponentialBuckets [beam]
JayajP commented on PR #30676: URL: https://github.com/apache/beam/pull/30676#issuecomment-2007940060 R: @m-trieu -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Use autovalue's @Memoized in ExponentialBuckets [beam]
github-actions[bot] commented on PR #30676: URL: https://github.com/apache/beam/pull/30676#issuecomment-2007942325 Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] add ExternalTransformProvider example [beam]
liferoad commented on code in PR #30666: URL: https://github.com/apache/beam/pull/30666#discussion_r1530967292 ## examples/multi-language/README.md: ## @@ -28,6 +28,8 @@ This project provides examples of Apache Beam * **python/javacount** - A Python pipeline that counts words using the Java `Count.perElement()` transform. * **python/javadatagenerator** - A Python pipeline that produces a set of strings generated from Java. This example demonstrates the `JavaExternalTransform` API. +* **python/wordcount_external** - A Python pipeline that runs the Word Count workflow using three external Java +transforms. This example demonstrates the simpler `ExternalTransformProvider` API. Review Comment: Do we have any step-by-step guide somewhere about how to create a new JavaExternalTransform? If so, can we link it here? When I first looked at the java codes, I am a bit lost about what parts I need to create in order to use `ExternalTransformProvider`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Disable unsupported custom window type test on samza and spark. [beam]
robertwb opened a new pull request, #30680: URL: https://github.com/apache/beam/pull/30680 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #` instead. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier). To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md) GitHub Actions Tests Status (on master branch) [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule) [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule) [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule) [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule) See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Deduplicate common environments. [beam]
robertwb commented on code in PR #30681: URL: https://github.com/apache/beam/pull/30681#discussion_r1531179623 ## sdks/python/apache_beam/runners/common.py: ## @@ -1941,3 +1945,64 @@ def validate_transform(transform_id): for t in pipeline_proto.root_transform_ids: validate_transform(t) + + +def merge_common_environments(pipeline_proto): Review Comment: Ah, yes, it looks like it does. (That code didn't seem to be working, as I was definitely seeing environments that needed deduplication, but perhaps I should merge the two.) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Use autovalue's @Memoized in ExponentialBuckets [beam]
github-actions[bot] commented on PR #30676: URL: https://github.com/apache/beam/pull/30676#issuecomment-2007873357 Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment `assign set of reviewers` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Revert "Disable remote gradle cache until it is cleaned (#30584)" [beam]
Abacn commented on PR #30674: URL: https://github.com/apache/beam/pull/30674#issuecomment-2007874461 R: @damccorm -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Add Redistribute transform to model, Java SDK, and most active runners [beam]
kennknowles commented on PR #30545: URL: https://github.com/apache/beam/pull/30545#issuecomment-2008089051 OK this is now green. The test suites that are still running were green before and have not been touched. Please review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Revert "Disable remote gradle cache until it is cleaned (#30584)" [beam]
github-actions[bot] commented on PR #30674: URL: https://github.com/apache/beam/pull/30674#issuecomment-2007876434 Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Minimize scope of expensive lock [beam]
damccorm commented on PR #30679: URL: https://github.com/apache/beam/pull/30679#issuecomment-2008095596 R: @tvalentyn -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Add Redistribute transform to model, Java SDK, and most active runners [beam]
kennknowles commented on code in PR #30545: URL: https://github.com/apache/beam/pull/30545#discussion_r1531106246 ## runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java: ## @@ -917,6 +919,41 @@ private void groupByKeyAndSortValuesHelper( } }); +registerTransformTranslator( +RedistributeByKey.class, Review Comment: @robertwb you might be the person to review DataflowRunner translation? For Reshuffle we don't have a translator but a more complex rewrites to a specialized GroupByKey. I opted to _not_ do that this time but translate more directly. I added ValidatesRunner tests for Redistribute that check parity with Reshuffle, at least in terms of that test suite. CC @scwhittle -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Deduplicate common environments. [beam]
chamikaramj commented on code in PR #30681: URL: https://github.com/apache/beam/pull/30681#discussion_r1531155798 ## sdks/python/apache_beam/runners/common.py: ## @@ -1941,3 +1945,64 @@ def validate_transform(transform_id): for t in pipeline_proto.root_transform_ids: validate_transform(t) + + +def merge_common_environments(pipeline_proto): Review Comment: Does this make the merge logic at the following location obsolete ? https://github.com/apache/beam/blob/fb7ba65e2236f3dd871b6e492afc07249a4a5c49/sdks/python/apache_beam/pipeline.py#L964 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Bug]: Dataflow worker container resolved to legacy runner label if not explicitly disable/enable runner v2 in 2.54.0+.dev [beam]
Abacn commented on issue #30634: URL: https://github.com/apache/beam/issues/30634#issuecomment-2007838586 This also affects https://github.com/apache/beam/blob/a3e5ac86eeade9fbef391a2c19d67825335938e6/it/google-cloud-platform/src/main/java/org/apache/beam/it/gcp/dataflow/AbstractPipelineLauncher.java#L261-L267 the throughput metrics are not correctly reported in IOLoadTests for read pipelines -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] The PostCommit Python ValidatesRunner Samza job is flaky [beam]
Abacn commented on issue #30657: URL: https://github.com/apache/beam/issues/30657#issuecomment-2007926950 This is due to https://github.com/apache/beam/pull/30587#issuecomment-2004812901 CC: @robertwb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Minimize scope of expensive lock [beam]
github-actions[bot] commented on PR #30679: URL: https://github.com/apache/beam/pull/30679#issuecomment-2008098106 Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Tune maximum thread count for streaming dataflow worker executor dynamically. [beam]
MelodyShen commented on code in PR #30439: URL: https://github.com/apache/beam/pull/30439#discussion_r1531099531 ## runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/BoundedQueueExecutorTest.java: ## @@ -0,0 +1,232 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.dataflow.worker.util; + +import static org.hamcrest.Matchers.greaterThan; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertThat; +import static org.junit.Assert.assertTrue; + +import java.util.concurrent.CountDownLatch; +import java.util.concurrent.TimeUnit; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.util.concurrent.ThreadFactoryBuilder; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.Timeout; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Unit tests for {@link org.apache.beam.runners.dataflow.worker.util.BoundedQueueExecutor}. */ +@RunWith(JUnit4.class) +// TODO(https://github.com/apache/beam/issues/21230): Remove when new version of errorprone is +// released (2.11.0) +@SuppressWarnings("unused") +public class BoundedQueueExecutorTest { + @Rule public transient Timeout globalTimeout = Timeout.seconds(300); + private static final long MAXIMUM_BYTES_OUTSTANDING = 1000; + private static final int DEFAULT_MAX_THREADS = 2; + private static final int DEFAULT_THREAD_EXPIRATION_SEC = 60; + + private BoundedQueueExecutor executor; + + private Runnable createSleepProcessWorkFn(CountDownLatch start, CountDownLatch stop) { +Runnable runnable = +() -> { + start.countDown(); + try { +stop.await(); + } catch (Exception e) { +throw new RuntimeException(e); + } +}; +return runnable; + } + + @Before + public void setUp() { +this.executor = +new BoundedQueueExecutor( +DEFAULT_MAX_THREADS, +DEFAULT_THREAD_EXPIRATION_SEC, +TimeUnit.SECONDS, +DEFAULT_MAX_THREADS + 100, +MAXIMUM_BYTES_OUTSTANDING, +new ThreadFactoryBuilder() +.setNameFormat("DataflowWorkUnits-%d") +.setDaemon(true) +.build()); + } + + @Test + public void testScheduleWork() throws Exception { +CountDownLatch processStart1 = new CountDownLatch(1); +CountDownLatch processStop1 = new CountDownLatch(1); +CountDownLatch processStart2 = new CountDownLatch(1); +CountDownLatch processStop2 = new CountDownLatch(1); +CountDownLatch processStart3 = new CountDownLatch(1); +CountDownLatch processStop3 = new CountDownLatch(1); +Runnable m1 = createSleepProcessWorkFn(processStart1, processStop1); +Runnable m2 = createSleepProcessWorkFn(processStart2, processStop2); +Runnable m3 = createSleepProcessWorkFn(processStart3, processStop3); + +executor.execute(m1, 1); +assertTrue(processStart1.await(1000, TimeUnit.MILLISECONDS)); Review Comment: Cool! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Tune maximum thread count for streaming dataflow worker executor dynamically. [beam]
MelodyShen commented on code in PR #30439: URL: https://github.com/apache/beam/pull/30439#discussion_r1531100277 ## runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BoundedQueueExecutor.java: ## @@ -59,8 +72,8 @@ public BoundedQueueExecutor( @Override protected void beforeExecute(Thread t, Runnable r) { super.beforeExecute(t, r); -synchronized (this) { - if (activeCount.getAndIncrement() >= maximumPoolSize - 1) { +synchronized (BoundedQueueExecutor.this) { + if (activeCount++ >= maximumThreadCount - 1 && startTimeMaxActiveThreadsUsed == 0) { Review Comment: Sure! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] replace clock.milliseconds with stopwatch [beam]
clmccart commented on PR #30678: URL: https://github.com/apache/beam/pull/30678#issuecomment-2007781631 cc: @tudorm -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Add Redistribute transform to model, Java SDK, and most active runners [beam]
kennknowles commented on code in PR #30545: URL: https://github.com/apache/beam/pull/30545#discussion_r1531103041 ## model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/beam_runner_api.proto: ## @@ -813,6 +813,10 @@ message GroupIntoBatchesPayload { int64 max_buffering_duration_millis = 2; } +message RedistributePayload { Review Comment: @robertwb pinging you on this very thrilling addition to the protos -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Update Python Dependencies from BRANCH weekly_update_python_dependencies_1708820977 [beam]
damondouglas closed pull request #30412: Update Python Dependencies from BRANCH weekly_update_python_dependencies_1708820977 URL: https://github.com/apache/beam/pull/30412 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Add BigTableIO Stress test [beam]
akashorabek commented on PR #30630: URL: https://github.com/apache/beam/pull/30630#issuecomment-2008682755 Run Kotlin_Examples PreCommit -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] Seeking guidance on new runner implementation [beam]
cisaacstern commented on issue #30675: URL: https://github.com/apache/beam/issues/30675#issuecomment-2008380280 @moradology! Over in my WIP for DaskRunner SideInputs, I've got this: https://github.com/apache/beam/pull/27618/files#diff-bfb5ae715e9067778f492058e8a02ff877d6e7584624908ddbdd316853e6befbR173-R182 (Which is heavily modeled on the RayRunner.) I actually don't remember off the top of my head what part of that surfaces the `As*` typing, but the tests pass over there so I think it's being captured somehow! With a little more time I could set up my dev environment and introspect a bit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Disable unsupported custom window type test on samza and spark. [beam]
Abacn merged PR #30680: URL: https://github.com/apache/beam/pull/30680 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] The PostCommit Python ValidatesRunner Samza job is flaky [beam]
Abacn closed issue #30657: The PostCommit Python ValidatesRunner Samza job is flaky URL: https://github.com/apache/beam/issues/30657 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Add BigTableIO Stress test [beam]
akashorabek commented on code in PR #30630: URL: https://github.com/apache/beam/pull/30630#discussion_r1531474427 ## it/google-cloud-platform/src/main/java/org/apache/beam/it/gcp/IOStressTestBase.java: ## @@ -0,0 +1,158 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.it.gcp; + +import com.google.cloud.Timestamp; +import java.io.IOException; +import java.io.Serializable; +import java.text.ParseException; +import java.time.Duration; +import java.util.ArrayList; +import java.util.Collection; +import java.util.List; +import java.util.Map; +import java.util.UUID; +import org.apache.beam.it.common.PipelineLauncher; +import org.apache.beam.sdk.testutils.NamedTestResult; +import org.apache.beam.sdk.testutils.metrics.IOITMetrics; +import org.apache.beam.sdk.testutils.publishing.InfluxDBSettings; +import org.apache.beam.sdk.transforms.DoFn; +import org.joda.time.Instant; + +/** Base class for IO Stress tests. */ +public class IOStressTestBase extends IOLoadTestBase { + /** + * The load will initiate at 1x, progressively increase to 2x and 4x, then decrease to 2x and + * eventually return to 1x. + */ + protected static final int[] DEFAULT_LOAD_INCREASE_ARRAY = {1, 2, 2, 4, 2, 1}; + + protected static final int DEFAULT_ROWS_PER_SECOND = 1000; + + /** + * Generates and returns a list of LoadPeriod instances representing periods of load increase + * based on the specified load increase array and total duration in minutes. + * + * @param minutesTotal The total duration in minutes for which the load periods are generated. + * @return A list of LoadPeriod instances defining periods of load increase. + */ + protected List getLoadPeriods(int minutesTotal, int[] loadIncreaseArray) { + +List loadPeriods = new ArrayList<>(); +long periodDurationMillis = +Duration.ofMinutes(minutesTotal / loadIncreaseArray.length).toMillis(); +long startTimeMillis = 0; + +for (int loadIncreaseMultiplier : loadIncreaseArray) { + long endTimeMillis = startTimeMillis + periodDurationMillis; + loadPeriods.add(new LoadPeriod(loadIncreaseMultiplier, startTimeMillis, endTimeMillis)); + + startTimeMillis = endTimeMillis; +} +return loadPeriods; + } + + /** + * Represents a period of time with associated load increase properties for stress testing + * scenarios. + */ + protected static class LoadPeriod implements Serializable { +private final int loadIncreaseMultiplier; +private final long periodStartMillis; +private final long periodEndMillis; + +public LoadPeriod(int loadIncreaseMultiplier, long periodStartMillis, long periodEndMin) { + this.loadIncreaseMultiplier = loadIncreaseMultiplier; + this.periodStartMillis = periodStartMillis; + this.periodEndMillis = periodEndMin; +} + +public int getLoadIncreaseMultiplier() { + return loadIncreaseMultiplier; +} + +public long getPeriodStartMillis() { + return periodStartMillis; +} + +public long getPeriodEndMillis() { + return periodEndMillis; +} + } + + /** + * Custom Apache Beam DoFn designed for use in stress testing scenarios. It introduces a dynamic + * load increase over time, multiplying the input elements based on the elapsed time since the + * start of processing. This class aims to simulate various load levels during stress testing. + */ + protected static class MultiplierDoFn extends DoFn { +private final int startMultiplier; +private final long startTimesMillis; +private final List loadPeriods; + +public MultiplierDoFn(int startMultiplier, List loadPeriods) { + this.startMultiplier = startMultiplier; + this.startTimesMillis = Instant.now().getMillis(); + this.loadPeriods = loadPeriods; +} + +@DoFn.ProcessElement +public void processElement( +@Element T element, OutputReceiver outputReceiver, @DoFn.Timestamp Instant timestamp) { + + int multiplier = this.startMultiplier; + long elapsedTimeMillis = timestamp.getMillis() - startTimesMillis; + + for (LoadPeriod loadPeriod : loadPeriods) { +if (elapsedTimeMillis >= loadPeriod.getPeriodStartMillis()
[PR] Bump github.com/docker/docker from 25.0.3+incompatible to 25.0.5+incompatible in /sdks [beam]
dependabot[bot] opened a new pull request, #30684: URL: https://github.com/apache/beam/pull/30684 Bumps [github.com/docker/docker](https://github.com/docker/docker) from 25.0.3+incompatible to 25.0.5+incompatible. Release notes Sourced from https://github.com/docker/docker/releases;>github.com/docker/docker's releases. 25.0.5 For a full list of pull requests and changes in this release, refer to the relevant GitHub milestones: https://github.com/docker/cli/issues?q=is%3Aclosed+milestone%3A25.0.5;>docker/cli, 25.0.5 milestone https://github.com/moby/moby/issues?q=is%3Aclosed+milestone%3A25.0.5;>moby/moby, 25.0.5 milestone Deprecated and removed features, see https://github.com/docker/cli/blob/v25.0.5/docs/deprecated.md;>Deprecated Features. Changes to the Engine API, see https://github.com/moby/moby/blob/v25.0.5/docs/api/version-history.md;>API version history. Security This release contains a security fix for https://github.com/moby/moby/security/advisories/GHSA-mq39-4gv4-mvpx;>CVE-2024-29018, a potential data exfiltration from 'internal' networks via authoritative DNS servers. Bug fixes and enhancements https://github.com/moby/moby/security/advisories/GHSA-mq39-4gv4-mvpx;>CVE-2024-29018: Do not forward requests to external DNS servers for a container that is only connected to an 'internal' network. Previously, requests were forwarded if the host's DNS server was running on a loopback address, like systemd's 127.0.0.53. https://redirect.github.com/moby/moby/pull/47589;>moby/moby#47589 plugin: fix mounting /etc/hosts when running in UserNS. https://redirect.github.com/moby/moby/pull/47588;>moby/moby#47588 rootless: fix open /etc/docker/plugins: permission denied. https://redirect.github.com/moby/moby/pull/47587;>moby/moby#47587 Fix multiple parallel docker build runs leaking disk space. https://redirect.github.com/moby/moby/pull/47527;>moby/moby#47527 v25.0.4 For a full list of pull requests and changes in this release, refer to the relevant GitHub milestones: https://github.com/docker/cli/issues?q=is%3Aclosed+milestone%3A25.0.4;>docker/cli, 25.0.4 milestone https://github.com/moby/moby/issues?q=is%3Aclosed+milestone%3A25.0.4;>moby/moby, 25.0.4 milestone Deprecated and removed features, see https://github.com/docker/cli/blob/v25.0.4/docs/deprecated.md;>Deprecated Features. Changes to the Engine API, see https://github.com/moby/moby/blob/v25.0.4/docs/api/version-history.md;>API version history. Bug fixes and enhancements Restore DNS names for containers in the default nat network on Windows. https://redirect.github.com/moby/moby/pull/47490;>moby/moby#47490 Fix docker start failing when used with --checkpoint https://redirect.github.com/moby/moby/pull/47466;>moby/moby#47466 Don't enforce new validation rules for existing swarm networks https://redirect.github.com/moby/moby/pull/47482;>moby/moby#47482 Restore IP connectivity between the host and containers on an internal bridge network. https://redirect.github.com/moby/moby/pull/47481;>moby/moby#47481 Fix a regression introduced in v25.0 that prevented the classic builder from ADDing a tar archive with xattrs created on a non-Linux OS https://redirect.github.com/moby/moby/pull/47483;>moby/moby#47483 containerd image store: Fix image pull not emitting Pulling fs layer status https://redirect.github.com/moby/moby/pull/47484;>moby/moby#47484 API To preserve backwards compatibility, make read-only mounts not recursive by default when using older clients (API version v1.44). https://redirect.github.com/moby/moby/pull/47393;>moby/moby#47393 GET /images/{id}/json omits the Created field (previously it was 0001-01-01T00:00:00Z) if the Created field is missing from the image config. https://redirect.github.com/moby/moby/pull/47451;>moby/moby#47451 Populate a missing Created field in GET /images/{id}/json with 0001-01-01T00:00:00Z for API version = 1.43. https://redirect.github.com/moby/moby/pull/47387;>moby/moby#47387 Fix a regression that caused API socket connection failures to report an API version negotiation failure instead. https://redirect.github.com/moby/moby/pull/47470;>moby/moby#47470 Preserve supplied endpoint configuration in a container-create API request, when a container-wide MAC address is specified, but NetworkMode name-or-id is not the same as the name-or-id used in NetworkSettings.Networks. https://redirect.github.com/moby/moby/pull/47510;>moby/moby#47510 Packaging updates Upgrade Go runtime to https://go.dev/doc/devel/release#go1.21.8;>1.21.8. https://redirect.github.com/moby/moby/pull/47503;>moby/moby#47503 Upgrade RootlessKit to https://github.com/rootless-containers/rootlesskit/releases/tag/v2.0.2;>v2.0.2. https://redirect.github.com/moby/moby/pull/47508;>moby/moby#47508 Upgrade Compose to
[PR] Bump golang.org/x/oauth2 from 0.17.0 to 0.18.0 in /sdks [beam]
dependabot[bot] opened a new pull request, #30685: URL: https://github.com/apache/beam/pull/30685 Bumps [golang.org/x/oauth2](https://github.com/golang/oauth2) from 0.17.0 to 0.18.0. Commits https://github.com/golang/oauth2/commit/85231f99d65eedc833c8fccfec7fd7d8303c0d3e;>85231f9 go.mod: update golang.org/x dependencies https://github.com/golang/oauth2/commit/34a7afaa8571b555a177d9bf0360276cbb94f630;>34a7afa google/externalaccount: add Config.UniverseDomain https://github.com/golang/oauth2/commit/95bec9538152e03de0cfbaf64cd3af163b8cef30;>95bec95 google/externalaccount: moves externalaccount package out of internal and exp... See full diff in https://github.com/golang/oauth2/compare/v0.17.0...v0.18.0;>compare view [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=golang.org/x/oauth2=go_modules=0.17.0=0.18.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- Dependabot commands and options You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump golang.org/x/oauth2 from 0.17.0 to 0.18.0 in /sdks [beam]
github-actions[bot] commented on PR #30685: URL: https://github.com/apache/beam/pull/30685#issuecomment-2008666471 Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`: R: @riteshghorse for label go. Available commands: - `stop reviewer notifications` - opt out of the automated review tooling - `remind me after tests pass` - tag the comment author after tests pass - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers) The PR bot will only process comments in the main thread (not review comments). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump github.com/docker/docker from 25.0.3+incompatible to 25.0.5+incompatible in /sdks [beam]
github-actions[bot] commented on PR #30684: URL: https://github.com/apache/beam/pull/30684#issuecomment-2008666508 Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`: R: @jrmccluskey for label go. Available commands: - `stop reviewer notifications` - opt out of the automated review tooling - `remind me after tests pass` - tag the comment author after tests pass - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers) The PR bot will only process comments in the main thread (not review comments). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Implement ordered list state for FnApi. [beam]
acrites commented on code in PR #30317: URL: https://github.com/apache/beam/pull/30317#discussion_r1531296538 ## model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto: ## Review Comment: Or are we going to piggy-back on multimap for this? (If so we should delete the TODO.) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] The PostCommit Python ValidatesRunner Samza job is flaky [beam]
github-actions[bot] commented on issue #30657: URL: https://github.com/apache/beam/issues/30657#issuecomment-2008452865 Reopening since the workflow is still flaky -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] The PreCommit Java job is flaky [beam]
github-actions[bot] opened a new issue, #30683: URL: https://github.com/apache/beam/issues/30683 The PreCommit Java is failing over 50% of the time Please visit https://github.com/apache/beam/actions/workflows/beam_PreCommit_Java.yml?query=is%3Afailure+branch%3Amaster to see the logs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump github.com/docker/docker from 25.0.3+incompatible to 25.0.5+incompatible in /sdks [beam]
codecov[bot] commented on PR #30684: URL: https://github.com/apache/beam/pull/30684#issuecomment-2008651036 ## [Codecov](https://app.codecov.io/gh/apache/beam/pull/30684?dropdown=coverage=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) Report All modified and coverable lines are covered by tests :white_check_mark: > Project coverage is 38.53%. Comparing base [(`389e106`)](https://app.codecov.io/gh/apache/beam/commit/389e1067c9d1f9bcc99b338ffb46e4923f692cae?dropdown=coverage=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) to head [(`38a0341`)](https://app.codecov.io/gh/apache/beam/pull/30684?dropdown=coverage=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache). Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #30684 +/- ## === Coverage 38.53% 38.53% === Files 698 698 Lines 102360 102360 === Hits3944439444 Misses 6128461284 Partials 1632 1632 ``` | [Flag](https://app.codecov.io/gh/apache/beam/pull/30684/flags?src=pr=flags_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) | Coverage Δ | | |---|---|---| | [go](https://app.codecov.io/gh/apache/beam/pull/30684/flags?src=pr=flag_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) | `54.33% <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more. [:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/beam/pull/30684?dropdown=coverage=pr=continue_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache). :loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump golang.org/x/oauth2 from 0.17.0 to 0.18.0 in /sdks [beam]
codecov[bot] commented on PR #30685: URL: https://github.com/apache/beam/pull/30685#issuecomment-2008651162 ## [Codecov](https://app.codecov.io/gh/apache/beam/pull/30685?dropdown=coverage=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) Report All modified and coverable lines are covered by tests :white_check_mark: > Project coverage is 38.53%. Comparing base [(`389e106`)](https://app.codecov.io/gh/apache/beam/commit/389e1067c9d1f9bcc99b338ffb46e4923f692cae?dropdown=coverage=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) to head [(`d9ff2ea`)](https://app.codecov.io/gh/apache/beam/pull/30685?dropdown=coverage=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache). Additional details and impacted files ```diff @@Coverage Diff @@ ## master #30685 +/- ## == - Coverage 38.53% 38.53% -0.01% == Files 698 698 Lines 102360 102360 == - Hits3944439443 -1 - Misses 6128461285 +1 Partials 1632 1632 ``` | [Flag](https://app.codecov.io/gh/apache/beam/pull/30685/flags?src=pr=flags_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) | Coverage Δ | | |---|---|---| | [go](https://app.codecov.io/gh/apache/beam/pull/30685/flags?src=pr=flag_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) | `54.33% <ø> (-0.01%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more. [:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/beam/pull/30685?dropdown=coverage=pr=continue_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache). :loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Add BigTableIO Stress test [beam]
akashorabek commented on PR #30630: URL: https://github.com/apache/beam/pull/30630#issuecomment-2008651373 Run Java_Examples_Dataflow PreCommit -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Implement ordered list state for FnApi. [beam]
acrites commented on code in PR #30317: URL: https://github.com/apache/beam/pull/30317#discussion_r1531294554 ## model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto: ## Review Comment: Do we also need to add something here: https://github.com/apache/beam/blob/fb7ba65e2236f3dd871b6e492afc07249a4a5c49/model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/beam_runner_api.proto#L478 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] [Python] Add redis client to python dependencies [beam]
riteshghorse opened a new pull request, #30677: URL: https://github.com/apache/beam/pull/30677 With the addition of redis cache support to the Enrichment transform, it will be good to package redis client dependency with apache-beam package. Otherwise user will have to install it separately every time. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #` instead. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier). To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md) GitHub Actions Tests Status (on master branch) [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule) [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule) [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule) [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule) See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] add ExternalTransformProvider example [beam]
ahmedabu98 commented on code in PR #30666: URL: https://github.com/apache/beam/pull/30666#discussion_r1530961429 ## examples/multi-language/python/wordcount_external.py: ## @@ -0,0 +1,102 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import logging + +import apache_beam as beam +from apache_beam.io import ReadFromText +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.transforms.external_transform_provider import ExternalTransformProvider +from apache_beam.typehints.row_type import RowTypeConstraint +"""A Python multi-language pipeline that counts words. + +This pipeline reads an input text file then extracts and counts the words using Java SDK SchemaTransforms provided in +`ExtractWordsProvider`, `JavaCountProvider`, and `WriteWordsProvider`. Wrappers for these transforms are dynamically +provided in Python via the `ExternalTransformProvider` API. + +Example commands for executing this program: + +DirectRunner: +$ python wordcount_external.py --runner DirectRunner --input --output --expansion_service_port + +DataflowRunner: +$ python wordcount_external.py \ + --runner DataflowRunner \ + --temp_location $TEMP_LOCATION \ + --project $GCP_PROJECT \ + --region $GCP_REGION \ + --job_name $JOB_NAME \ + --num_workers $NUM_WORKERS \ + --input "gs://dataflow-samples/shakespeare/kinglear.txt" \ + --output "gs://$GCS_BUCKET/wordcount_external/output" \ + --expansion_service_port Review Comment: There's a common section in the [README.md](https://github.com/apache/beam/blob/master/examples/multi-language/README.md#instructions-for-running-the-pipelines) file -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Disable unsupported custom window type test on samza and spark. [beam]
robertwb commented on PR #30680: URL: https://github.com/apache/beam/pull/30680#issuecomment-2007996651 R: @Abacn -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Disable unsupported custom window type test on samza and spark. [beam]
github-actions[bot] commented on PR #30680: URL: https://github.com/apache/beam/pull/30680#issuecomment-2007999326 Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Revert "Disable remote gradle cache until it is cleaned (#30584)" [beam]
damccorm merged PR #30674: URL: https://github.com/apache/beam/pull/30674 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org