Re: [PR] Correct per-entry HashMap overhead in WindmillStateCache [beam]

2024-03-19 Thread via GitHub


dmitryor commented on PR #30672:
URL: https://github.com/apache/beam/pull/30672#issuecomment-2006005444

   R: @scwhittle 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Correct per-entry HashMap overhead in WindmillStateCache [beam]

2024-03-19 Thread via GitHub


dmitryor opened a new pull request, #30672:
URL: https://github.com/apache/beam/pull/30672

   Existing code incorrectly assumes a per-entry HashMap overhead is 16 bytes.
   
   In reality it's 32 bytes [per this 
article](https://appsintheopen.com/posts/52-the-memory-overhead-of-java-ojects),
 confirmed by profiling: 1.22GB for 40.96M objects ~ 32 bytes/object
   
   
![313862428-9054f0e9-f0c0-4774-85b8-84fa3dd397b3](https://github.com/apache/beam/assets/34167644/0538a3a5-fb68-443e-b334-4d4342c195e4)
   
![313862425-15f2123b-a839-4339-aa7a-4710d04c9552](https://github.com/apache/beam/assets/34167644/2352394c-6f8e-4e3a-977a-884fd6ff2d2a)
   
   
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Correct per-entry HashMap overhead in WindmillStateCache [beam]

2024-03-19 Thread via GitHub


github-actions[bot] commented on PR #30672:
URL: https://github.com/apache/beam/pull/30672#issuecomment-2006009059

   Stopping reviewer notifications for this pull request: review requested by 
someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Correct per-entry HashMap overhead in WindmillStateCache [beam]

2024-03-19 Thread via GitHub


scwhittle commented on PR #30672:
URL: https://github.com/apache/beam/pull/30672#issuecomment-2006346607

   > Task :runners:google-cloud-dataflow-java:worker:test
   
   
org.apache.beam.runners.dataflow.worker.windmill.state.WindmillStateCacheTest > 
testBasic FAILED
   java.lang.AssertionError at WindmillStateCacheTest.java:171
   
   
org.apache.beam.runners.dataflow.worker.windmill.state.WindmillStateCacheTest > 
testStaleWorkItem FAILED
   java.lang.AssertionError at WindmillStateCacheTest.java:257
   
   
org.apache.beam.runners.dataflow.worker.windmill.state.WindmillStateCacheTest > 
testInvalidation FAILED
   java.lang.AssertionError at WindmillStateCacheTest.java:215
   
   
   some test failures and spotless needs to be fixed
   
   ./gradlew :runners:google-cloud-dataflow-java:worker:spotlessApply


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] add pythonpath to cloudcoverage tox environment [beam]

2024-03-19 Thread via GitHub


volatilemolotov opened a new pull request, #30673:
URL: https://github.com/apache/beam/pull/30673

   Adds PYTHONPATH environment variable to cloudcoverage tox environment set to 
toxinidir (sdks/python). 
   This allows pytest to load all the modules and run the correct coverage 
report
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Failing Test]: PostCommit Java SingleStoreIO IT failing [beam]

2024-03-19 Thread via GitHub


AdalbertMemSQL commented on issue #30564:
URL: https://github.com/apache/beam/issues/30564#issuecomment-2006636094

   Hey @Abacn 
   Is it possible to somehow retrieve full workload logs?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Feature Request]: Refresh side input from BigQuery [beam]

2024-03-19 Thread via GitHub


BostjanBozic commented on issue #26196:
URL: https://github.com/apache/beam/issues/26196#issuecomment-2006766022

   Just a question - if you use global window (since Pub/Sub would be unbounded 
source), would `PeriodicImpulse` come into play or would you need to use 
`GenerateSequence` for that?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] add flag for direct path that reads from system properties [beam]

2024-03-19 Thread via GitHub


scwhittle commented on code in PR #30588:
URL: https://github.com/apache/beam/pull/30588#discussion_r1530507529


##
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowStreamingPipelineOptions.java:
##
@@ -211,6 +211,14 @@ public interface DataflowStreamingPipelineOptions extends 
PipelineOptions {
 
   void setWindmillServiceStreamMaxBackoffMillis(int value);
 
+  @Description(
+  "If true, Dataflow streaming pipeline will be running in direct path 
mode."
+  + " VMs must have IPv6 enabled for this to work.")
+  @Default.Boolean(false)
+  boolean getIsWindmillServiceDirectPathEnabled();
+
+  void setIsWindmillServiceDirectPathEnabled(boolean 
isWindmillServiceDirectPathEnabled);

Review Comment:
   I would remove the `is`, otherwise specifying the flag looks odd IMO
   `--isWindmillServiceDirectPathEnabled=true`
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump github.com/aws/aws-sdk-go-v2/credentials from 1.17.4 to 1.17.8 in /sdks [beam]

2024-03-19 Thread via GitHub


lostluck commented on PR #30669:
URL: https://github.com/apache/beam/pull/30669#issuecomment-2007677567

   LGTM


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump github.com/aws/aws-sdk-go-v2/credentials from 1.17.4 to 1.17.8 in /sdks [beam]

2024-03-19 Thread via GitHub


lostluck merged PR #30669:
URL: https://github.com/apache/beam/pull/30669


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add BigTableIO Stress test [beam]

2024-03-19 Thread via GitHub


Abacn commented on code in PR #30630:
URL: https://github.com/apache/beam/pull/30630#discussion_r1530769586


##
it/google-cloud-platform/src/main/java/org/apache/beam/it/gcp/IOStressTestBase.java:
##
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.it.gcp;
+
+import com.google.cloud.Timestamp;
+import java.io.IOException;
+import java.io.Serializable;
+import java.text.ParseException;
+import java.time.Duration;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+import org.apache.beam.it.common.PipelineLauncher;
+import org.apache.beam.sdk.testutils.NamedTestResult;
+import org.apache.beam.sdk.testutils.metrics.IOITMetrics;
+import org.apache.beam.sdk.testutils.publishing.InfluxDBSettings;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.joda.time.Instant;
+
+/** Base class for IO Stress tests. */
+public class IOStressTestBase extends IOLoadTestBase {
+  /**
+   * The load will initiate at 1x, progressively increase to 2x and 4x, then 
decrease to 2x and
+   * eventually return to 1x.
+   */
+  protected static final int[] DEFAULT_LOAD_INCREASE_ARRAY = {1, 2, 2, 4, 2, 
1};
+
+  protected static final int DEFAULT_ROWS_PER_SECOND = 1000;
+
+  /**
+   * Generates and returns a list of LoadPeriod instances representing periods 
of load increase
+   * based on the specified load increase array and total duration in minutes.
+   *
+   * @param minutesTotal The total duration in minutes for which the load 
periods are generated.
+   * @return A list of LoadPeriod instances defining periods of load increase.
+   */
+  protected List getLoadPeriods(int minutesTotal, int[] 
loadIncreaseArray) {
+
+List loadPeriods = new ArrayList<>();
+long periodDurationMillis =
+Duration.ofMinutes(minutesTotal / loadIncreaseArray.length).toMillis();
+long startTimeMillis = 0;
+
+for (int loadIncreaseMultiplier : loadIncreaseArray) {
+  long endTimeMillis = startTimeMillis + periodDurationMillis;
+  loadPeriods.add(new LoadPeriod(loadIncreaseMultiplier, startTimeMillis, 
endTimeMillis));
+
+  startTimeMillis = endTimeMillis;
+}
+return loadPeriods;
+  }
+
+  /**
+   * Represents a period of time with associated load increase properties for 
stress testing
+   * scenarios.
+   */
+  protected static class LoadPeriod implements Serializable {
+private final int loadIncreaseMultiplier;
+private final long periodStartMillis;
+private final long periodEndMillis;
+
+public LoadPeriod(int loadIncreaseMultiplier, long periodStartMillis, long 
periodEndMin) {
+  this.loadIncreaseMultiplier = loadIncreaseMultiplier;
+  this.periodStartMillis = periodStartMillis;
+  this.periodEndMillis = periodEndMin;
+}
+
+public int getLoadIncreaseMultiplier() {
+  return loadIncreaseMultiplier;
+}
+
+public long getPeriodStartMillis() {
+  return periodStartMillis;
+}
+
+public long getPeriodEndMillis() {
+  return periodEndMillis;
+}
+  }
+
+  /**
+   * Custom Apache Beam DoFn designed for use in stress testing scenarios. It 
introduces a dynamic
+   * load increase over time, multiplying the input elements based on the 
elapsed time since the
+   * start of processing. This class aims to simulate various load levels 
during stress testing.
+   */
+  protected static class MultiplierDoFn extends DoFn {
+private final int startMultiplier;
+private final long startTimesMillis;
+private final List loadPeriods;
+
+public MultiplierDoFn(int startMultiplier, List loadPeriods) {
+  this.startMultiplier = startMultiplier;
+  this.startTimesMillis = Instant.now().getMillis();
+  this.loadPeriods = loadPeriods;
+}
+
+@DoFn.ProcessElement
+public void processElement(
+@Element T element, OutputReceiver outputReceiver, @DoFn.Timestamp 
Instant timestamp) {
+
+  int multiplier = this.startMultiplier;
+  long elapsedTimeMillis = timestamp.getMillis() - startTimesMillis;
+
+  for (LoadPeriod loadPeriod : loadPeriods) {
+if (elapsedTimeMillis >= loadPeriod.getPeriodStartMillis()
+  

Re: [PR] Add config validation to kafka read schema transform [beam]

2024-03-19 Thread via GitHub


ahmedabu98 commented on code in PR #30625:
URL: https://github.com/apache/beam/pull/30625#discussion_r1530772951


##
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaReadSchemaTransformConfiguration.java:
##
@@ -59,20 +63,25 @@ public void validate() {
 final String confluentSchemaRegSubject = 
this.getConfluentSchemaRegistrySubject();
 
 if (confluentSchemaRegUrl != null) {
-  assert confluentSchemaRegSubject != null
-  : "To read from Kafka, a schema must be provided directly or though 
Confluent "
-  + "Schema Registry. Make sure you are providing one of these 
parameters.";
+  checkArgument(
+  confluentSchemaRegSubject != null,
+  "To read from Kafka, a schema must be provided directly or though 
Confluent "
+  + "Schema Registry. Make sure you are providing one of these 
parameters.");
 } else if (dataFormat != null && dataFormat.equals("RAW")) {
-  assert inputSchema == null : "To read from Kafka in RAW format, you 
can't provide a schema.";
+  checkArgument(
+  inputSchema == null, "To read from Kafka in RAW format, you can't 
provide a schema.");
 } else if (dataFormat != null && dataFormat.equals("JSON")) {
-  assert inputSchema != null : "To read from Kafka in JSON format, you 
must provide a schema.";
+  checkArgument(
+  inputSchema != null, "To read from Kafka in JSON format, you must 
provide a schema.");
 } else if (dataFormat != null && dataFormat.equals("PROTO")) {
-  assert messageName != null
-  : "To read from Kafka in PROTO format, messageName must be 
provided.";
-  assert fileDescriptorPath != null || inputSchema != null
-  : "To read from Kafka in PROTO format, fileDescriptorPath or schema 
must be provided.";
+  checkArgument(
+  messageName != null, "To read from Kafka in PROTO format, 
messageName must be provided.");
+  checkArgument(
+  fileDescriptorPath != null || inputSchema != null,
+  "To read from Kafka in PROTO format, fileDescriptorPath or schema 
must be provided.");
 } else {
-  assert inputSchema != null : "To read from Kafka in AVRO format, you 
must provide a schema.";
+  checkArgument(
+  inputSchema != null, "To read from Kafka in AVRO format, you must 
provide a schema.");

Review Comment:
   nit: for many of these, checkNotNull() would be more appropriate



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Performance Regression or Improvement: cogbk_python_batch_load_test_2GB_of_100B_records_with_a_multiple_key:runtime [beam]

2024-03-19 Thread via GitHub


liferoad closed issue #30659: 
  Performance Regression or Improvement: 
cogbk_python_batch_load_test_2GB_of_100B_records_with_a_multiple_key:runtime

URL: https://github.com/apache/beam/issues/30659


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Performance Regression or Improvement: gbk_python_batch_load_test_fanout_4_times_with_2GB_10byte_records_total:runtime [beam]

2024-03-19 Thread via GitHub


liferoad closed issue #30658: 
  Performance Regression or Improvement: 
gbk_python_batch_load_test_fanout_4_times_with_2GB_10byte_records_total:runtime

URL: https://github.com/apache/beam/issues/30658


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Use 50 workers for 2GB 10 bytes combine test [beam]

2024-03-19 Thread via GitHub


Abacn commented on PR #30655:
URL: https://github.com/apache/beam/pull/30655#issuecomment-2007343641

   Can we change the cron string to sth like '50 5 * * *' so the risk of 
interfering with other workflow is minimum
   
   
https://github.com/apache/beam/blob/50f33cd786dc63463688315a1c73b1cf4ef18807/.github/workflows/beam_LoadTests_Python_Combine_Dataflow_Streaming.yml#L20


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add config validation to kafka read schema transform [beam]

2024-03-19 Thread via GitHub


Polber commented on code in PR #30625:
URL: https://github.com/apache/beam/pull/30625#discussion_r1530776385


##
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaReadSchemaTransformConfiguration.java:
##
@@ -59,20 +63,25 @@ public void validate() {
 final String confluentSchemaRegSubject = 
this.getConfluentSchemaRegistrySubject();
 
 if (confluentSchemaRegUrl != null) {
-  assert confluentSchemaRegSubject != null
-  : "To read from Kafka, a schema must be provided directly or though 
Confluent "
-  + "Schema Registry. Make sure you are providing one of these 
parameters.";
+  checkArgument(
+  confluentSchemaRegSubject != null,
+  "To read from Kafka, a schema must be provided directly or though 
Confluent "
+  + "Schema Registry. Make sure you are providing one of these 
parameters.");
 } else if (dataFormat != null && dataFormat.equals("RAW")) {
-  assert inputSchema == null : "To read from Kafka in RAW format, you 
can't provide a schema.";
+  checkArgument(
+  inputSchema == null, "To read from Kafka in RAW format, you can't 
provide a schema.");
 } else if (dataFormat != null && dataFormat.equals("JSON")) {
-  assert inputSchema != null : "To read from Kafka in JSON format, you 
must provide a schema.";
+  checkArgument(
+  inputSchema != null, "To read from Kafka in JSON format, you must 
provide a schema.");
 } else if (dataFormat != null && dataFormat.equals("PROTO")) {
-  assert messageName != null
-  : "To read from Kafka in PROTO format, messageName must be 
provided.";
-  assert fileDescriptorPath != null || inputSchema != null
-  : "To read from Kafka in PROTO format, fileDescriptorPath or schema 
must be provided.";
+  checkArgument(
+  messageName != null, "To read from Kafka in PROTO format, 
messageName must be provided.");
+  checkArgument(
+  fileDescriptorPath != null || inputSchema != null,
+  "To read from Kafka in PROTO format, fileDescriptorPath or schema 
must be provided.");
 } else {
-  assert inputSchema != null : "To read from Kafka in AVRO format, you 
must provide a schema.";
+  checkArgument(
+  inputSchema != null, "To read from Kafka in AVRO format, you must 
provide a schema.");

Review Comment:
   Good point. Let me refactor that and push before emrging



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Revert "Disable remote gradle cache until it is cleaned (#30584)" [beam]

2024-03-19 Thread via GitHub


Abacn opened a new pull request, #30674:
URL: https://github.com/apache/beam/pull/30674

   This reverts commit a3ea9ef706cf798fc1f6b026dcdf7171434e74d8.
   
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Revert "Disable remote gradle cache until it is cleaned (#30584)" [beam]

2024-03-19 Thread via GitHub


Abacn commented on PR #30674:
URL: https://github.com/apache/beam/pull/30674#issuecomment-2007649109

   Cache removed : 
https://issues.apache.org/jira/projects/INFRA/issues/INFRA-25595?filter=allopenissues


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] add a way for channels to be closed manually [beam]

2024-03-19 Thread via GitHub


scwhittle merged PR #30425:
URL: https://github.com/apache/beam/pull/30425


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add config validation to kafka read schema transform [beam]

2024-03-19 Thread via GitHub


github-actions[bot] commented on PR #30625:
URL: https://github.com/apache/beam/pull/30625#issuecomment-2007582202

   Assigning reviewers. If you would like to opt out of this review, comment 
`assign to next reviewer`:
   
   R: @damondouglas for label java.
   R: @ahmedabu98 for label io.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any 
comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review 
comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump github.com/google/uuid from 1.5.0 to 1.6.0 in /sdks [beam]

2024-03-19 Thread via GitHub


dependabot[bot] commented on PR #30254:
URL: https://github.com/apache/beam/pull/30254#issuecomment-2007685637

   Looks like github.com/google/uuid is up-to-date now, so this is no longer 
needed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump github.com/google/uuid from 1.5.0 to 1.6.0 in /sdks [beam]

2024-03-19 Thread via GitHub


dependabot[bot] closed pull request #30254: Bump github.com/google/uuid from 
1.5.0 to 1.6.0 in /sdks
URL: https://github.com/apache/beam/pull/30254


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump github.com/testcontainers/testcontainers-go from 0.26.0 to 0.29.1 in /sdks [beam]

2024-03-19 Thread via GitHub


lostluck commented on PR #30557:
URL: https://github.com/apache/beam/pull/30557#issuecomment-2007686397

   @dependabot rebase


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Python] Check feature store existence at construction time [beam]

2024-03-19 Thread via GitHub


damccorm commented on PR #30668:
URL: https://github.com/apache/beam/pull/30668#issuecomment-2007237814

   3.10 failure looks like an unrelated GameStatsIT failure. So this should be 
safe to move forward


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Use 50 workers for 2GB 10 bytes combine test [beam]

2024-03-19 Thread via GitHub


Abacn merged PR #30655:
URL: https://github.com/apache/beam/pull/30655


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug]: beam_LoadTests_Python_Combine_Dataflow_Streaming failing [beam]

2024-03-19 Thread via GitHub


Abacn closed issue #23904: [Bug]: 
beam_LoadTests_Python_Combine_Dataflow_Streaming failing
URL: https://github.com/apache/beam/issues/23904


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Use 50 workers for 2GB 10 bytes combine test [beam]

2024-03-19 Thread via GitHub


Abacn commented on PR #30655:
URL: https://github.com/apache/beam/pull/30655#issuecomment-2007640750

   merge now and let's see how it goes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] update confluent dependency version to 7.6.0 [beam]

2024-03-19 Thread via GitHub


Abacn merged PR #30638:
URL: https://github.com/apache/beam/pull/30638


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Task]: Update confluent libraries versions from 5.3.2 to more recent in kafka io extension [beam]

2024-03-19 Thread via GitHub


Abacn closed issue #30610: [Task]: Update confluent libraries versions from 
5.3.2 to more recent in kafka io extension
URL: https://github.com/apache/beam/issues/30610


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Support ValueProvider for _CustomBigQueryStorageSource [beam]

2024-03-19 Thread via GitHub


Abacn merged PR #30662:
URL: https://github.com/apache/beam/pull/30662


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Use autovalue's @Memoized in ExponentialBuckets [beam]

2024-03-19 Thread via GitHub


JayajP opened a new pull request, #30676:
URL: https://github.com/apache/beam/pull/30676

   Based on [autovalue's 
documentation](https://github.com/google/auto/blob/main/value/userguide/howto.md#-memoize-cache-derived-properties).
 
   
   For `ExponentialBuckets`, the `Base`, `InvLog2GrowthFactor`, and `RangeTo` 
are all derived from the input arguments `NumBuckets` and `Scale`. Currently 
these three values are computed for every instance of ExponentialHistogram, and 
they are used in the `hashCode` and `equalsTo` method for this class.
   
   With this change, the derived values will only be computed if an instance 
needs them, but derived values will only be computed once. Additionally, the 
derived values will not be used in the `equalsTo` or `hashCode` method making 
these methods more efficient. 
   
   This PR also memoizes the `hashCode` of `ExponentialBuckets`, which is 
helpful because we compute this every time we record a histogram value. 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] Seeking guidance on new runner implementation [beam]

2024-03-19 Thread via GitHub


moradology opened a new issue, #30675:
URL: https://github.com/apache/beam/issues/30675

   OK, so I'm very interested in making the batch subset of apache beam play 
nicely with EMR-Serverless. Unfortunately, this is difficult to pull off with 
the portable runner - perhaps impossible even - as there is an assumption so 
far as I can tell that the spark master UI be available to take work from the 
beam's job runner. To that end, I've begun adapting roughly the strategy found 
in the dask runner in the python SDK to build up pyspark RDDs that are 
submitted directly via whatever `SparkSession` pyspark finds at runtime. So 
far, so good. I even have a (partial) implementation of support for side inputs!
   
   Unfortunately, here, I am running into some difficulties and would love to 
get some feedback on whatever it is that I might be missing. As runner authors 
will surely be aware, it is necessary to distinguish between `AsIter` and 
`AsSingleton` `AsSideInput` instances. Fair enough, but by the time I am 
traversing `AppliedPTransform` instances to evaluate,  that information appears 
to be gone. Perhaps lost in some of the serialization/deserialization that 
occurs during `Transform` application!
   
   Here's what I'm seeing when I print out some context about a given 
`AppliedPTransform` [at this point in the 
runner](https://github.com/moradology/beam-pyspark-runner/blob/real_traversal_infrastructure/beam_pyspark_runner/pyspark_runner.py#L125)
 (so far, I've only run some visitors over the AST to collect some context that 
I use later in planning out execution):
   ```
'write test/Write/WriteImpl/WriteBundles': {'input_producer_labels': 
['write '
  
'test/Write/WriteImpl/WindowInto(WindowIntoFn)'],
'input_producers': 
[AppliedPTransform(write test/Write/WriteImpl/WindowInto(WindowIntoFn), 
WindowInto)],
'inputs': (,),
'outputs': 
dict_values([]),
'parent': 'write '
  
'test/Write/WriteImpl',
'side_inputs': 
(,),
'type': 'ParDo',
'xform_side_inputs': 
[]}}
   ```
   
   Note that I have an `_UnpickledSideInput`. This type does not include the 
`AsIter` and `AsSingleton` context that appears to be absolutely necessary to 
decide how results of a side-input should be passed to a consumer (whether the 
whole list or else just its head).
   
   What am I missing here? If I drop a debugger in beam's source for 
`core.ParDo`, I can see this information. It just appears to be lost later on. 
Example: ``


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Performance Regression or Improvement: cogbk_python_batch_load_test_2GB_of_100B_records_with_a_single_key:runtime [beam]

2024-03-19 Thread via GitHub


liferoad closed issue #30652: 
  Performance Regression or Improvement: 
cogbk_python_batch_load_test_2GB_of_100B_records_with_a_single_key:runtime

URL: https://github.com/apache/beam/issues/30652


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Performance Regression or Improvement: gbk_python_batch_load_test_2gb_of_100B_records:runtime [beam]

2024-03-19 Thread via GitHub


liferoad closed issue #30651: 
  Performance Regression or Improvement: 
gbk_python_batch_load_test_2gb_of_100B_records:runtime

URL: https://github.com/apache/beam/issues/30651


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Performance Regression or Improvement: gbk_python_batch_load_test_2gb_of_10B_records:runtime [beam]

2024-03-19 Thread via GitHub


liferoad closed issue #30650: 
  Performance Regression or Improvement: 
gbk_python_batch_load_test_2gb_of_10B_records:runtime

URL: https://github.com/apache/beam/issues/30650


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Performance Regression or Improvement: combine_python_batch_2gb_10_byte_records:runtime [beam]

2024-03-19 Thread via GitHub


liferoad closed issue #30649: 
  Performance Regression or Improvement: 
combine_python_batch_2gb_10_byte_records:runtime

URL: https://github.com/apache/beam/issues/30649


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Performance Regression or Improvement: gbk_python_batch_load_test_fanout_8_times_with_2GB_10byte_records_total:runtime [beam]

2024-03-19 Thread via GitHub


liferoad closed issue #30640: 
  Performance Regression or Improvement: 
gbk_python_batch_load_test_fanout_8_times_with_2GB_10byte_records_total:runtime

URL: https://github.com/apache/beam/issues/30640


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Task]: Check if feature store exists in VertexAIFeatureStoreEnrichmentHandler [beam]

2024-03-19 Thread via GitHub


riteshghorse closed issue #30541: [Task]: Check if feature store exists in 
VertexAIFeatureStoreEnrichmentHandler
URL: https://github.com/apache/beam/issues/30541


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Task]: Check if feature store exists in VertexAIFeatureStoreEnrichmentHandler [beam]

2024-03-19 Thread via GitHub


riteshghorse commented on issue #30541:
URL: https://github.com/apache/beam/issues/30541#issuecomment-2007266896

   Fixed by #30668 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Update Python Dependencies from BRANCH weekly_update_python_dependencies_1710635460 [beam]

2024-03-19 Thread via GitHub


Abacn merged PR #30656:
URL: https://github.com/apache/beam/pull/30656


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump github.com/google/uuid from 1.5.0 to 1.6.0 in /sdks [beam]

2024-03-19 Thread via GitHub


lostluck commented on PR #30254:
URL: https://github.com/apache/beam/pull/30254#issuecomment-2007684387

   @dependabot rebase


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Python] Check feature store existence at construction time [beam]

2024-03-19 Thread via GitHub


damccorm commented on PR #30668:
URL: https://github.com/apache/beam/pull/30668#issuecomment-2007222555

   Oh, looks like postcommits are failing though, could you take a look please?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Python] Check feature store existence at construction time [beam]

2024-03-19 Thread via GitHub


damccorm commented on PR #30668:
URL: https://github.com/apache/beam/pull/30668#issuecomment-2007225474

   3.11 failure looks like an unrelated pubsub test


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Python] Check feature store existence at construction time [beam]

2024-03-19 Thread via GitHub


riteshghorse merged PR #30668:
URL: https://github.com/apache/beam/pull/30668


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Python] Check feature store existence at construction time [beam]

2024-03-19 Thread via GitHub


riteshghorse commented on PR #30668:
URL: https://github.com/apache/beam/pull/30668#issuecomment-2007260795

   Yeah look like it, It is passing on master - 
https://github.com/apache/beam/actions/runs/8338077957/job/22817805661.
   
   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Tune maximum thread count for streaming dataflow worker executor dynamically. [beam]

2024-03-19 Thread via GitHub


scwhittle commented on code in PR #30439:
URL: https://github.com/apache/beam/pull/30439#discussion_r1530536071


##
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BoundedQueueExecutor.java:
##
@@ -59,8 +72,8 @@ public BoundedQueueExecutor(
   @Override
   protected void beforeExecute(Thread t, Runnable r) {
 super.beforeExecute(t, r);
-synchronized (this) {
-  if (activeCount.getAndIncrement() >= maximumPoolSize - 1) {
+synchronized (BoundedQueueExecutor.this) {
+  if (activeCount++ >= maximumThreadCount - 1 && 
startTimeMaxActiveThreadsUsed == 0) {

Review Comment:
   nit: how about
   ++activeCount >= maximumThreadCount
   
   seems easier to read



##
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BoundedQueueExecutor.java:
##
@@ -69,8 +82,8 @@ protected void beforeExecute(Thread t, Runnable r) {
   @Override
   protected void afterExecute(Runnable r, Throwable t) {
 super.afterExecute(r, t);
-synchronized (this) {
-  if (activeCount.getAndDecrement() == maximumPoolSize) {
+synchronized (BoundedQueueExecutor.this) {
+  if (activeCount-- <= maximumThreadCount && 
startTimeMaxActiveThreadsUsed > 0) {

Review Comment:
   ditto,
   --activeCount < maximumThreadCount 
   seems simpler



##
runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/BoundedQueueExecutorTest.java:
##
@@ -0,0 +1,232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.dataflow.worker.util;
+
+import static org.hamcrest.Matchers.greaterThan;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertThat;
+import static org.junit.Assert.assertTrue;
+
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.util.concurrent.ThreadFactoryBuilder;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.Timeout;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Unit tests for {@link 
org.apache.beam.runners.dataflow.worker.util.BoundedQueueExecutor}. */
+@RunWith(JUnit4.class)
+// TODO(https://github.com/apache/beam/issues/21230): Remove when new version 
of errorprone is
+// released (2.11.0)
+@SuppressWarnings("unused")
+public class BoundedQueueExecutorTest {
+  @Rule public transient Timeout globalTimeout = Timeout.seconds(300);
+  private static final long MAXIMUM_BYTES_OUTSTANDING = 1000;
+  private static final int DEFAULT_MAX_THREADS = 2;
+  private static final int DEFAULT_THREAD_EXPIRATION_SEC = 60;
+
+  private BoundedQueueExecutor executor;
+
+  private Runnable createSleepProcessWorkFn(CountDownLatch start, 
CountDownLatch stop) {
+Runnable runnable =
+() -> {
+  start.countDown();
+  try {
+stop.await();
+  } catch (Exception e) {
+throw new RuntimeException(e);
+  }
+};
+return runnable;
+  }
+
+  @Before
+  public void setUp() {
+this.executor =
+new BoundedQueueExecutor(
+DEFAULT_MAX_THREADS,
+DEFAULT_THREAD_EXPIRATION_SEC,
+TimeUnit.SECONDS,
+DEFAULT_MAX_THREADS + 100,
+MAXIMUM_BYTES_OUTSTANDING,
+new ThreadFactoryBuilder()
+.setNameFormat("DataflowWorkUnits-%d")
+.setDaemon(true)
+.build());
+  }
+
+  @Test
+  public void testScheduleWork() throws Exception {
+CountDownLatch processStart1 = new CountDownLatch(1);
+CountDownLatch processStop1 = new CountDownLatch(1);
+CountDownLatch processStart2 = new CountDownLatch(1);
+CountDownLatch processStop2 = new CountDownLatch(1);
+CountDownLatch processStart3 = new CountDownLatch(1);
+CountDownLatch processStop3 = new CountDownLatch(1);
+Runnable m1 = 

Re: [PR] Use 50 workers for 2GB 10 bytes combine test [beam]

2024-03-19 Thread via GitHub


liferoad commented on PR #30655:
URL: https://github.com/apache/beam/pull/30655#issuecomment-2007621658

   > Can we change the cron string to sth like '50 5 * * *' so the risk of 
interfering with other workflow is minimum
   > 
   > 
https://github.com/apache/beam/blob/50f33cd786dc63463688315a1c73b1cf4ef18807/.github/workflows/beam_LoadTests_Python_Combine_Dataflow_Streaming.yml#L20
   
   Good idea. Done. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Python] Upgrade mypy version to 1.7.1 [beam]

2024-03-19 Thread via GitHub


riteshghorse closed pull request #29707: [Python] Upgrade mypy version to 1.7.1
URL: https://github.com/apache/beam/pull/29707


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Revert "Disable remote gradle cache until it is cleaned (#30584)" [beam]

2024-03-19 Thread via GitHub


github-actions[bot] commented on PR #30674:
URL: https://github.com/apache/beam/pull/30674#issuecomment-2007765764

   Checks are failing. Will not request review until checks are succeeding. If 
you'd like to override that behavior, comment `assign set of reviewers`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Minimize scope of expensive lock [beam]

2024-03-19 Thread via GitHub


damccorm opened a new pull request, #30679:
URL: https://github.com/apache/beam/pull/30679

   We've seen this section hang, eating up the lock when shutdown takes a long 
time. This should minimize the size of the critical section to a single copy of 
`self.last_access_times.items()` so that it still avoids #27501 (which the lock 
was introduced for) without hanging too long.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Deduplicate common environments. [beam]

2024-03-19 Thread via GitHub


github-actions[bot] commented on PR #30681:
URL: https://github.com/apache/beam/pull/30681#issuecomment-2008172678

   Stopping reviewer notifications for this pull request: review requested by 
someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Deduplicate common environments. [beam]

2024-03-19 Thread via GitHub


robertwb commented on PR #30681:
URL: https://github.com/apache/beam/pull/30681#issuecomment-2008171114

   R: @chamikaramj 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Tune maximum thread count for streaming dataflow worker executor dynamically. [beam]

2024-03-19 Thread via GitHub


MelodyShen commented on code in PR #30439:
URL: https://github.com/apache/beam/pull/30439#discussion_r1531166753


##
runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/BoundedQueueExecutorTest.java:
##
@@ -0,0 +1,232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.dataflow.worker.util;
+
+import static org.hamcrest.Matchers.greaterThan;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertThat;
+import static org.junit.Assert.assertTrue;
+
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.util.concurrent.ThreadFactoryBuilder;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.Timeout;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Unit tests for {@link 
org.apache.beam.runners.dataflow.worker.util.BoundedQueueExecutor}. */
+@RunWith(JUnit4.class)
+// TODO(https://github.com/apache/beam/issues/21230): Remove when new version 
of errorprone is
+// released (2.11.0)
+@SuppressWarnings("unused")
+public class BoundedQueueExecutorTest {
+  @Rule public transient Timeout globalTimeout = Timeout.seconds(300);
+  private static final long MAXIMUM_BYTES_OUTSTANDING = 1000;
+  private static final int DEFAULT_MAX_THREADS = 2;
+  private static final int DEFAULT_THREAD_EXPIRATION_SEC = 60;
+
+  private BoundedQueueExecutor executor;
+
+  private Runnable createSleepProcessWorkFn(CountDownLatch start, 
CountDownLatch stop) {
+Runnable runnable =
+() -> {
+  start.countDown();
+  try {
+stop.await();
+  } catch (Exception e) {
+throw new RuntimeException(e);
+  }
+};
+return runnable;
+  }
+
+  @Before
+  public void setUp() {
+this.executor =
+new BoundedQueueExecutor(
+DEFAULT_MAX_THREADS,
+DEFAULT_THREAD_EXPIRATION_SEC,
+TimeUnit.SECONDS,
+DEFAULT_MAX_THREADS + 100,
+MAXIMUM_BYTES_OUTSTANDING,
+new ThreadFactoryBuilder()
+.setNameFormat("DataflowWorkUnits-%d")
+.setDaemon(true)
+.build());
+  }
+
+  @Test
+  public void testScheduleWork() throws Exception {
+CountDownLatch processStart1 = new CountDownLatch(1);
+CountDownLatch processStop1 = new CountDownLatch(1);
+CountDownLatch processStart2 = new CountDownLatch(1);
+CountDownLatch processStop2 = new CountDownLatch(1);
+CountDownLatch processStart3 = new CountDownLatch(1);
+CountDownLatch processStop3 = new CountDownLatch(1);
+Runnable m1 = createSleepProcessWorkFn(processStart1, processStop1);
+Runnable m2 = createSleepProcessWorkFn(processStart2, processStop2);
+Runnable m3 = createSleepProcessWorkFn(processStart3, processStop3);
+
+executor.execute(m1, 1);
+assertTrue(processStart1.await(1000, TimeUnit.MILLISECONDS));
+executor.execute(m2, 1);
+assertTrue(processStart2.await(1000, TimeUnit.MILLISECONDS));
+// m1 and m2 have started and all threads are occupied so m3 will be 
queued and not executed.
+executor.execute(m3, 1);
+assertFalse(processStart3.await(1000, TimeUnit.MILLISECONDS));
+
+// Stop m1 so there is an available thread for m3 to run.
+processStop1.countDown();
+assertTrue(processStart3.await(1000, TimeUnit.MILLISECONDS));
+// m3 started.
+processStop2.countDown();
+processStop3.countDown();
+executor.shutdown();
+  }
+
+  @Test
+  public void testOverrideMaximumThreadCount() throws Exception {
+CountDownLatch processStart1 = new CountDownLatch(1);
+CountDownLatch processStart2 = new CountDownLatch(1);
+CountDownLatch processStart3 = new CountDownLatch(1);
+CountDownLatch stop = new CountDownLatch(1);
+Runnable m1 = createSleepProcessWorkFn(processStart1, stop);
+Runnable m2 = createSleepProcessWorkFn(processStart2, stop);
+Runnable m3 = createSleepProcessWorkFn(processStart3, stop);
+
+// Initial 

Re: [PR] Add Redistribute transform to model, Java SDK, and most active runners [beam]

2024-03-19 Thread via GitHub


kennknowles commented on PR #30545:
URL: https://github.com/apache/beam/pull/30545#issuecomment-2007801662

   The problem in the portable runner was that it was not registered as a 
"native" transform. I don't know the details, but this influences the fuser.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] The PostCommit Python ValidatesRunner Spark job is flaky [beam]

2024-03-19 Thread via GitHub


Abacn commented on issue #30645:
URL: https://github.com/apache/beam/issues/30645#issuecomment-2007928104

   This is due to 
https://github.com/apache/beam/pull/30587#issuecomment-2004812901 CC: @robertwb 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] replace clock.milliseconds with stopwatch [beam]

2024-03-19 Thread via GitHub


github-actions[bot] commented on PR #30678:
URL: https://github.com/apache/beam/pull/30678#issuecomment-2007928098

   Checks are failing. Will not request review until checks are succeeding. If 
you'd like to override that behavior, comment `assign set of reviewers`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] The PostCommit XVR Samza job is flaky [beam]

2024-03-19 Thread via GitHub


Abacn closed issue #30601: The PostCommit XVR Samza job is flaky
URL: https://github.com/apache/beam/issues/30601


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Tune maximum thread count for streaming dataflow worker executor dynamically. [beam]

2024-03-19 Thread via GitHub


MelodyShen commented on code in PR #30439:
URL: https://github.com/apache/beam/pull/30439#discussion_r1531096133


##
runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/BoundedQueueExecutorTest.java:
##
@@ -0,0 +1,232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.dataflow.worker.util;
+
+import static org.hamcrest.Matchers.greaterThan;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertThat;
+import static org.junit.Assert.assertTrue;
+
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.util.concurrent.ThreadFactoryBuilder;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.Timeout;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Unit tests for {@link 
org.apache.beam.runners.dataflow.worker.util.BoundedQueueExecutor}. */
+@RunWith(JUnit4.class)
+// TODO(https://github.com/apache/beam/issues/21230): Remove when new version 
of errorprone is
+// released (2.11.0)
+@SuppressWarnings("unused")
+public class BoundedQueueExecutorTest {
+  @Rule public transient Timeout globalTimeout = Timeout.seconds(300);
+  private static final long MAXIMUM_BYTES_OUTSTANDING = 1000;
+  private static final int DEFAULT_MAX_THREADS = 2;
+  private static final int DEFAULT_THREAD_EXPIRATION_SEC = 60;
+
+  private BoundedQueueExecutor executor;
+
+  private Runnable createSleepProcessWorkFn(CountDownLatch start, 
CountDownLatch stop) {
+Runnable runnable =
+() -> {
+  start.countDown();
+  try {
+stop.await();
+  } catch (Exception e) {
+throw new RuntimeException(e);
+  }
+};
+return runnable;
+  }
+
+  @Before
+  public void setUp() {
+this.executor =
+new BoundedQueueExecutor(
+DEFAULT_MAX_THREADS,
+DEFAULT_THREAD_EXPIRATION_SEC,
+TimeUnit.SECONDS,
+DEFAULT_MAX_THREADS + 100,
+MAXIMUM_BYTES_OUTSTANDING,
+new ThreadFactoryBuilder()
+.setNameFormat("DataflowWorkUnits-%d")
+.setDaemon(true)
+.build());
+  }
+
+  @Test
+  public void testScheduleWork() throws Exception {
+CountDownLatch processStart1 = new CountDownLatch(1);
+CountDownLatch processStop1 = new CountDownLatch(1);
+CountDownLatch processStart2 = new CountDownLatch(1);
+CountDownLatch processStop2 = new CountDownLatch(1);
+CountDownLatch processStart3 = new CountDownLatch(1);
+CountDownLatch processStop3 = new CountDownLatch(1);
+Runnable m1 = createSleepProcessWorkFn(processStart1, processStop1);
+Runnable m2 = createSleepProcessWorkFn(processStart2, processStop2);
+Runnable m3 = createSleepProcessWorkFn(processStart3, processStop3);
+
+executor.execute(m1, 1);
+assertTrue(processStart1.await(1000, TimeUnit.MILLISECONDS));
+executor.execute(m2, 1);
+assertTrue(processStart2.await(1000, TimeUnit.MILLISECONDS));
+// m1 and m2 have started and all threads are occupied so m3 will be 
queued and not executed.
+executor.execute(m3, 1);
+assertFalse(processStart3.await(1000, TimeUnit.MILLISECONDS));
+
+// Stop m1 so there is an available thread for m3 to run.
+processStop1.countDown();
+assertTrue(processStart3.await(1000, TimeUnit.MILLISECONDS));
+// m3 started.
+processStop2.countDown();
+processStop3.countDown();
+executor.shutdown();
+  }
+
+  @Test
+  public void testOverrideMaximumThreadCount() throws Exception {
+CountDownLatch processStart1 = new CountDownLatch(1);
+CountDownLatch processStart2 = new CountDownLatch(1);
+CountDownLatch processStart3 = new CountDownLatch(1);
+CountDownLatch stop = new CountDownLatch(1);
+Runnable m1 = createSleepProcessWorkFn(processStart1, stop);
+Runnable m2 = createSleepProcessWorkFn(processStart2, stop);
+Runnable m3 = createSleepProcessWorkFn(processStart3, stop);
+
+// Initial 

[PR] Deduplicate common environments. [beam]

2024-03-19 Thread via GitHub


robertwb opened a new pull request, #30681:
URL: https://github.com/apache/beam/pull/30681

   This can be especially useful for those pipelines with many cross-language 
transforms.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Use autovalue's @Memoized in ExponentialBuckets [beam]

2024-03-19 Thread via GitHub


JayajP commented on PR #30676:
URL: https://github.com/apache/beam/pull/30676#issuecomment-2007940060

   R: @m-trieu


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Use autovalue's @Memoized in ExponentialBuckets [beam]

2024-03-19 Thread via GitHub


github-actions[bot] commented on PR #30676:
URL: https://github.com/apache/beam/pull/30676#issuecomment-2007942325

   Stopping reviewer notifications for this pull request: review requested by 
someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] add ExternalTransformProvider example [beam]

2024-03-19 Thread via GitHub


liferoad commented on code in PR #30666:
URL: https://github.com/apache/beam/pull/30666#discussion_r1530967292


##
examples/multi-language/README.md:
##
@@ -28,6 +28,8 @@ This project provides examples of Apache Beam
 * **python/javacount** - A Python pipeline that counts words using the Java 
`Count.perElement()` transform.
 * **python/javadatagenerator** - A Python pipeline that produces a set of 
strings generated from Java.
   This example demonstrates the 
`JavaExternalTransform` API.
+* **python/wordcount_external** - A Python pipeline that runs the Word Count 
workflow using three external Java
+transforms. This example demonstrates the simpler 
`ExternalTransformProvider` API.

Review Comment:
   Do we have any step-by-step guide somewhere about how to create a new 
JavaExternalTransform? If so, can we link it here? When I first looked at the 
java codes, I am a bit lost about what parts I need to create in order to use 
`ExternalTransformProvider`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Disable unsupported custom window type test on samza and spark. [beam]

2024-03-19 Thread via GitHub


robertwb opened a new pull request, #30680:
URL: https://github.com/apache/beam/pull/30680

   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Deduplicate common environments. [beam]

2024-03-19 Thread via GitHub


robertwb commented on code in PR #30681:
URL: https://github.com/apache/beam/pull/30681#discussion_r1531179623


##
sdks/python/apache_beam/runners/common.py:
##
@@ -1941,3 +1945,64 @@ def validate_transform(transform_id):
 
   for t in pipeline_proto.root_transform_ids:
 validate_transform(t)
+
+
+def merge_common_environments(pipeline_proto):

Review Comment:
   Ah, yes, it looks like it does. (That code didn't seem to be working, as I 
was definitely seeing environments that needed deduplication, but perhaps I 
should merge the two.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Use autovalue's @Memoized in ExponentialBuckets [beam]

2024-03-19 Thread via GitHub


github-actions[bot] commented on PR #30676:
URL: https://github.com/apache/beam/pull/30676#issuecomment-2007873357

   Checks are failing. Will not request review until checks are succeeding. If 
you'd like to override that behavior, comment `assign set of reviewers`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Revert "Disable remote gradle cache until it is cleaned (#30584)" [beam]

2024-03-19 Thread via GitHub


Abacn commented on PR #30674:
URL: https://github.com/apache/beam/pull/30674#issuecomment-2007874461

   R: @damccorm 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add Redistribute transform to model, Java SDK, and most active runners [beam]

2024-03-19 Thread via GitHub


kennknowles commented on PR #30545:
URL: https://github.com/apache/beam/pull/30545#issuecomment-2008089051

   OK this is now green. The test suites that are still running were green 
before and have not been touched. Please review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Revert "Disable remote gradle cache until it is cleaned (#30584)" [beam]

2024-03-19 Thread via GitHub


github-actions[bot] commented on PR #30674:
URL: https://github.com/apache/beam/pull/30674#issuecomment-2007876434

   Stopping reviewer notifications for this pull request: review requested by 
someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Minimize scope of expensive lock [beam]

2024-03-19 Thread via GitHub


damccorm commented on PR #30679:
URL: https://github.com/apache/beam/pull/30679#issuecomment-2008095596

   R: @tvalentyn 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add Redistribute transform to model, Java SDK, and most active runners [beam]

2024-03-19 Thread via GitHub


kennknowles commented on code in PR #30545:
URL: https://github.com/apache/beam/pull/30545#discussion_r1531106246


##
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java:
##
@@ -917,6 +919,41 @@ private  void groupByKeyAndSortValuesHelper(
   }
 });
 
+registerTransformTranslator(
+RedistributeByKey.class,

Review Comment:
   @robertwb you might be the person to review DataflowRunner translation? For 
Reshuffle we don't have a translator but a more complex rewrites to a 
specialized GroupByKey. I opted to _not_ do that this time but translate more 
directly. I added ValidatesRunner tests for Redistribute that check parity with 
Reshuffle, at least in terms of that test suite.
   
   CC @scwhittle 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Deduplicate common environments. [beam]

2024-03-19 Thread via GitHub


chamikaramj commented on code in PR #30681:
URL: https://github.com/apache/beam/pull/30681#discussion_r1531155798


##
sdks/python/apache_beam/runners/common.py:
##
@@ -1941,3 +1945,64 @@ def validate_transform(transform_id):
 
   for t in pipeline_proto.root_transform_ids:
 validate_transform(t)
+
+
+def merge_common_environments(pipeline_proto):

Review Comment:
   Does this make the merge logic at the following location obsolete ?
   
   
https://github.com/apache/beam/blob/fb7ba65e2236f3dd871b6e492afc07249a4a5c49/sdks/python/apache_beam/pipeline.py#L964



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug]: Dataflow worker container resolved to legacy runner label if not explicitly disable/enable runner v2 in 2.54.0+.dev [beam]

2024-03-19 Thread via GitHub


Abacn commented on issue #30634:
URL: https://github.com/apache/beam/issues/30634#issuecomment-2007838586

   This also affects 
https://github.com/apache/beam/blob/a3e5ac86eeade9fbef391a2c19d67825335938e6/it/google-cloud-platform/src/main/java/org/apache/beam/it/gcp/dataflow/AbstractPipelineLauncher.java#L261-L267
 
   
   the throughput metrics are not correctly reported in IOLoadTests for read 
pipelines 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] The PostCommit Python ValidatesRunner Samza job is flaky [beam]

2024-03-19 Thread via GitHub


Abacn commented on issue #30657:
URL: https://github.com/apache/beam/issues/30657#issuecomment-2007926950

   This is due to 
https://github.com/apache/beam/pull/30587#issuecomment-2004812901 CC: @robertwb 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Minimize scope of expensive lock [beam]

2024-03-19 Thread via GitHub


github-actions[bot] commented on PR #30679:
URL: https://github.com/apache/beam/pull/30679#issuecomment-2008098106

   Stopping reviewer notifications for this pull request: review requested by 
someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Tune maximum thread count for streaming dataflow worker executor dynamically. [beam]

2024-03-19 Thread via GitHub


MelodyShen commented on code in PR #30439:
URL: https://github.com/apache/beam/pull/30439#discussion_r1531099531


##
runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/BoundedQueueExecutorTest.java:
##
@@ -0,0 +1,232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.dataflow.worker.util;
+
+import static org.hamcrest.Matchers.greaterThan;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertThat;
+import static org.junit.Assert.assertTrue;
+
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.util.concurrent.ThreadFactoryBuilder;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.Timeout;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Unit tests for {@link 
org.apache.beam.runners.dataflow.worker.util.BoundedQueueExecutor}. */
+@RunWith(JUnit4.class)
+// TODO(https://github.com/apache/beam/issues/21230): Remove when new version 
of errorprone is
+// released (2.11.0)
+@SuppressWarnings("unused")
+public class BoundedQueueExecutorTest {
+  @Rule public transient Timeout globalTimeout = Timeout.seconds(300);
+  private static final long MAXIMUM_BYTES_OUTSTANDING = 1000;
+  private static final int DEFAULT_MAX_THREADS = 2;
+  private static final int DEFAULT_THREAD_EXPIRATION_SEC = 60;
+
+  private BoundedQueueExecutor executor;
+
+  private Runnable createSleepProcessWorkFn(CountDownLatch start, 
CountDownLatch stop) {
+Runnable runnable =
+() -> {
+  start.countDown();
+  try {
+stop.await();
+  } catch (Exception e) {
+throw new RuntimeException(e);
+  }
+};
+return runnable;
+  }
+
+  @Before
+  public void setUp() {
+this.executor =
+new BoundedQueueExecutor(
+DEFAULT_MAX_THREADS,
+DEFAULT_THREAD_EXPIRATION_SEC,
+TimeUnit.SECONDS,
+DEFAULT_MAX_THREADS + 100,
+MAXIMUM_BYTES_OUTSTANDING,
+new ThreadFactoryBuilder()
+.setNameFormat("DataflowWorkUnits-%d")
+.setDaemon(true)
+.build());
+  }
+
+  @Test
+  public void testScheduleWork() throws Exception {
+CountDownLatch processStart1 = new CountDownLatch(1);
+CountDownLatch processStop1 = new CountDownLatch(1);
+CountDownLatch processStart2 = new CountDownLatch(1);
+CountDownLatch processStop2 = new CountDownLatch(1);
+CountDownLatch processStart3 = new CountDownLatch(1);
+CountDownLatch processStop3 = new CountDownLatch(1);
+Runnable m1 = createSleepProcessWorkFn(processStart1, processStop1);
+Runnable m2 = createSleepProcessWorkFn(processStart2, processStop2);
+Runnable m3 = createSleepProcessWorkFn(processStart3, processStop3);
+
+executor.execute(m1, 1);
+assertTrue(processStart1.await(1000, TimeUnit.MILLISECONDS));

Review Comment:
   Cool!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Tune maximum thread count for streaming dataflow worker executor dynamically. [beam]

2024-03-19 Thread via GitHub


MelodyShen commented on code in PR #30439:
URL: https://github.com/apache/beam/pull/30439#discussion_r1531100277


##
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BoundedQueueExecutor.java:
##
@@ -59,8 +72,8 @@ public BoundedQueueExecutor(
   @Override
   protected void beforeExecute(Thread t, Runnable r) {
 super.beforeExecute(t, r);
-synchronized (this) {
-  if (activeCount.getAndIncrement() >= maximumPoolSize - 1) {
+synchronized (BoundedQueueExecutor.this) {
+  if (activeCount++ >= maximumThreadCount - 1 && 
startTimeMaxActiveThreadsUsed == 0) {

Review Comment:
   Sure!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] replace clock.milliseconds with stopwatch [beam]

2024-03-19 Thread via GitHub


clmccart commented on PR #30678:
URL: https://github.com/apache/beam/pull/30678#issuecomment-2007781631

   cc: @tudorm 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add Redistribute transform to model, Java SDK, and most active runners [beam]

2024-03-19 Thread via GitHub


kennknowles commented on code in PR #30545:
URL: https://github.com/apache/beam/pull/30545#discussion_r1531103041


##
model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/beam_runner_api.proto:
##
@@ -813,6 +813,10 @@ message GroupIntoBatchesPayload {
   int64 max_buffering_duration_millis = 2;
 }
 
+message RedistributePayload {

Review Comment:
   @robertwb pinging you on this very thrilling addition to the protos



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Update Python Dependencies from BRANCH weekly_update_python_dependencies_1708820977 [beam]

2024-03-19 Thread via GitHub


damondouglas closed pull request #30412: Update Python Dependencies from BRANCH 
weekly_update_python_dependencies_1708820977
URL: https://github.com/apache/beam/pull/30412


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add BigTableIO Stress test [beam]

2024-03-19 Thread via GitHub


akashorabek commented on PR #30630:
URL: https://github.com/apache/beam/pull/30630#issuecomment-2008682755

   Run Kotlin_Examples PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Seeking guidance on new runner implementation [beam]

2024-03-19 Thread via GitHub


cisaacstern commented on issue #30675:
URL: https://github.com/apache/beam/issues/30675#issuecomment-2008380280

    @moradology! Over in my WIP for DaskRunner SideInputs, I've got this:
   
   
https://github.com/apache/beam/pull/27618/files#diff-bfb5ae715e9067778f492058e8a02ff877d6e7584624908ddbdd316853e6befbR173-R182
   
   (Which is heavily modeled on the RayRunner.)
   
   I actually don't remember off the top of my head what part of that surfaces 
the `As*` typing, but the tests pass over there so I think it's being captured 
somehow! With a little more time I could set up my dev environment and 
introspect a bit.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Disable unsupported custom window type test on samza and spark. [beam]

2024-03-19 Thread via GitHub


Abacn merged PR #30680:
URL: https://github.com/apache/beam/pull/30680


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] The PostCommit Python ValidatesRunner Samza job is flaky [beam]

2024-03-19 Thread via GitHub


Abacn closed issue #30657: The PostCommit Python ValidatesRunner Samza job is 
flaky
URL: https://github.com/apache/beam/issues/30657


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add BigTableIO Stress test [beam]

2024-03-19 Thread via GitHub


akashorabek commented on code in PR #30630:
URL: https://github.com/apache/beam/pull/30630#discussion_r1531474427


##
it/google-cloud-platform/src/main/java/org/apache/beam/it/gcp/IOStressTestBase.java:
##
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.it.gcp;
+
+import com.google.cloud.Timestamp;
+import java.io.IOException;
+import java.io.Serializable;
+import java.text.ParseException;
+import java.time.Duration;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+import org.apache.beam.it.common.PipelineLauncher;
+import org.apache.beam.sdk.testutils.NamedTestResult;
+import org.apache.beam.sdk.testutils.metrics.IOITMetrics;
+import org.apache.beam.sdk.testutils.publishing.InfluxDBSettings;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.joda.time.Instant;
+
+/** Base class for IO Stress tests. */
+public class IOStressTestBase extends IOLoadTestBase {
+  /**
+   * The load will initiate at 1x, progressively increase to 2x and 4x, then 
decrease to 2x and
+   * eventually return to 1x.
+   */
+  protected static final int[] DEFAULT_LOAD_INCREASE_ARRAY = {1, 2, 2, 4, 2, 
1};
+
+  protected static final int DEFAULT_ROWS_PER_SECOND = 1000;
+
+  /**
+   * Generates and returns a list of LoadPeriod instances representing periods 
of load increase
+   * based on the specified load increase array and total duration in minutes.
+   *
+   * @param minutesTotal The total duration in minutes for which the load 
periods are generated.
+   * @return A list of LoadPeriod instances defining periods of load increase.
+   */
+  protected List getLoadPeriods(int minutesTotal, int[] 
loadIncreaseArray) {
+
+List loadPeriods = new ArrayList<>();
+long periodDurationMillis =
+Duration.ofMinutes(minutesTotal / loadIncreaseArray.length).toMillis();
+long startTimeMillis = 0;
+
+for (int loadIncreaseMultiplier : loadIncreaseArray) {
+  long endTimeMillis = startTimeMillis + periodDurationMillis;
+  loadPeriods.add(new LoadPeriod(loadIncreaseMultiplier, startTimeMillis, 
endTimeMillis));
+
+  startTimeMillis = endTimeMillis;
+}
+return loadPeriods;
+  }
+
+  /**
+   * Represents a period of time with associated load increase properties for 
stress testing
+   * scenarios.
+   */
+  protected static class LoadPeriod implements Serializable {
+private final int loadIncreaseMultiplier;
+private final long periodStartMillis;
+private final long periodEndMillis;
+
+public LoadPeriod(int loadIncreaseMultiplier, long periodStartMillis, long 
periodEndMin) {
+  this.loadIncreaseMultiplier = loadIncreaseMultiplier;
+  this.periodStartMillis = periodStartMillis;
+  this.periodEndMillis = periodEndMin;
+}
+
+public int getLoadIncreaseMultiplier() {
+  return loadIncreaseMultiplier;
+}
+
+public long getPeriodStartMillis() {
+  return periodStartMillis;
+}
+
+public long getPeriodEndMillis() {
+  return periodEndMillis;
+}
+  }
+
+  /**
+   * Custom Apache Beam DoFn designed for use in stress testing scenarios. It 
introduces a dynamic
+   * load increase over time, multiplying the input elements based on the 
elapsed time since the
+   * start of processing. This class aims to simulate various load levels 
during stress testing.
+   */
+  protected static class MultiplierDoFn extends DoFn {
+private final int startMultiplier;
+private final long startTimesMillis;
+private final List loadPeriods;
+
+public MultiplierDoFn(int startMultiplier, List loadPeriods) {
+  this.startMultiplier = startMultiplier;
+  this.startTimesMillis = Instant.now().getMillis();
+  this.loadPeriods = loadPeriods;
+}
+
+@DoFn.ProcessElement
+public void processElement(
+@Element T element, OutputReceiver outputReceiver, @DoFn.Timestamp 
Instant timestamp) {
+
+  int multiplier = this.startMultiplier;
+  long elapsedTimeMillis = timestamp.getMillis() - startTimesMillis;
+
+  for (LoadPeriod loadPeriod : loadPeriods) {
+if (elapsedTimeMillis >= loadPeriod.getPeriodStartMillis()

[PR] Bump github.com/docker/docker from 25.0.3+incompatible to 25.0.5+incompatible in /sdks [beam]

2024-03-19 Thread via GitHub


dependabot[bot] opened a new pull request, #30684:
URL: https://github.com/apache/beam/pull/30684

   Bumps [github.com/docker/docker](https://github.com/docker/docker) from 
25.0.3+incompatible to 25.0.5+incompatible.
   
   Release notes
   Sourced from https://github.com/docker/docker/releases;>github.com/docker/docker's 
releases.
   
   25.0.5
   For a full list of pull requests and changes in this release, refer to 
the relevant GitHub milestones:
   
   https://github.com/docker/cli/issues?q=is%3Aclosed+milestone%3A25.0.5;>docker/cli,
 25.0.5 milestone
   https://github.com/moby/moby/issues?q=is%3Aclosed+milestone%3A25.0.5;>moby/moby,
 25.0.5 milestone
   Deprecated and removed features, see https://github.com/docker/cli/blob/v25.0.5/docs/deprecated.md;>Deprecated 
Features.
   Changes to the Engine API, see https://github.com/moby/moby/blob/v25.0.5/docs/api/version-history.md;>API
 version history.
   
   Security
   This release contains a security fix for https://github.com/moby/moby/security/advisories/GHSA-mq39-4gv4-mvpx;>CVE-2024-29018,
 a potential data exfiltration from 'internal' networks via authoritative DNS 
servers.
   Bug fixes and enhancements
   
   https://github.com/moby/moby/security/advisories/GHSA-mq39-4gv4-mvpx;>CVE-2024-29018:
 Do not forward requests to external DNS servers for a container that is only 
connected to an 'internal' network. Previously, requests were forwarded if the 
host's DNS server was running on a loopback address, like systemd's 127.0.0.53. 
https://redirect.github.com/moby/moby/pull/47589;>moby/moby#47589
   plugin: fix mounting /etc/hosts when running in UserNS. https://redirect.github.com/moby/moby/pull/47588;>moby/moby#47588
   rootless: fix open /etc/docker/plugins: permission denied. 
https://redirect.github.com/moby/moby/pull/47587;>moby/moby#47587
   Fix multiple parallel docker build runs leaking disk space. 
https://redirect.github.com/moby/moby/pull/47527;>moby/moby#47527
   
   v25.0.4
   For a full list of pull requests and changes in this release, refer to 
the relevant GitHub milestones:
   
   https://github.com/docker/cli/issues?q=is%3Aclosed+milestone%3A25.0.4;>docker/cli,
 25.0.4 milestone
   https://github.com/moby/moby/issues?q=is%3Aclosed+milestone%3A25.0.4;>moby/moby,
 25.0.4 milestone
   Deprecated and removed features, see https://github.com/docker/cli/blob/v25.0.4/docs/deprecated.md;>Deprecated 
Features.
   Changes to the Engine API, see https://github.com/moby/moby/blob/v25.0.4/docs/api/version-history.md;>API
 version history.
   
   Bug fixes and enhancements
   
   Restore DNS names for containers in the default nat network 
on Windows. https://redirect.github.com/moby/moby/pull/47490;>moby/moby#47490
   Fix docker start failing when used with 
--checkpoint https://redirect.github.com/moby/moby/pull/47466;>moby/moby#47466
   Don't enforce new validation rules for existing swarm networks https://redirect.github.com/moby/moby/pull/47482;>moby/moby#47482
   Restore IP connectivity between the host and containers on an internal 
bridge network. https://redirect.github.com/moby/moby/pull/47481;>moby/moby#47481
   Fix a regression introduced in v25.0 that prevented the classic builder 
from ADDing a tar archive with xattrs created on a non-Linux OS https://redirect.github.com/moby/moby/pull/47483;>moby/moby#47483
   containerd image store: Fix image pull not emitting Pulling fs 
layer status https://redirect.github.com/moby/moby/pull/47484;>moby/moby#47484
   
   API
   
   To preserve backwards compatibility, make read-only mounts not recursive 
by default when using older clients (API version  v1.44). https://redirect.github.com/moby/moby/pull/47393;>moby/moby#47393
   GET /images/{id}/json omits the Created field 
(previously it was 0001-01-01T00:00:00Z) if the 
Created field is missing from the image config. https://redirect.github.com/moby/moby/pull/47451;>moby/moby#47451
   Populate a missing Created field in GET 
/images/{id}/json with 0001-01-01T00:00:00Z for API version 
= 1.43. https://redirect.github.com/moby/moby/pull/47387;>moby/moby#47387
   Fix a regression that caused API socket connection failures to report an 
API version negotiation failure instead. https://redirect.github.com/moby/moby/pull/47470;>moby/moby#47470
   Preserve supplied endpoint configuration in a container-create API 
request, when a container-wide MAC address is specified, but 
NetworkMode name-or-id is not the same as the name-or-id used in 
NetworkSettings.Networks. https://redirect.github.com/moby/moby/pull/47510;>moby/moby#47510
   
   Packaging updates
   
   Upgrade Go runtime to https://go.dev/doc/devel/release#go1.21.8;>1.21.8. https://redirect.github.com/moby/moby/pull/47503;>moby/moby#47503
   Upgrade RootlessKit to https://github.com/rootless-containers/rootlesskit/releases/tag/v2.0.2;>v2.0.2.
  https://redirect.github.com/moby/moby/pull/47508;>moby/moby#47508
   Upgrade Compose to 

[PR] Bump golang.org/x/oauth2 from 0.17.0 to 0.18.0 in /sdks [beam]

2024-03-19 Thread via GitHub


dependabot[bot] opened a new pull request, #30685:
URL: https://github.com/apache/beam/pull/30685

   Bumps [golang.org/x/oauth2](https://github.com/golang/oauth2) from 0.17.0 to 
0.18.0.
   
   Commits
   
   https://github.com/golang/oauth2/commit/85231f99d65eedc833c8fccfec7fd7d8303c0d3e;>85231f9
 go.mod: update golang.org/x dependencies
   https://github.com/golang/oauth2/commit/34a7afaa8571b555a177d9bf0360276cbb94f630;>34a7afa
 google/externalaccount: add Config.UniverseDomain
   https://github.com/golang/oauth2/commit/95bec9538152e03de0cfbaf64cd3af163b8cef30;>95bec95
 google/externalaccount: moves externalaccount package out of internal and 
exp...
   See full diff in https://github.com/golang/oauth2/compare/v0.17.0...v0.18.0;>compare 
view
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=golang.org/x/oauth2=go_modules=0.17.0=0.18.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show  ignore conditions` will show all of 
the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump golang.org/x/oauth2 from 0.17.0 to 0.18.0 in /sdks [beam]

2024-03-19 Thread via GitHub


github-actions[bot] commented on PR #30685:
URL: https://github.com/apache/beam/pull/30685#issuecomment-2008666471

   Assigning reviewers. If you would like to opt out of this review, comment 
`assign to next reviewer`:
   
   R: @riteshghorse for label go.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any 
comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review 
comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump github.com/docker/docker from 25.0.3+incompatible to 25.0.5+incompatible in /sdks [beam]

2024-03-19 Thread via GitHub


github-actions[bot] commented on PR #30684:
URL: https://github.com/apache/beam/pull/30684#issuecomment-2008666508

   Assigning reviewers. If you would like to opt out of this review, comment 
`assign to next reviewer`:
   
   R: @jrmccluskey for label go.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any 
comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review 
comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Implement ordered list state for FnApi. [beam]

2024-03-19 Thread via GitHub


acrites commented on code in PR #30317:
URL: https://github.com/apache/beam/pull/30317#discussion_r1531296538


##
model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto:
##


Review Comment:
   Or are we going to piggy-back on multimap for this? (If so we should delete 
the TODO.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] The PostCommit Python ValidatesRunner Samza job is flaky [beam]

2024-03-19 Thread via GitHub


github-actions[bot] commented on issue #30657:
URL: https://github.com/apache/beam/issues/30657#issuecomment-2008452865

   Reopening since the workflow is still flaky


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] The PreCommit Java job is flaky [beam]

2024-03-19 Thread via GitHub


github-actions[bot] opened a new issue, #30683:
URL: https://github.com/apache/beam/issues/30683

   The PreCommit Java is failing over 50% of the time 
   Please visit 
https://github.com/apache/beam/actions/workflows/beam_PreCommit_Java.yml?query=is%3Afailure+branch%3Amaster
 to see the logs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump github.com/docker/docker from 25.0.3+incompatible to 25.0.5+incompatible in /sdks [beam]

2024-03-19 Thread via GitHub


codecov[bot] commented on PR #30684:
URL: https://github.com/apache/beam/pull/30684#issuecomment-2008651036

   ## 
[Codecov](https://app.codecov.io/gh/apache/beam/pull/30684?dropdown=coverage=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 Report
   All modified and coverable lines are covered by tests :white_check_mark:
   > Project coverage is 38.53%. Comparing base 
[(`389e106`)](https://app.codecov.io/gh/apache/beam/commit/389e1067c9d1f9bcc99b338ffb46e4923f692cae?dropdown=coverage=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 to head 
[(`38a0341`)](https://app.codecov.io/gh/apache/beam/pull/30684?dropdown=coverage=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   
   
   Additional details and impacted files
   
   
   ```diff
   @@   Coverage Diff   @@
   ##   master   #30684   +/-   ##
   ===
 Coverage   38.53%   38.53%   
   ===
 Files 698  698   
 Lines  102360   102360   
   ===
 Hits3944439444   
 Misses  6128461284   
 Partials 1632 1632   
   ```
   
   | 
[Flag](https://app.codecov.io/gh/apache/beam/pull/30684/flags?src=pr=flags_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | Coverage Δ | |
   |---|---|---|
   | 
[go](https://app.codecov.io/gh/apache/beam/pull/30684/flags?src=pr=flag_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | `54.33% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   
   
   
   [:umbrella: View full report in Codecov by 
Sentry](https://app.codecov.io/gh/apache/beam/pull/30684?dropdown=coverage=pr=continue_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   
   :loudspeaker: Have feedback on the report? [Share it 
here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump golang.org/x/oauth2 from 0.17.0 to 0.18.0 in /sdks [beam]

2024-03-19 Thread via GitHub


codecov[bot] commented on PR #30685:
URL: https://github.com/apache/beam/pull/30685#issuecomment-2008651162

   ## 
[Codecov](https://app.codecov.io/gh/apache/beam/pull/30685?dropdown=coverage=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 Report
   All modified and coverable lines are covered by tests :white_check_mark:
   > Project coverage is 38.53%. Comparing base 
[(`389e106`)](https://app.codecov.io/gh/apache/beam/commit/389e1067c9d1f9bcc99b338ffb46e4923f692cae?dropdown=coverage=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 to head 
[(`d9ff2ea`)](https://app.codecov.io/gh/apache/beam/pull/30685?dropdown=coverage=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   
   
   Additional details and impacted files
   
   
   ```diff
   @@Coverage Diff @@
   ##   master   #30685  +/-   ##
   ==
   - Coverage   38.53%   38.53%   -0.01% 
   ==
 Files 698  698  
 Lines  102360   102360  
   ==
   - Hits3944439443   -1 
   - Misses  6128461285   +1 
 Partials 1632 1632  
   ```
   
   | 
[Flag](https://app.codecov.io/gh/apache/beam/pull/30685/flags?src=pr=flags_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | Coverage Δ | |
   |---|---|---|
   | 
[go](https://app.codecov.io/gh/apache/beam/pull/30685/flags?src=pr=flag_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | `54.33% <ø> (-0.01%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   
   
   
   [:umbrella: View full report in Codecov by 
Sentry](https://app.codecov.io/gh/apache/beam/pull/30685?dropdown=coverage=pr=continue_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   
   :loudspeaker: Have feedback on the report? [Share it 
here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add BigTableIO Stress test [beam]

2024-03-19 Thread via GitHub


akashorabek commented on PR #30630:
URL: https://github.com/apache/beam/pull/30630#issuecomment-2008651373

   Run Java_Examples_Dataflow PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Implement ordered list state for FnApi. [beam]

2024-03-19 Thread via GitHub


acrites commented on code in PR #30317:
URL: https://github.com/apache/beam/pull/30317#discussion_r1531294554


##
model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto:
##


Review Comment:
   Do we also need to add something here: 
https://github.com/apache/beam/blob/fb7ba65e2236f3dd871b6e492afc07249a4a5c49/model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/beam_runner_api.proto#L478



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [Python] Add redis client to python dependencies [beam]

2024-03-19 Thread via GitHub


riteshghorse opened a new pull request, #30677:
URL: https://github.com/apache/beam/pull/30677

   With the addition of redis cache support to the Enrichment transform, it 
will be good to package redis client dependency with apache-beam package. 
Otherwise user will have to install it separately every time.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] add ExternalTransformProvider example [beam]

2024-03-19 Thread via GitHub


ahmedabu98 commented on code in PR #30666:
URL: https://github.com/apache/beam/pull/30666#discussion_r1530961429


##
examples/multi-language/python/wordcount_external.py:
##
@@ -0,0 +1,102 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import logging
+
+import apache_beam as beam
+from apache_beam.io import ReadFromText
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.transforms.external_transform_provider import 
ExternalTransformProvider
+from apache_beam.typehints.row_type import RowTypeConstraint
+"""A Python multi-language pipeline that counts words.
+
+This pipeline reads an input text file then extracts and counts the words 
using Java SDK SchemaTransforms provided in
+`ExtractWordsProvider`, `JavaCountProvider`, and `WriteWordsProvider`. 
Wrappers for these transforms are dynamically
+provided in Python via the `ExternalTransformProvider` API.
+
+Example commands for executing this program:
+
+DirectRunner:
+$ python wordcount_external.py --runner DirectRunner --input  
--output  --expansion_service_port 
+
+DataflowRunner:
+$ python wordcount_external.py \
+  --runner DataflowRunner \
+  --temp_location $TEMP_LOCATION \
+  --project $GCP_PROJECT \
+  --region $GCP_REGION \
+  --job_name $JOB_NAME \
+  --num_workers $NUM_WORKERS \
+  --input "gs://dataflow-samples/shakespeare/kinglear.txt" \
+  --output "gs://$GCS_BUCKET/wordcount_external/output" \
+  --expansion_service_port 

Review Comment:
   There's a common section in the 
[README.md](https://github.com/apache/beam/blob/master/examples/multi-language/README.md#instructions-for-running-the-pipelines)
 file



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Disable unsupported custom window type test on samza and spark. [beam]

2024-03-19 Thread via GitHub


robertwb commented on PR #30680:
URL: https://github.com/apache/beam/pull/30680#issuecomment-2007996651

   R: @Abacn


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Disable unsupported custom window type test on samza and spark. [beam]

2024-03-19 Thread via GitHub


github-actions[bot] commented on PR #30680:
URL: https://github.com/apache/beam/pull/30680#issuecomment-2007999326

   Stopping reviewer notifications for this pull request: review requested by 
someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Revert "Disable remote gradle cache until it is cleaned (#30584)" [beam]

2024-03-19 Thread via GitHub


damccorm merged PR #30674:
URL: https://github.com/apache/beam/pull/30674


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



  1   2   >