Re: [PR] [YAML] Adds several tests exercising the cross-language capabilities. [beam]

2024-04-05 Thread via GitHub


github-actions[bot] commented on PR #30880:
URL: https://github.com/apache/beam/pull/30880#issuecomment-2040852121

   Assigning reviewers. If you would like to opt out of this review, comment 
`assign to next reviewer`:
   
   R: @jrmccluskey for label python.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any 
comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review 
comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [YAML] Adds several tests exercising the cross-language capabilities. [beam]

2024-04-05 Thread via GitHub


codecov-commenter commented on PR #30880:
URL: https://github.com/apache/beam/pull/30880#issuecomment-2040848762

   ## 
[Codecov](https://app.codecov.io/gh/apache/beam/pull/30880?dropdown=coverage=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 Report
   Attention: Patch coverage is `25.0%` with `3 lines` in your changes are 
missing coverage. Please review.
   > Project coverage is 70.96%. Comparing base 
[(`4452a6c`)](https://app.codecov.io/gh/apache/beam/commit/4452a6c8d9758c9cd3ea39d37aaf149927c633ce?dropdown=coverage=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 to head 
[(`7ed6b6d`)](https://app.codecov.io/gh/apache/beam/pull/30880?dropdown=coverage=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   > Report is 1 commits behind head on master.
   
   | 
[Files](https://app.codecov.io/gh/apache/beam/pull/30880?dropdown=coverage=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | Patch % | Lines |
   |---|---|---|
   | 
[sdks/python/apache\_beam/yaml/integration\_tests.py](https://app.codecov.io/gh/apache/beam/pull/30880?src=pr=tree=sdks%2Fpython%2Fapache_beam%2Fyaml%2Fintegration_tests.py_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0veWFtbC9pbnRlZ3JhdGlvbl90ZXN0cy5weQ==)
 | 0.00% | [3 Missing :warning: 
](https://app.codecov.io/gh/apache/beam/pull/30880?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 |
   
   Additional details and impacted files
   
   
   ```diff
   @@ Coverage Diff  @@
   ## master   #30880  +/-   ##
   
   - Coverage 71.05%   70.96%   -0.10% 
 Complexity 4470 4470  
   
 Files  1257 1257  
 Lines141373   140938 -435 
 Branches   4307 4307  
   
   - Hits 100454   100017 -437 
   - Misses3744037442   +2 
 Partials   3479 3479  
   ```
   
   | 
[Flag](https://app.codecov.io/gh/apache/beam/pull/30880/flags?src=pr=flags_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | Coverage Δ | |
   |---|---|---|
   | 
[python](https://app.codecov.io/gh/apache/beam/pull/30880/flags?src=pr=flag_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | `81.71% <25.00%> (-0.13%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   
   
   
   [:umbrella: View full report in Codecov by 
Sentry](https://app.codecov.io/gh/apache/beam/pull/30880?dropdown=coverage=pr=continue_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   
   :loudspeaker: Have feedback on the report? [Share it 
here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Java] ManagedIO [beam]

2024-04-05 Thread via GitHub


robertwb commented on code in PR #30808:
URL: https://github.com/apache/beam/pull/30808#discussion_r1554437196


##
sdks/java/managed/src/main/java/org/apache/beam/sdk/managed/Managed.java:
##
@@ -0,0 +1,253 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.managed;
+
+import com.google.auto.value.AutoValue;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.schemas.transforms.SchemaTransform;
+import org.apache.beam.sdk.schemas.transforms.SchemaTransformProvider;
+import org.apache.beam.sdk.schemas.utils.YamlUtils;
+import org.apache.beam.sdk.values.PCollectionRowTuple;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.annotations.VisibleForTesting;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableMap;
+
+/**
+ * Top-level {@link org.apache.beam.sdk.transforms.PTransform}s that build and 
instantiate turnkey
+ * transforms.
+ *
+ * Available transforms
+ *
+ * This API currently supports two operations: {@link Read} and {@link 
Write}. Each one
+ * enumerates the available transforms in a {@code TRANSFORMS} map.
+ *
+ * Building a Managed turnkey transform
+ *
+ * Turnkey transforms are represented as {@link SchemaTransform}s, which 
means each one has a
+ * defined configuration. A given transform can be built with a {@code 
Map} that
+ * specifies arguments using like so:
+ *
+ * {@code
+ * PCollectionRowTuple output = Managed.read(ICEBERG)
+ *  .withConfig(ImmutableMap..builder()
+ *  .put("foo", "abc")
+ *  .put("bar", 123)
+ *  .build());
+ * }
+ *
+ * Instead of specifying configuration arguments directly in the code, one 
can provide the
+ * location to a YAML file that contains this information. Say we have the 
following YAML file:
+ *
+ * {@code
+ * foo: "abc"
+ * bar: 123
+ * }
+ *
+ * The file's path can be passed in to the Managed API like so:
+ *
+ * {@code
+ * PCollectionRowTuple output = Managed.write(ICEBERG)
+ *  .withConfigUrl();
+ * }
+ */
+public class Managed {
+
+  // TODO: Dynamically generate a list of supported transforms
+  public static final String ICEBERG = "iceberg";
+
+  /**
+   * Instantiates a {@link Managed.Read} transform for the specified source. 
The supported managed
+   * sources are:
+   *
+   * 
+   *   {@link Managed#ICEBERG} : Read from Apache Iceberg
+   * 
+   */
+  public static Read read(String source) {
+
+return new AutoValue_Managed_Read.Builder()
+.setSource(
+Preconditions.checkNotNull(
+Read.TRANSFORMS.get(source.toLowerCase()),
+"An unsupported source was specified: '%s'. Please specify one 
of the following sources: %s",
+source,
+Read.TRANSFORMS.keySet()))
+.setSupportedIdentifiers(new ArrayList<>(Read.TRANSFORMS.values()))
+.build();
+  }
+
+  @AutoValue
+  public abstract static class Read extends SchemaTransform {
+public static final Map TRANSFORMS =
+ImmutableMap.builder()
+.put(ICEBERG, 
"beam:schematransform:org.apache.beam:iceberg_read:v1")
+.build();
+
+abstract String getSource();
+
+abstract @Nullable String getConfig();
+
+abstract @Nullable String getConfigUrl();
+
+abstract List getSupportedIdentifiers();
+
+abstract Builder toBuilder();
+
+@AutoValue.Builder
+abstract static class Builder {
+  abstract Builder setSource(String sourceIdentifier);
+
+  abstract Builder setConfig(@Nullable String config);
+
+  abstract Builder setConfigUrl(@Nullable String configUrl);
+
+  abstract Builder setSupportedIdentifiers(List 
supportedIdentifiers);

Review Comment:
   It feels a bit odd for this to be part of the configuration that one can set 
externally, even just for testing.



##
sdks/java/managed/src/main/java/org/apache/beam/sdk/managed/Managed.java:
##
@@ -0,0 +1,253 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license 

[PR] [YAML] Adds several tests exercising the cross-language capabilities. [beam]

2024-04-05 Thread via GitHub


robertwb opened a new pull request, #30880:
URL: https://github.com/apache/beam/pull/30880

   These (or at least their java counterparts) were not tested anywhere before. 
   
   Also fixes some minor issues discovered along the way.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add identity function as default to FlatMap [beam]

2024-04-05 Thread via GitHub


tvalentyn merged PR #30744:
URL: https://github.com/apache/beam/pull/30744


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add identity function as default to FlatMap [beam]

2024-04-05 Thread via GitHub


tvalentyn commented on PR #30744:
URL: https://github.com/apache/beam/pull/30744#issuecomment-2040771175

   hmm not sure if trigger phrases work


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add identity function as default to FlatMap [beam]

2024-04-05 Thread via GitHub


tvalentyn commented on PR #30744:
URL: https://github.com/apache/beam/pull/30744#issuecomment-2040770976

   Run Python_Coverage PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Failing Test]: beam_PreCommit_Python_Coverage suite fails with ModuleNotFoundError: No module named 'pydantic._hypothesis_plugin' and `pip check` failures [beam]

2024-04-05 Thread via GitHub


tvalentyn closed issue #30852: [Failing Test]: beam_PreCommit_Python_Coverage 
suite fails with ModuleNotFoundError: No module named 
'pydantic._hypothesis_plugin' and `pip check` failures
URL: https://github.com/apache/beam/issues/30852


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Adds a bound on pydantic to exclude incompatible versions in the compat test. [beam]

2024-04-05 Thread via GitHub


tvalentyn merged PR #30863:
URL: https://github.com/apache/beam/pull/30863


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] merging non-canonical environments fails [beam]

2024-04-05 Thread via GitHub


robertwb merged PR #30864:
URL: https://github.com/apache/beam/pull/30864


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Add code change guide contributor-doc [beam]

2024-04-05 Thread via GitHub


Abacn opened a new pull request, #30879:
URL: https://github.com/apache/beam/pull/30879

   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Java] ManagedIO [beam]

2024-04-05 Thread via GitHub


chamikaramj commented on code in PR #30808:
URL: https://github.com/apache/beam/pull/30808#discussion_r1554366451


##
sdks/java/managed/src/main/java/org/apache/beam/sdk/managed/Managed.java:
##
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.managed;
+
+import com.google.auto.value.AutoValue;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.schemas.transforms.SchemaTransform;
+import org.apache.beam.sdk.schemas.utils.YamlUtils;
+import org.apache.beam.sdk.values.PCollectionRowTuple;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.annotations.VisibleForTesting;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableMap;
+
+public class Managed {
+
+  // TODO: Dynamically generate a list of supported transforms
+  public static final String ICEBERG = "iceberg";
+
+  public static Read read(String source) {
+
+return new AutoValue_Managed_Read.Builder()
+.setSource(
+Preconditions.checkNotNull(
+Read.TRANSFORMS.get(source.toLowerCase()),
+"An unsupported source was specified: '%s'. Please specify one 
of the following sources: %s",
+source,
+Read.TRANSFORMS.keySet()))
+.setSupportedIdentifiers(new ArrayList<>(Read.TRANSFORMS.values()))
+.build();
+  }
+
+  @AutoValue
+  public abstract static class Read extends SchemaTransform {
+public static final Map TRANSFORMS =
+ImmutableMap.builder()
+.put(ICEBERG, 
"beam:schematransform:org.apache.beam:iceberg_read:v1")
+.build();
+
+abstract String getSource();
+
+abstract @Nullable String getConfig();
+
+abstract @Nullable String getConfigUrl();
+
+abstract List getSupportedIdentifiers();
+
+abstract Builder toBuilder();
+
+@AutoValue.Builder
+abstract static class Builder {
+  abstract Builder setSource(String sourceIdentifier);
+
+  abstract Builder setConfig(@Nullable String config);

Review Comment:
   Ack. It's fine since this is not in the public API.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]

2024-04-05 Thread via GitHub


github-actions[bot] commented on PR #30877:
URL: https://github.com/apache/beam/pull/30877#issuecomment-2040689922

   Checks are failing. Will not request review until checks are succeeding. If 
you'd like to override that behavior, comment `assign set of reviewers`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Java] ManagedIO [beam]

2024-04-05 Thread via GitHub


ahmedabu98 commented on code in PR #30808:
URL: https://github.com/apache/beam/pull/30808#discussion_r1554343771


##
sdks/java/managed/src/main/java/org/apache/beam/sdk/managed/Managed.java:
##
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.managed;
+
+import com.google.auto.value.AutoValue;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.schemas.transforms.SchemaTransform;
+import org.apache.beam.sdk.schemas.utils.YamlUtils;
+import org.apache.beam.sdk.values.PCollectionRowTuple;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.annotations.VisibleForTesting;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableMap;
+
+public class Managed {
+
+  // TODO: Dynamically generate a list of supported transforms
+  public static final String ICEBERG = "iceberg";
+
+  public static Read read(String source) {
+
+return new AutoValue_Managed_Read.Builder()
+.setSource(
+Preconditions.checkNotNull(
+Read.TRANSFORMS.get(source.toLowerCase()),
+"An unsupported source was specified: '%s'. Please specify one 
of the following sources: %s",
+source,
+Read.TRANSFORMS.keySet()))
+.setSupportedIdentifiers(new ArrayList<>(Read.TRANSFORMS.values()))
+.build();
+  }
+
+  @AutoValue
+  public abstract static class Read extends SchemaTransform {
+public static final Map TRANSFORMS =
+ImmutableMap.builder()
+.put(ICEBERG, 
"beam:schematransform:org.apache.beam:iceberg_read:v1")
+.build();
+
+abstract String getSource();
+
+abstract @Nullable String getConfig();
+
+abstract @Nullable String getConfigUrl();
+
+abstract List getSupportedIdentifiers();
+
+abstract Builder toBuilder();
+
+@AutoValue.Builder
+abstract static class Builder {
+  abstract Builder setSource(String sourceIdentifier);
+
+  abstract Builder setConfig(@Nullable String config);

Review Comment:
   This is included in [these 
lines](https://github.com/apache/beam/pull/30808/files#diff-666d799f826fb29116181c65747fcff142c9b2cef5cad804005b3f216ad46293R164-R168).
 There is also a test case for this 
[here](https://github.com/apache/beam/pull/30808/files#diff-5b10b7b441cda0a0625c14f543fae83ab78798d28a64fa121f1d1c532d331076R98).
 
   
   We use FileSystems, so hopefully this will support more than 
LocalFileSystem? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Java] ManagedIO [beam]

2024-04-05 Thread via GitHub


ahmedabu98 commented on code in PR #30808:
URL: https://github.com/apache/beam/pull/30808#discussion_r1554339917


##
sdks/java/managed/src/main/java/org/apache/beam/sdk/managed/Managed.java:
##
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.managed;
+
+import com.google.auto.value.AutoValue;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.schemas.transforms.SchemaTransform;
+import org.apache.beam.sdk.schemas.utils.YamlUtils;
+import org.apache.beam.sdk.values.PCollectionRowTuple;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.annotations.VisibleForTesting;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableMap;
+
+public class Managed {
+  public static final String ICEBERG = "iceberg";
+
+  public static Read read(String source) {

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] merging non-canonical environments fails [beam]

2024-04-05 Thread via GitHub


Polber commented on code in PR #30864:
URL: https://github.com/apache/beam/pull/30864#discussion_r1554320826


##
sdks/python/apache_beam/runners/common.py:
##
@@ -1981,28 +1981,27 @@ def env_key(env):
 base_env_key(e)
 for e in environments.expand_anyof_environments(env)))
 
-  cannonical_enviornments = collections.defaultdict(list)
+  canonical_environments = collections.defaultdict(list)
   for env_id, env in pipeline_proto.components.environments.items():
-cannonical_enviornments[env_key(env)].append(env_id)
+canonical_environments[env_key(env)].append(env_id)
 
-  if len(cannonical_enviornments) == len(
-  pipeline_proto.components.environments):
+  if len(canonical_environments) == 
len(pipeline_proto.components.environments):
 # All environments are already sufficiently distinct.
 return pipeline_proto
 
   environment_remappings = {
   e: es[0]
-  for es in cannonical_enviornments.values() for e in es
+  for es in canonical_environments.values() for e in es
   }
 
   if not inplace:
 pipeline_proto = copy.copy(pipeline_proto)
 
   for t in pipeline_proto.components.transforms.values():

Review Comment:
   Fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Adds a bound on pydantic to exclude incompatible versions in the compat test. [beam]

2024-04-05 Thread via GitHub


codecov-commenter commented on PR #30863:
URL: https://github.com/apache/beam/pull/30863#issuecomment-2040650103

   ## 
[Codecov](https://app.codecov.io/gh/apache/beam/pull/30863?dropdown=coverage=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 Report
   All modified and coverable lines are covered by tests :white_check_mark:
   > Project coverage is 71.53%. Comparing base 
[(`0d41168`)](https://app.codecov.io/gh/apache/beam/commit/0d41168a0963869df037e468f80f5ab8466a99ab?dropdown=coverage=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 to head 
[(`8145503`)](https://app.codecov.io/gh/apache/beam/pull/30863?dropdown=coverage=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   > Report is 65 commits behind head on master.
   
   
   Additional details and impacted files
   
   
   ```diff
   @@  Coverage Diff  @@
   ## master   #30863   +/-   ##
   =
   + Coverage 56.86%   71.53%   +14.67% 
 Complexity 1485 1485   
   =
 Files   501  905  +404 
 Lines 46219   113065+66846 
 Branches   1076 1076   
   =
   + Hits  2628380884+54601 
   - Misses1791830163+12245 
 Partials   2018 2018   
   ```
   
   | 
[Flag](https://app.codecov.io/gh/apache/beam/pull/30863/flags?src=pr=flags_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | Coverage Δ | |
   |---|---|---|
   | 
[python](https://app.codecov.io/gh/apache/beam/pull/30863/flags?src=pr=flag_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | `81.68% <ø> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   
   
   
   [:umbrella: View full report in Codecov by 
Sentry](https://app.codecov.io/gh/apache/beam/pull/30863?dropdown=coverage=pr=continue_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   
   :loudspeaker: Have feedback on the report? [Share it 
here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug]: GRCPIO 1.62.1 deadlocks in Beam SDK worker with Dataflow [beam]

2024-04-05 Thread via GitHub


tvalentyn commented on issue #30867:
URL: https://github.com/apache/beam/issues/30867#issuecomment-2040640597

   Ok. Thank you very much for reporting the issue, please let us know if you 
have more information, that might also help grpcio maintainers.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump GCP-BOM to 26.36.0 [beam]

2024-04-05 Thread via GitHub


github-actions[bot] commented on PR #30868:
URL: https://github.com/apache/beam/pull/30868#issuecomment-2040627858

   Assigning reviewers. If you would like to opt out of this review, comment 
`assign to next reviewer`:
   
   R: @m-trieu for label java.
   R: @damccorm for label build.
   R: @shunping for label io.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any 
comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review 
comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump GCP-BOM to 26.36.0 [beam]

2024-04-05 Thread via GitHub


Abacn commented on PR #30868:
URL: https://github.com/apache/beam/pull/30868#issuecomment-2040626634

   assign set of reviewers


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug]: GRCPIO 1.62.1 deadlocks in Beam SDK worker with Dataflow [beam]

2024-04-05 Thread via GitHub


DerRidda commented on issue #30867:
URL: https://github.com/apache/beam/issues/30867#issuecomment-2040607446

   > Thanks.
   > 
   > ```
   > (py38) :python$ docker run --rm -it --entrypoint=/bin/sh 
apache/beam_python3.10_sdk:2.53.0  -c "pip list | grep grpcio" 
   > grpcio  1.59.3
   > grpcio-status   1.59.3
   > (py38) :python$ docker run --rm -it --entrypoint=/bin/sh 
apache/beam_python3.10_sdk:2.54.0  -c "pip list | grep grpcio" 
   > grpcio  1.60.0
   > grpcio-status   1.60.0
   > (py38) :python$ docker run --rm -it --entrypoint=/bin/sh 
apache/beam_python3.10_sdk:2.55.0  -c "pip list | grep grpcio" 
   > grpcio  1.62.0
   > grpcio-status   1.62.0
   > (py38) :python$ docker run --rm -it --entrypoint=/bin/sh 
apache/beam_python3.10_sdk:2.55.1  -c "pip list | grep grpcio" 
   > grpcio  1.62.0
   > grpcio-status   1.62.0
   > ```
   > 
   > > I could determine that at least version 1.60.0 doesn't exhibit this 
issue.
   > 
   > Double checking, is this still the case?
   
   Actually still happens on 1.60.0 for me. It just looked fixed for a brief 
moment.
   The known good version of my pipeline probably uses 1.59.3 as I changed the 
SDK container image to be fully custom as well in the update so even though the 
project local deps were showing 1.60.0 it never got installed into the image of 
the known good service. I will investigate more on Monday.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]

2024-04-05 Thread via GitHub


xianhualiu commented on PR #30877:
URL: https://github.com/apache/beam/pull/30877#issuecomment-2040600314

   github issue created https://github.com/apache/beam/issues/30870. 
   Code done and PR created for review: 
https://github.com/apache/beam/pull/30877 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]

2024-04-05 Thread via GitHub


xianhualiu opened a new pull request, #30877:
URL: https://github.com/apache/beam/pull/30877

   addresses #30870. The changes in this PR make the consumer polling timeout 
configurable for KafkaIO.Read with following new command:
   
   KafkaIO.read().withConsumerPollingTimeout(Duration duration)
   
   The duration must be greater than zero. If not specified, the default will 
be 1 second.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add gradle target and github workflow for cross-langauge yaml tests. [beam]

2024-04-05 Thread via GitHub


robertwb commented on code in PR #30874:
URL: https://github.com/apache/beam/pull/30874#discussion_r1554272442


##
.github/workflows/beam_PreCommit_Yaml_Xlang_Direct.yml:
##
@@ -0,0 +1,101 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+name: PostCommit Python Xlang Gcp Direct
+
+on:
+  pull_request_target:
+paths: ['release/trigger_all_tests.json', 'model/**', 'sdks/python/**']
+  issue_comment:
+types: [created]
+  push:
+tags: ['v*']
+branches: ['master', 'release-*']
+paths: [ 
"model/**","sdks/python/**","release/**",".github/workflows/beam_PreCommit_Yaml_Xlang_Direct.yml"]

Review Comment:
   Yeah. It's hard to know where to draw the line. I've added some candidates, 
and this should get run on postcommit/release as well. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add gradle target and github workflow for cross-langauge yaml tests. [beam]

2024-04-05 Thread via GitHub


robertwb commented on code in PR #30874:
URL: https://github.com/apache/beam/pull/30874#discussion_r1554268928


##
.github/workflows/beam_PreCommit_Yaml_Xlang_Direct.yml:
##
@@ -0,0 +1,101 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+name: PostCommit Python Xlang Gcp Direct
+
+on:
+  pull_request_target:
+paths: ['release/trigger_all_tests.json', 'model/**', 'sdks/python/**']
+  issue_comment:
+types: [created]
+  push:
+tags: ['v*']
+branches: ['master', 'release-*']
+paths: [ 
"model/**","sdks/python/**","release/**",".github/workflows/beam_PreCommit_Yaml_Xlang_Direct.yml"]
+  schedule:
+- cron: '30 5/6 * * *'
+  workflow_dispatch:
+
+#Setting explicit permissions for the action to avoid the default permissions 
which are `write-all` in case of pull_request_target event
+permissions:
+  actions: write
+  pull-requests: write
+  checks: write
+  contents: read
+  deployments: read
+  id-token: none
+  issues: write
+  discussions: read
+  packages: read
+  pages: read
+  repository-projects: read
+  security-events: read
+  statuses: read
+
+# This allows a subsequently queued workflow run to interrupt previous runs
+concurrency:
+  group: '${{ github.workflow }} @ ${{ github.event.issue.number || github.sha 
|| github.head_ref || github.ref }}-${{ github.event.schedule || 
github.event.comment.id || github.event.sender.login }}'
+  cancel-in-progress: true
+
+env:
+  GRADLE_ENTERPRISE_ACCESS_KEY: ${{ secrets.GE_ACCESS_TOKEN }}
+  GRADLE_ENTERPRISE_CACHE_USERNAME: ${{ secrets.GE_CACHE_USERNAME }}
+  GRADLE_ENTERPRISE_CACHE_PASSWORD: ${{ secrets.GE_CACHE_PASSWORD }}
+
+jobs:
+  beam_PreCommit_Yaml_Xlang_Direct:
+if: |
+  github.event_name == 'workflow_dispatch' ||
+  github.event_name == 'pull_request_target' ||
+  (github.event_name == 'schedule' && github.repository == 'apache/beam') 
||
+  github.event.comment.body == 'Run Yaml_Xlang_Direct PreCommit'
+runs-on: [self-hosted, ubuntu-20.04, main]
+timeout-minutes: 100
+name: ${{ matrix.job_name }} (${{ matrix.job_phrase }})
+strategy:
+  matrix:
+job_name: ["beam_PreCommit_Yaml_Xlang_Direct"]
+job_phrase: ["Run Yaml_Xlang_Direct"]
+steps:
+  - uses: actions/checkout@v4
+  - name: Setup repository
+uses: ./.github/actions/setup-action
+with:
+  comment_phrase: ${{ matrix.job_phrase }}
+  github_token: ${{ secrets.GITHUB_TOKEN }}
+  github_job: ${{ matrix.job_name }} (${{ matrix.job_phrase }})
+  - name: Setup environment
+uses: ./.github/actions/setup-environment-action
+with:
+  python-version: |

Review Comment:
   It looks like the gradle commands are using 3.8 by default. We probably 
don't need both. 



##
sdks/python/build.gradle:
##
@@ -107,6 +107,26 @@ tasks.register("generateYamlDocs") {
   outputs.file "${buildDir}/yaml-ref.html"
 }
 
+tasks.register("yamlIntegrationTests") {
+  description "Runs integration tests for yaml pipelines."
+
+  dependsOn installGcpTest
+  // Need to build all expansion services referenced in apache_beam/yaml/*.*
+  // grep -oh 'sdk.*Jar' sdks/python/apache_beam/yaml/*.yaml | sort | uniq
+  dependsOn ":sdks:java:extensions:schemaio-expansion-service:shadowJar"
+  dependsOn ":sdks:java:extensions:sql:expansion-service:shadowJar"
+  dependsOn ":sdks:java:io:expansion-service:build"
+  dependsOn ":sdks:java:io:google-cloud-platform:expansion-service:build"
+
+  doLast {
+exec {
+  executable 'sh'
+  args '-c', "${envdir}/bin/pytest -v 
apache_beam/yaml/integration_tests.py"
+}
+  }
+  outputs.file "${buildDir}/yaml-ref.html"

Review Comment:
   Removed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Create YAML Join Transform [beam]

2024-04-05 Thread via GitHub


robertwb commented on PR #30734:
URL: https://github.com/apache/beam/pull/30734#issuecomment-2040585964

   If Beam SQL doesn't support cross join out of the box, we don't have to 
support it for our join (yet) either. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Create YAML Join Transform [beam]

2024-04-05 Thread via GitHub


Polber commented on PR #30734:
URL: https://github.com/apache/beam/pull/30734#issuecomment-2040575468

   > @Polber Do I need to take any action or do I just wait for #30681 and 
#30864 to be merged?
   
   #30864 should fix the issue once it is merged. I was also wrong with my 
initial claim. Join will not be broken, it just cannot be followed by 
`LogForTesting` or 'Flatten` in a pipeline.
   
   > Also, I noticed that the `CROSS JOIN` operation is not working, even in 
the simple case of `SELECT * FROM A CROSS JOIN B`. The error I see when using 
`CROSS JOIN` is `java.lang.UnsupportedOperationException: CROSS JOIN, JOIN ON 
FALSE is not supported!`, suggesting that the SQL engine does not support the 
syntax.
   > 
   > Could this be related to the issues above? Or is `CROSS JOIN` truly not 
supported?
   
   Yeah looks like Beam SQL does not support cross joins. I think it is fine to 
not include in v1, and we can always implement custom CoGroupByKey logic to 
simulate a cross join.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] merging non-canonical environments fails [beam]

2024-04-05 Thread via GitHub


robertwb commented on code in PR #30864:
URL: https://github.com/apache/beam/pull/30864#discussion_r1554254408


##
sdks/python/apache_beam/runners/common.py:
##
@@ -1981,28 +1981,27 @@ def env_key(env):
 base_env_key(e)
 for e in environments.expand_anyof_environments(env)))
 
-  cannonical_enviornments = collections.defaultdict(list)
+  canonical_environments = collections.defaultdict(list)
   for env_id, env in pipeline_proto.components.environments.items():
-cannonical_enviornments[env_key(env)].append(env_id)
+canonical_environments[env_key(env)].append(env_id)
 
-  if len(cannonical_enviornments) == len(
-  pipeline_proto.components.environments):
+  if len(canonical_environments) == 
len(pipeline_proto.components.environments):
 # All environments are already sufficiently distinct.
 return pipeline_proto
 
   environment_remappings = {
   e: es[0]
-  for es in cannonical_enviornments.values() for e in es
+  for es in canonical_environments.values() for e in es
   }
 
   if not inplace:
 pipeline_proto = copy.copy(pipeline_proto)
 
   for t in pipeline_proto.components.transforms.values():

Review Comment:
   This makes it a bit clearer what's going on here.



##
sdks/python/apache_beam/runners/common.py:
##
@@ -1981,28 +1981,27 @@ def env_key(env):
 base_env_key(e)
 for e in environments.expand_anyof_environments(env)))
 
-  cannonical_enviornments = collections.defaultdict(list)
+  canonical_environments = collections.defaultdict(list)
   for env_id, env in pipeline_proto.components.environments.items():
-cannonical_enviornments[env_key(env)].append(env_id)
+canonical_environments[env_key(env)].append(env_id)
 
-  if len(cannonical_enviornments) == len(
-  pipeline_proto.components.environments):
+  if len(canonical_environments) == 
len(pipeline_proto.components.environments):
 # All environments are already sufficiently distinct.
 return pipeline_proto
 
   environment_remappings = {
   e: es[0]
-  for es in cannonical_enviornments.values() for e in es
+  for es in canonical_environments.values() for e in es
   }
 
   if not inplace:
 pipeline_proto = copy.copy(pipeline_proto)
 
   for t in pipeline_proto.components.transforms.values():

Review Comment:
   ```suggestion
   if t.environment_id not in pipeline_proto.components.environments:
 # TODO(https://github.com/apache/beam/issues/30876): Remove this 
workaround.
 continue
   if t.environment_id:
 t.environment_id = environment_remappings[t.environment_id]
 for w in pipeline_proto.components.windowing_strategies.values():
   if w.environment_id not in pipeline_proto.components.environments:
 # TODO(https://github.com/apache/beam/issues/30876): Remove this 
workaround.
 continue
   if w.environment_id:
 w.environment_id = environment_remappings[w.environment_id]```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug]: Java expansion service does not respect the namespace for all environments. [beam]

2024-04-05 Thread via GitHub


robertwb commented on issue #30876:
URL: https://github.com/apache/beam/issues/30876#issuecomment-2040555076

   This was uncovered by https://github.com/apache/beam/pull/30864
   
   It can be reproduced with 
   
   ```
   import apache_beam as beam
   from apache_beam.transforms import external
   import logging
   
   from apache_beam.utils import subprocess_server
   
   logging.getLogger().setLevel(logging.INFO)
   
   with beam.Pipeline('DirectRunner') as p:
 i1 = p | "i1" >> beam.Create([beam.Row(name='john', id=1)])
 i2 = p | "i2" >> beam.Create([beam.Row(name='jane', id=1)])
 result = {'i1': i1, 'i2': i2} | 'Sql1' >> external.ExternalTransform(
   'beam:external:java:sql:v1',
   external.ImplicitSchemaPayloadBuilder(
 {'query': 'SELECT * FROM i1 INNER JOIN i2 ON i1.id = i2.id'}
   ).payload(),
   external.JavaJarExpansionService(
 subprocess_server.JavaJarServer.path_to_beam_jar(
   gradle_target='sdks:java:extensions:sql:expansion-service:shadowJar',
   artifact_id=None
 )
   )) | 'LogForTesting' >> external.SchemaAwareExternalTransform(
   'beam:schematransform:org.apache.beam:yaml:log_for_testing:v1',
   external.JavaJarExpansionService(
 subprocess_server.JavaJarServer.path_to_beam_jar(
   gradle_target='sdks:java:extensions:sql:expansion-service:shadowJar',
   artifact_id=None
 )
   ), rearrange_based_on_discovery=True)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Update python transform catalog [beam]

2024-04-05 Thread via GitHub


damccorm commented on code in PR #30788:
URL: https://github.com/apache/beam/pull/30788#discussion_r1554225291


##
website/www/site/content/en/documentation/transforms/python/aggregation/approximatequantiles.md:
##
@@ -17,7 +17,14 @@ limitations under the License.
 
 # ApproximateQuantiles
 
+{{< localstorage language language-py >}}
+
+{{< button-pydoc path="apache_beam.transforms.stat" 
class="ApproximateQuantile" >}}
+
 ## Examples
-See [Issue 19547](https://github.com/apache/beam/issues/19547) for updates.
+
+{{< playground height="700px" >}}
+{{< playground_snippet language="py" path="SDK_PYTHON_ApproximateQuantiles" 
show="approximatequantiles" >}}
+{{< /playground >}}

Review Comment:
   Outside of the linting failure the changes look good to me, so just let me 
know how you'd prefer to proceed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug]: GRCPIO 1.62.1 deadlocks in Beam SDK worker with Dataflow [beam]

2024-04-05 Thread via GitHub


tvalentyn commented on issue #30867:
URL: https://github.com/apache/beam/issues/30867#issuecomment-2040544276

   Thanks.
   
   ```
   (py38) :python$ docker run --rm -it --entrypoint=/bin/sh 
apache/beam_python3.10_sdk:2.54.0  -c "pip list | grep grpcio" 
   grpcio  1.60.0
   grpcio-status   1.60.0
   (py38) :python$ docker run --rm -it --entrypoint=/bin/sh 
apache/beam_python3.10_sdk:2.55.0  -c "pip list | grep grpcio" 
   grpcio  1.62.0
   grpcio-status   1.62.0
   (py38) :python$ docker run --rm -it --entrypoint=/bin/sh 
apache/beam_python3.10_sdk:2.55.1  -c "pip list | grep grpcio" 
   grpcio  1.62.0
   grpcio-status   1.62.0
   ```
   
   > I could determine that at least version 1.60.0 doesn't exhibit this issue.
   
   Double checking, is this still the case?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Allow lazy iteration for non-reiterables. [beam]

2024-04-05 Thread via GitHub


robertwb merged PR #30851:
URL: https://github.com/apache/beam/pull/30851


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Update python transform catalog [beam]

2024-04-05 Thread via GitHub


damccorm commented on code in PR #30788:
URL: https://github.com/apache/beam/pull/30788#discussion_r1554223645


##
website/www/site/content/en/documentation/transforms/python/aggregation/approximatequantiles.md:
##
@@ -17,7 +17,14 @@ limitations under the License.
 
 # ApproximateQuantiles
 
+{{< localstorage language language-py >}}
+
+{{< button-pydoc path="apache_beam.transforms.stat" 
class="ApproximateQuantile" >}}
+
 ## Examples
-See [Issue 19547](https://github.com/apache/beam/issues/19547) for updates.
+
+{{< playground height="700px" >}}
+{{< playground_snippet language="py" path="SDK_PYTHON_ApproximateQuantiles" 
show="approximatequantiles" >}}
+{{< /playground >}}

Review Comment:
   There's not a great single time to merge since we'd want the playground on 
the release branch, so we'd want to merge the playground fixes, wait a week or 
so for the release, then merge the website changes.
   
   With that said, if you definitely don't have time we can merge right before 
the release branch is cut (I'm running the release so should be easy for me to 
pick up) and the website will just be broken for a bit in these code samples. 
It isn't ideal, but would probably still be better than not doing this at all.
   
   Up to you on how to proceed here, I'll add this as a release blocker so I 
remember to merge though



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add gradle target and github workflow for cross-langauge yaml tests. [beam]

2024-04-05 Thread via GitHub


damccorm commented on code in PR #30874:
URL: https://github.com/apache/beam/pull/30874#discussion_r1554217196


##
.github/workflows/beam_PreCommit_Yaml_Xlang_Direct.yml:
##
@@ -0,0 +1,101 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+name: PostCommit Python Xlang Gcp Direct
+
+on:
+  pull_request_target:
+paths: ['release/trigger_all_tests.json', 'model/**', 'sdks/python/**']
+  issue_comment:
+types: [created]
+  push:
+tags: ['v*']
+branches: ['master', 'release-*']
+paths: [ 
"model/**","sdks/python/**","release/**",".github/workflows/beam_PreCommit_Yaml_Xlang_Direct.yml"]
+  schedule:
+- cron: '30 5/6 * * *'
+  workflow_dispatch:
+
+#Setting explicit permissions for the action to avoid the default permissions 
which are `write-all` in case of pull_request_target event
+permissions:
+  actions: write
+  pull-requests: write
+  checks: write
+  contents: read
+  deployments: read
+  id-token: none
+  issues: write
+  discussions: read
+  packages: read
+  pages: read
+  repository-projects: read
+  security-events: read
+  statuses: read
+
+# This allows a subsequently queued workflow run to interrupt previous runs
+concurrency:
+  group: '${{ github.workflow }} @ ${{ github.event.issue.number || github.sha 
|| github.head_ref || github.ref }}-${{ github.event.schedule || 
github.event.comment.id || github.event.sender.login }}'
+  cancel-in-progress: true
+
+env:
+  GRADLE_ENTERPRISE_ACCESS_KEY: ${{ secrets.GE_ACCESS_TOKEN }}
+  GRADLE_ENTERPRISE_CACHE_USERNAME: ${{ secrets.GE_CACHE_USERNAME }}
+  GRADLE_ENTERPRISE_CACHE_PASSWORD: ${{ secrets.GE_CACHE_PASSWORD }}
+
+jobs:
+  beam_PreCommit_Yaml_Xlang_Direct:
+if: |
+  github.event_name == 'workflow_dispatch' ||
+  github.event_name == 'pull_request_target' ||
+  (github.event_name == 'schedule' && github.repository == 'apache/beam') 
||
+  github.event.comment.body == 'Run Yaml_Xlang_Direct PreCommit'
+runs-on: [self-hosted, ubuntu-20.04, main]
+timeout-minutes: 100
+name: ${{ matrix.job_name }} (${{ matrix.job_phrase }})
+strategy:
+  matrix:
+job_name: ["beam_PreCommit_Yaml_Xlang_Direct"]
+job_phrase: ["Run Yaml_Xlang_Direct"]
+steps:
+  - uses: actions/checkout@v4
+  - name: Setup repository
+uses: ./.github/actions/setup-action
+with:
+  comment_phrase: ${{ matrix.job_phrase }}
+  github_token: ${{ secrets.GITHUB_TOKEN }}
+  github_job: ${{ matrix.job_name }} (${{ matrix.job_phrase }})
+  - name: Setup environment
+uses: ./.github/actions/setup-environment-action
+with:
+  python-version: |

Review Comment:
   Do we actually need 2 python versions?



##
.github/workflows/beam_PreCommit_Yaml_Xlang_Direct.yml:
##
@@ -0,0 +1,101 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+name: PostCommit Python Xlang Gcp Direct
+
+on:
+  pull_request_target:
+paths: ['release/trigger_all_tests.json', 'model/**', 'sdks/python/**']
+  issue_comment:
+types: [created]
+  push:
+tags: ['v*']
+branches: ['master', 'release-*']
+paths: [ 
"model/**","sdks/python/**","release/**",".github/workflows/beam_PreCommit_Yaml_Xlang_Direct.yml"]
+  schedule:
+- cron: '30 5/6 * * *'
+  workflow_dispatch:
+
+#Setting explicit permissions for the action to avoid the default permissions 
which are `write-all` in case of pull_request_target event
+permissions:
+  actions: write
+  

Re: [PR] Lower various logging statement levels to clean up example printing [beam]

2024-04-05 Thread via GitHub


tvalentyn merged PR #30782:
URL: https://github.com/apache/beam/pull/30782


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Python] Add a couple quality-of-life improvemenets to `testing.util.assert_that` [beam]

2024-04-05 Thread via GitHub


hjtran commented on code in PR #30771:
URL: https://github.com/apache/beam/pull/30771#discussion_r1554213775


##
sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupby_test.py:
##
@@ -78,178 +76,193 @@ def normalize_kv(k, v):
 # For documentation.
 NamedTuple = beam.Row
 
+
 def check_groupby_expr_result(grouped):
-  assert_that(
-  grouped | beam.MapTuple(normalize_kv),
-  equal_to([
-  #[START groupby_expr_result]
-  ('s', ['strawberry']),
-  ('r', ['raspberry']),
-  ('b', ['banana', 'blackberry', 'blueberry']),
-  #[END groupby_expr_result]
-  ]))
+  pass  # TODO: Uncomment asserts

Review Comment:
   Ha, I initially put in early returns, but didn't want to fight possible 
flake8 issues about unreachable code since my IDE was giving me a hard time 
about it. I realize now flake8 doesn't complain about unreachable code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Implement LockFreeHistogram and use it for PerWorkerHistograms [beam]

2024-04-05 Thread via GitHub


codecov-commenter commented on PR #30769:
URL: https://github.com/apache/beam/pull/30769#issuecomment-2040528515

   ## 
[Codecov](https://app.codecov.io/gh/apache/beam/pull/30769?dropdown=coverage=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 Report
   Attention: Patch coverage is `94.80519%` with `4 lines` in your changes are 
missing coverage. Please review.
   > Project coverage is 68.45%. Comparing base 
[(`e894d8c`)](https://app.codecov.io/gh/apache/beam/commit/e894d8ca7fe10ed4771ce9d999b48361c25220ec?dropdown=coverage=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 to head 
[(`33d1ca1`)](https://app.codecov.io/gh/apache/beam/pull/30769?dropdown=coverage=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   > Report is 97 commits behind head on master.
   
   | 
[Files](https://app.codecov.io/gh/apache/beam/pull/30769?dropdown=coverage=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | Patch % | Lines |
   |---|---|---|
   | 
[...eam/runners/dataflow/worker/LockFreeHistogram.java](https://app.codecov.io/gh/apache/beam/pull/30769?src=pr=tree=runners%2Fgoogle-cloud-dataflow-java%2Fworker%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fbeam%2Frunners%2Fdataflow%2Fworker%2FLockFreeHistogram.java_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#diff-cnVubmVycy9nb29nbGUtY2xvdWQtZGF0YWZsb3ctamF2YS93b3JrZXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2JlYW0vcnVubmVycy9kYXRhZmxvdy93b3JrZXIvTG9ja0ZyZWVIaXN0b2dyYW0uamF2YQ==)
 | 93.33% | [2 Missing and 2 partials :warning: 
](https://app.codecov.io/gh/apache/beam/pull/30769?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 |
   
   Additional details and impacted files
   
   
   ```diff
   @@  Coverage Diff  @@
   ## master   #30769   +/-   ##
   =
   - Coverage 71.47%   68.45%-3.02% 
   - Complexity014789+14789 
   =
 Files   710 2611 +1901 
 Lines104815   220252   +115437 
 Branches  011740+11740 
   =
   + Hits  74915   150772+75857 
   - Misses2826863329+35061 
   - Partials   1632 6151 +4519 
   ```
   
   | 
[Flag](https://app.codecov.io/gh/apache/beam/pull/30769/flags?src=pr=flags_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | Coverage Δ | |
   |---|---|---|
   | 
[java](https://app.codecov.io/gh/apache/beam/pull/30769/flags?src=pr=flag_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | `65.71% <94.80%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   
   
   
   [:umbrella: View full report in Codecov by 
Sentry](https://app.codecov.io/gh/apache/beam/pull/30769?dropdown=coverage=pr=continue_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   
   :loudspeaker: Have feedback on the report? [Share it 
here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump GCP-BOM to 26.36.0 [beam]

2024-04-05 Thread via GitHub


codecov-commenter commented on PR #30868:
URL: https://github.com/apache/beam/pull/30868#issuecomment-2040517608

   ## 
[Codecov](https://app.codecov.io/gh/apache/beam/pull/30868?dropdown=coverage=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 Report
   All modified and coverable lines are covered by tests :white_check_mark:
   > Project coverage is 66.26%. Comparing base 
[(`ffc96bc`)](https://app.codecov.io/gh/apache/beam/commit/ffc96bc2edf981efc9e7fe3934a81000c87c6974?dropdown=coverage=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 to head 
[(`d393efd`)](https://app.codecov.io/gh/apache/beam/pull/30868?dropdown=coverage=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   > Report is 57 commits behind head on master.
   
   
   Additional details and impacted files
   
   
   ```diff
   @@ Coverage Diff  @@
   ## master   #30868  +/-   ##
   
   + Coverage 61.25%   66.26%   +5.00% 
   - Complexity 447017763   +13293 
   
 Files   853 2252+1399 
 Lines 74090   143252   +69162 
 Branches   430714966   +10659 
   
   + Hits  4538594925   +49540 
   - Misses2522442351   +17127 
   - Partials   3481 5976+2495 
   ```
   
   | 
[Flag](https://app.codecov.io/gh/apache/beam/pull/30868/flags?src=pr=flags_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | Coverage Δ | |
   |---|---|---|
   | 
[go](https://app.codecov.io/gh/apache/beam/pull/30868/flags?src=pr=flag_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | `?` | |
   | 
[java](https://app.codecov.io/gh/apache/beam/pull/30868/flags?src=pr=flag_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | `66.26% <ø> (-2.34%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   
   
   
   [:umbrella: View full report in Codecov by 
Sentry](https://app.codecov.io/gh/apache/beam/pull/30868?dropdown=coverage=pr=continue_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   
   :loudspeaker: Have feedback on the report? [Share it 
here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Update python transform catalog [beam]

2024-04-05 Thread via GitHub


hjtran commented on code in PR #30788:
URL: https://github.com/apache/beam/pull/30788#discussion_r1554204921


##
website/www/site/content/en/documentation/transforms/python/aggregation/approximatequantiles.md:
##
@@ -17,7 +17,14 @@ limitations under the License.
 
 # ApproximateQuantiles
 
+{{< localstorage language language-py >}}
+
+{{< button-pydoc path="apache_beam.transforms.stat" 
class="ApproximateQuantile" >}}
+
 ## Examples
-See [Issue 19547](https://github.com/apache/beam/issues/19547) for updates.
+
+{{< playground height="700px" >}}
+{{< playground_snippet language="py" path="SDK_PYTHON_ApproximateQuantiles" 
show="approximatequantiles" >}}
+{{< /playground >}}

Review Comment:
   Ah, so the website can get upgraded within a release but playground gets 
redeployed with releases.
   
   It's all a bit tangled and I'm not sure I have the time to split it apart. 
Would it be alright to just keep the PR as is and merge whenever it makes 
sense? (assuming there is such a moment)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Update python transform catalog [beam]

2024-04-05 Thread via GitHub


hjtran commented on code in PR #30788:
URL: https://github.com/apache/beam/pull/30788#discussion_r1554201194


##
website/www/site/content/en/documentation/transforms/python/overview.md:
##
@@ -48,13 +48,13 @@ limitations under the License.
 
 
   TransformDescription
-  ApproximateQuantilesNot available. See https://issues.apache.org/jira/browse/BEAM-6694;>BEAM-6694 for 
updates.
-  ApproximateUniqueNot available. See https://issues.apache.org/jira/browse/BEAM-6693;>BEAM-6693 for 
updates.
+  ApproximateQuantilesGiven
 a distribution, find the approximate N-tiles.
+  ApproximateUniqueGiven
 a pcollection, return the estimated number of unique elements.
+  BatchElementsGiven
 a pcollection, return the estimated number of unique elements.

Review Comment:
   Ah, I should've went looking for this. I had guessed that there might've 
been some autogenerating nav bar but I didn't confirm. 
   
   Thanks for pointing it out



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add gradle target and github workflow for cross-langauge yaml tests. [beam]

2024-04-05 Thread via GitHub


github-actions[bot] commented on PR #30874:
URL: https://github.com/apache/beam/pull/30874#issuecomment-2040509417

   Assigning reviewers. If you would like to opt out of this review, comment 
`assign to next reviewer`:
   
   R: @tvalentyn for label python.
   R: @damccorm for label build.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any 
comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review 
comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Remove outdated comment about docker. [beam]

2024-04-05 Thread via GitHub


robertwb merged PR #30871:
URL: https://github.com/apache/beam/pull/30871


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Update python transform catalog [beam]

2024-04-05 Thread via GitHub


hjtran commented on code in PR #30788:
URL: https://github.com/apache/beam/pull/30788#discussion_r1554192861


##
sdks/python/apache_beam/examples/snippets/transforms/aggregation/tolist_test.py:
##
@@ -0,0 +1,47 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import mock
+import unittest
+
+from apache_beam.testing.test_pipeline import TestPipeline
+from apache_beam.testing.util import assert_that, equal_to
+
+from . import tolist
+
+
+def identity(x):
+  return x
+
+
+@mock.patch('apache_beam.Pipeline', TestPipeline)
+# pylint: disable=line-too-long
+@mock.patch(
+'apache_beam.examples.snippets.transforms.aggregation.tolist.print',
+identity)
+# pylint: enable=line-too-long
+class BatchElementsTest(unittest.TestCase):
+  def test_tolist(self):
+def check(result):
+  assert_that(result, equal_to([['', '凌', '', '']]))

Review Comment:
   Changed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Update python transform catalog [beam]

2024-04-05 Thread via GitHub


hjtran commented on code in PR #30788:
URL: https://github.com/apache/beam/pull/30788#discussion_r1554189203


##
sdks/python/apache_beam/examples/snippets/transforms/aggregation/batchelements_test.py:
##
@@ -0,0 +1,49 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import unittest
+
+import mock
+from apache_beam.testing.test_pipeline import TestPipeline
+from apache_beam.testing.util import assert_that, equal_to
+
+from . import batchelements
+
+
+def check_batches(actual):
+  expected = [['', '凌', ''], ['', '凌', ''], ['', '凌', '', '']]

Review Comment:
   Good idea. Changed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Flink] Speed up file write in batch mode by using larger bundle size [beam]

2024-04-05 Thread via GitHub


Abacn commented on PR #30802:
URL: https://github.com/apache/beam/pull/30802#issuecomment-2040486462

   Hi, thanks, would you mind sharing some number regarding the performance 
difference. e.g. A test case of 20,000,000 elements, and the run time for 
different batch sizes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Python] Add a couple quality-of-life improvemenets to `testing.util.assert_that` [beam]

2024-04-05 Thread via GitHub


robertwb commented on code in PR #30771:
URL: https://github.com/apache/beam/pull/30771#discussion_r1554181348


##
sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupby_test.py:
##
@@ -78,178 +76,193 @@ def normalize_kv(k, v):
 # For documentation.
 NamedTuple = beam.Row
 
+
 def check_groupby_expr_result(grouped):
-  assert_that(
-  grouped | beam.MapTuple(normalize_kv),
-  equal_to([
-  #[START groupby_expr_result]
-  ('s', ['strawberry']),
-  ('r', ['raspberry']),
-  ('b', ['banana', 'blackberry', 'blueberry']),
-  #[END groupby_expr_result]
-  ]))
+  pass  # TODO: Uncomment asserts

Review Comment:
   Actually, since they're structured this way (sorry, I thought the asserts 
were inline, these tests are structured a bit oddly with the code and asserts 
is a separate file and a third line that links them), maybe just return here to 
reduce churn. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Flink] finalize checkpoint marks in the new Flink source implementation [beam]

2024-04-05 Thread via GitHub


Abacn commented on PR #30849:
URL: https://github.com/apache/beam/pull/30849#issuecomment-2040478988

   checked that if revert the change, the added unit test failed with
   
   ```
   java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:87)
at org.junit.Assert.assertTrue(Assert.java:42)
at org.junit.Assert.assertFalse(Assert.java:65)
at org.junit.Assert.assertFalse(Assert.java:75)
at 
org.apache.beam.runners.flink.translation.wrappers.streaming.io.source.unbounded.FlinkUnboundedSourceReaderTest.testCheckMarksFinalized(FlinkUnboundedSourceReaderTest.java:306)
   ```
   
   anyone able to confirm this has fixed the issue?  CC: @anartemp @gfalcone 
@noster-dev 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Lower various logging statement levels to clean up example printing [beam]

2024-04-05 Thread via GitHub


hjtran commented on code in PR #30782:
URL: https://github.com/apache/beam/pull/30782#discussion_r1554175807


##
sdks/python/apache_beam/runners/worker/statecache.py:
##
@@ -231,7 +231,7 @@ class StateCache(object):
   """
   def __init__(self, max_weight):
 # type: (int) -> None
-_LOGGER.info('Creating state cache with size %s', max_weight)
+_LOGGER.debug('Creating state cache with size %s', max_weight)

Review Comment:
   Reverted this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [YAML] Disable Java variant of WriteToJson [beam]

2024-04-05 Thread via GitHub


robertwb commented on PR #30777:
URL: https://github.com/apache/beam/pull/30777#issuecomment-2040448354

   https://github.com/apache/beam/issues/30776 was instead resolved by 
https://github.com/apache/beam/pull/30779


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [YAML] Disable Java variant of WriteToJson [beam]

2024-04-05 Thread via GitHub


robertwb closed pull request #30777: [YAML] Disable Java variant of WriteToJson
URL: https://github.com/apache/beam/pull/30777


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Add gradle target and github workflow for cross-langauge yaml tests. [beam]

2024-04-05 Thread via GitHub


robertwb opened a new pull request, #30874:
URL: https://github.com/apache/beam/pull/30874

   This should have caught the issue at 
https://github.com/apache/beam/issues/30776
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Failing Test]: beam_PreCommit_Python_Coverage suite fails with ModuleNotFoundError: No module named 'pydantic._hypothesis_plugin' [beam]

2024-04-05 Thread via GitHub


tvalentyn commented on issue #30852:
URL: https://github.com/apache/beam/issues/30852#issuecomment-2040443750

   One other problem that causes additional failures in this suite is that in 
the test environment we first install test environment dependencies (e.g., 
tensorflow), then install the Beam package. This has implication on dependency 
resolution, and pip fails to resolve the conflicts.

   We might be able to prevent that if we install both deps in the same 
command, asked on 
https://github.com/tox-dev/tox/issues/2386#issuecomment-2040435212 if that is 
possible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Duet AI Prompts - Documentation Lookup Without Links [beam]

2024-04-05 Thread via GitHub


dariabezkorovaina commented on PR #30873:
URL: https://github.com/apache/beam/pull/30873#issuecomment-2040432740

   @andreydevyatkin please review when you have a chance. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Duet AI Prompts - Documentation Lookup Without Links [beam]

2024-04-05 Thread via GitHub


dariabezkorovaina opened a new pull request, #30873:
URL: https://github.com/apache/beam/pull/30873

   * Creating versions without links for the existing documentation lookup 
prompts
   * Nits and fixes to documentation lookup prompts to improve language and 
match Google Writing Style guide 

   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Adding support for high priority queries to xlang transforms writing … [beam]

2024-04-05 Thread via GitHub


codecov-commenter commented on PR #30869:
URL: https://github.com/apache/beam/pull/30869#issuecomment-2040417337

   ## 
[Codecov](https://app.codecov.io/gh/apache/beam/pull/30869?dropdown=coverage=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 Report
   Attention: Patch coverage is `20.0%` with `4 lines` in your changes are 
missing coverage. Please review.
   > Project coverage is 60.33%. Comparing base 
[(`ffc96bc`)](https://app.codecov.io/gh/apache/beam/commit/ffc96bc2edf981efc9e7fe3934a81000c87c6974?dropdown=coverage=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 to head 
[(`90ae8c5`)](https://app.codecov.io/gh/apache/beam/pull/30869?dropdown=coverage=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   > Report is 56 commits behind head on master.
   
   | 
[Files](https://app.codecov.io/gh/apache/beam/pull/30869?dropdown=coverage=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | Patch % | Lines |
   |---|---|---|
   | 
[.../sdk/io/gcp/spanner/SpannerTransformRegistrar.java](https://app.codecov.io/gh/apache/beam/pull/30869?src=pr=tree=sdks%2Fjava%2Fio%2Fgoogle-cloud-platform%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fbeam%2Fsdk%2Fio%2Fgcp%2Fspanner%2FSpannerTransformRegistrar.java_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#diff-c2Rrcy9qYXZhL2lvL2dvb2dsZS1jbG91ZC1wbGF0Zm9ybS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvYmVhbS9zZGsvaW8vZ2NwL3NwYW5uZXIvU3Bhbm5lclRyYW5zZm9ybVJlZ2lzdHJhci5qYXZh)
 | 20.00% | [3 Missing and 1 partial :warning: 
](https://app.codecov.io/gh/apache/beam/pull/30869?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 |
   
   Additional details and impacted files
   
   
   ```diff
   @@ Coverage Diff  @@
   ## master   #30869  +/-   ##
   
   - Coverage 61.25%   60.33%   -0.93% 
   + Complexity 4470 2985-1485 
   
 Files   853  659 -194 
 Lines 7409065999-8091 
 Branches   4307 3232-1075 
   
   - Hits  4538539819-5566 
   + Misses2522423084-2140 
   + Partials   3481 3096 -385 
   ```
   
   | 
[Flag](https://app.codecov.io/gh/apache/beam/pull/30869/flags?src=pr=flags_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | Coverage Δ | |
   |---|---|---|
   | 
[java](https://app.codecov.io/gh/apache/beam/pull/30869/flags?src=pr=flag_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | `68.54% <20.00%> (-0.06%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   
   
   
   [:umbrella: View full report in Codecov by 
Sentry](https://app.codecov.io/gh/apache/beam/pull/30869?dropdown=coverage=pr=continue_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   
   :loudspeaker: Have feedback on the report? [Share it 
here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Remove outdated comment about docker. [beam]

2024-04-05 Thread via GitHub


VeronicaWasson commented on PR #30871:
URL: https://github.com/apache/beam/pull/30871#issuecomment-2040405106

   lgtm, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Python] Add a couple quality-of-life improvemenets to `testing.util.assert_that` [beam]

2024-04-05 Thread via GitHub


robertwb commented on code in PR #30771:
URL: https://github.com/apache/beam/pull/30771#discussion_r1554101011


##
sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupby_test.py:
##
@@ -38,6 +38,10 @@
 from .groupby_simple_aggregate import simple_aggregate
 from .groupby_two_exprs import groupby_two_exprs
 
+#
+# Temporarily skip all tests in file
+__test__ = False

Review Comment:
   (Though if you're committing to fix them ASAP, we can get this in.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Python] Add a couple quality-of-life improvemenets to `testing.util.assert_that` [beam]

2024-04-05 Thread via GitHub


robertwb commented on code in PR #30771:
URL: https://github.com/apache/beam/pull/30771#discussion_r1554098336


##
sdks/python/apache_beam/examples/snippets/transforms/aggregation/groupby_test.py:
##
@@ -38,6 +38,10 @@
 from .groupby_simple_aggregate import simple_aggregate
 from .groupby_two_exprs import groupby_two_exprs
 
+#
+# Temporarily skip all tests in file
+__test__ = False

Review Comment:
   Probably preferable to comment out the incorrect asserts rather than disable 
the tests altogether (which will at least confirm the examples are 
syntactically correct and run without errors). 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Python] Add a couple quality-of-life improvemenets to `testing.util.assert_that` [beam]

2024-04-05 Thread via GitHub


hjtran commented on code in PR #30771:
URL: https://github.com/apache/beam/pull/30771#discussion_r1554096989


##
sdks/python/apache_beam/testing/util.py:
##
@@ -261,6 +261,22 @@ def assert_that(
   """
   assert isinstance(actual, pvalue.PCollection), (
   '%s is not a supported type for Beam assert' % type(actual))
+  pipeline = actual.pipeline
+  if getattr(actual.pipeline, 'result', None):
+# The pipeline was already run. The user most likely called assert_that
+# after the pipeleline context.
+raise RuntimeError(
+'assert_that must be used within a beam.Pipeline context')
+
+  # Usually, the uniqueness of the label is left to the pipeline
+  # writer to guarantee. Since we're in a testing context, we'll
+  # just automatically append a number to the label if it's
+  # already in use.

Review Comment:
   Oh yes, sorry I forgot to address this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Python] Add a couple quality-of-life improvemenets to `testing.util.assert_that` [beam]

2024-04-05 Thread via GitHub


robertwb commented on code in PR #30771:
URL: https://github.com/apache/beam/pull/30771#discussion_r1554092314


##
sdks/python/apache_beam/testing/util.py:
##
@@ -261,6 +261,22 @@ def assert_that(
   """
   assert isinstance(actual, pvalue.PCollection), (
   '%s is not a supported type for Beam assert' % type(actual))
+  pipeline = actual.pipeline
+  if getattr(actual.pipeline, 'result', None):
+# The pipeline was already run. The user most likely called assert_that
+# after the pipeleline context.
+raise RuntimeError(
+'assert_that must be used within a beam.Pipeline context')
+
+  # Usually, the uniqueness of the label is left to the pipeline
+  # writer to guarantee. Since we're in a testing context, we'll
+  # just automatically append a number to the label if it's
+  # already in use.

Review Comment:
   ```suggestion
 # already in use, as tests don't typically have to worry about
 # long-term update compatibility stability of stage names.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Python] Add a couple quality-of-life improvemenets to `testing.util.assert_that` [beam]

2024-04-05 Thread via GitHub


robertwb commented on code in PR #30771:
URL: https://github.com/apache/beam/pull/30771#discussion_r1554092314


##
sdks/python/apache_beam/testing/util.py:
##
@@ -261,6 +261,22 @@ def assert_that(
   """
   assert isinstance(actual, pvalue.PCollection), (
   '%s is not a supported type for Beam assert' % type(actual))
+  pipeline = actual.pipeline
+  if getattr(actual.pipeline, 'result', None):
+# The pipeline was already run. The user most likely called assert_that
+# after the pipeleline context.
+raise RuntimeError(
+'assert_that must be used within a beam.Pipeline context')
+
+  # Usually, the uniqueness of the label is left to the pipeline
+  # writer to guarantee. Since we're in a testing context, we'll
+  # just automatically append a number to the label if it's
+  # already in use.

Review Comment:
   ```suggestion
 # already in use, as tests don't typically have to worry about long-term
 # update compatibility stability of stage names.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Remove outdated comment about docker. [beam]

2024-04-05 Thread via GitHub


github-actions[bot] commented on PR #30871:
URL: https://github.com/apache/beam/pull/30871#issuecomment-2040378955

   Stopping reviewer notifications for this pull request: review requested by 
someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Remove outdated comment about docker. [beam]

2024-04-05 Thread via GitHub


robertwb commented on PR #30871:
URL: https://github.com/apache/beam/pull/30871#issuecomment-2040374282

   R: @VeronicaWasson , @rszper


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] [Bug]: Spanner module in Python SDK does not support read/write priority [beam]

2024-04-05 Thread via GitHub


angad opened a new issue, #30872:
URL: https://github.com/apache/beam/issues/30872

   ### What happened?
   
   Spanner module in Python SDK 
(https://beam.apache.org/releases/pydoc/2.27.0/apache_beam.io.gcp.spanner.html?highlight=spanner)
 does not support read/write priority, while the JAVA SDK does support 
(https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.Write.html)
 
   
   This support is crucial for managing autoscaling behavior of spanner.
   
   ### Issue Priority
   
   Priority: 3 (minor)
   
   ### Issue Components
   
   - [X] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Python] Add a couple quality-of-life improvemenets to `testing.util.assert_that` [beam]

2024-04-05 Thread via GitHub


hjtran commented on PR #30771:
URL: https://github.com/apache/beam/pull/30771#issuecomment-2040374253

   I think this might be ready to be merged then if it's safe to ignore the 
pydantic/hypothesis errors


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Remove outdated comment about docker. [beam]

2024-04-05 Thread via GitHub


robertwb opened a new pull request, #30871:
URL: https://github.com/apache/beam/pull/30871

   Realized I merged https://github.com/apache/beam/pull/30842 before this was 
addressed.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] merging non-canonical environments fails [beam]

2024-04-05 Thread via GitHub


Polber commented on PR #30864:
URL: https://github.com/apache/beam/pull/30864#issuecomment-2040341056

   > @robertwb After further investigation, it appears this occurs anytime that 
2 subsequent ExternalTransforms are declared that use the same gradle target, 
and the first is given multiple inputs (PCollectionTuple). I haven't been able 
to repro unless the input to the first ExternalTransform is a dict of 
PCollections.
   
   Note: changing the gradle target for `LogForTesting` on either example I 
posted will result in a successful run.
   
   The transforms also do not have to be connected to each other:
   ```
 service = external.JavaJarExpansionService(
   subprocess_server.JavaJarServer.path_to_beam_jar(
 
gradle_target='sdks:java:extensions:sql:expansion-service:shadowJar',
 artifact_id=None
   )
 )
   
 result = {'i1': i1, 'i2': i2} | 'Sql1' >> external.ExternalTransform(
 'beam:external:java:sql:v1',
 external.ImplicitSchemaPayloadBuilder(
   {'query': 'SELECT * FROM i1 INNER JOIN i2 ON i1.id = i2.id'}
 ).payload(),
 service)
   
 result2 = {'i1': i1, 'i2': i2} | 'Flatten' >> 
external.SchemaAwareExternalTransform(
   'beam:schematransform:org.apache.beam:yaml:flatten:v1',
   service, rearrange_based_on_discovery=True)
   ```
   
   and it only fails if one transform is `ExternalTransform` and one is 
`SchemaAwareExternalTransform`. i.e. the following does not fail:
   ```
   service = external.JavaJarExpansionService(
   subprocess_server.JavaJarServer.path_to_beam_jar(
 
gradle_target='sdks:java:extensions:sql:expansion-service:shadowJar',
 artifact_id=None
   )
 )
   result2 = {'i1': i1, 'i2': i2} | 'Flatten' >> 
external.SchemaAwareExternalTransform(
   'beam:schematransform:org.apache.beam:yaml:flatten:v1',
   service, rearrange_based_on_discovery=True
 ) | 'LogForTesting' >> external.SchemaAwareExternalTransform(
   'beam:schematransform:org.apache.beam:yaml:log_for_testing:v1',
   service, rearrange_based_on_discovery=True)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] adding extra_header causing issue while reading GCS file in colab [beam]

2024-04-05 Thread via GitHub


samtseng18 commented on issue #30416:
URL: https://github.com/apache/beam/issues/30416#issuecomment-2040330078

   > @ajitsonawane1 - Google recommendation :We'd recommend that you install 
v2.16.0 (the latest version) of the google-cloud-storage to resolve this issue. 
It worked for me
   
   This worked for me too. Thanks! I had been stuck on this for a while :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Adds a bound on pydantic to exclude incompatible versions in the compat test. [beam]

2024-04-05 Thread via GitHub


tvalentyn commented on PR #30863:
URL: https://github.com/apache/beam/pull/30863#issuecomment-2040323373

   Yes, we need to help dependency resolver to set correct constraints, taking 
a look. New error: 
   
   ```
   4361 2024-04-05T00:36:50.9531130Z pip 24.0 from 
/runner/_work/beam/beam/sdks/python/test-suites/tox/py38/build/srcs/sdks/python/target/.tox-py38-TFHubEmbeddings-015/py38-TFHubEmbeddings-015/lib/python3.8/site-packages/pip
 (python 3.8)
   4362 2024-04-05T00:36:51.0529237Z py38-TFHubEmbeddings-015: commands_pre[2]> 
pip check
   4363 2024-04-05T00:36:52.5529107Z tensorflow 2.13.1 has requirement 
typing-extensions<4.6.0,>=3.6.6, but you have typing-extensions 4.11.0rc1.
   4364 2024-04-05T00:36:52.5531012Z py38-TFHubEmbeddings-015: exit 1 (1.55 
seconds) 
/runner/_work/beam/beam/sdks/python/test-suites/tox/py38/build/srcs/sdks/python>
 pip check pid=62176
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [do not merge] remove --pre for testing purposes [beam]

2024-04-05 Thread via GitHub


tvalentyn closed pull request #30859: [do not merge] remove --pre for testing 
purposes
URL: https://github.com/apache/beam/pull/30859


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Adding support for high priority queries to xlang transforms writing … [beam]

2024-04-05 Thread via GitHub


pabloem commented on PR #30869:
URL: https://github.com/apache/beam/pull/30869#issuecomment-2040321256

   done thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] merging non-canonical environments fails [beam]

2024-04-05 Thread via GitHub


Polber commented on PR #30864:
URL: https://github.com/apache/beam/pull/30864#issuecomment-2040321400

   @robertwb After further investigation, it appears this occurs anytime that 2 
subsequent ExternalTransforms are declared that use the same gradle target, and 
the first is given multiple inputs (PCollectionTuple). I haven't been able to 
repro unless the input to the first ExternalTransform is a dict of PCollections.
   
   Python example:
   ```
   import apache_beam as beam
   from apache_beam.transforms import external
   import logging
   
   from apache_beam.utils import subprocess_server
   
   logging.getLogger().setLevel(logging.INFO)
   
   with beam.Pipeline('DirectRunner') as p:
 i1 = p | "i1" >> beam.Create([beam.Row(name='john', id=1)])
 i2 = p | "i2" >> beam.Create([beam.Row(name='jane', id=1)])
 result = {'i1': i1, 'i2': i2} | 'Sql1' >> external.ExternalTransform(
   'beam:external:java:sql:v1',
   external.ImplicitSchemaPayloadBuilder(
 {'query': 'SELECT * FROM i1 INNER JOIN i2 ON i1.id = i2.id'}
   ).payload(),
   external.JavaJarExpansionService(
 subprocess_server.JavaJarServer.path_to_beam_jar(
   gradle_target='sdks:java:extensions:sql:expansion-service:shadowJar',
   artifact_id=None
 )
   )) | 'LogForTesting' >> external.SchemaAwareExternalTransform(
   'beam:schematransform:org.apache.beam:yaml:log_for_testing:v1',
   external.JavaJarExpansionService(
 subprocess_server.JavaJarServer.path_to_beam_jar(
   gradle_target='sdks:java:extensions:sql:expansion-service:shadowJar',
   artifact_id=None
 )
   ), rearrange_based_on_discovery=True)
   ```
   
   YAML example:
   ```
   pipeline:
 transforms:
   - type: Create
 name: table1
 config:
   elements:
 - name: "john"
   id: 1
   - type: Create
 name: table2
 config:
   elements:
 - name: "jane"
   id: 1
   - type: Sql
 name: Join
 input:
   i1: table1
   i2: table2
 config:
   query: "SELECT * FROM i1 INNER JOIN i2 ON i1.id = i2.id"
   - type: LogForTestingJava
 input: Sql
   
   # Force same gradle target variant of LogForTesting
   providers:
 - type: 'beamJar'
   config:
 gradle_target: 'sdks:java:extensions:sql:expansion-service:shadowJar'
   transforms:
 LogForTestingJava: 
'beam:schematransform:org.apache.beam:yaml:log_for_testing:v1'
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Correct the version spec [beam]

2024-04-05 Thread via GitHub


tvalentyn merged PR #30856:
URL: https://github.com/apache/beam/pull/30856


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Java] ManagedIO [beam]

2024-04-05 Thread via GitHub


chamikaramj commented on code in PR #30808:
URL: https://github.com/apache/beam/pull/30808#discussion_r1554044656


##
sdks/java/managed/src/main/java/org/apache/beam/sdk/managed/Managed.java:
##
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.managed;
+
+import com.google.auto.value.AutoValue;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.schemas.transforms.SchemaTransform;
+import org.apache.beam.sdk.schemas.utils.YamlUtils;
+import org.apache.beam.sdk.values.PCollectionRowTuple;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.annotations.VisibleForTesting;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableMap;
+
+public class Managed {
+
+  // TODO: Dynamically generate a list of supported transforms
+  public static final String ICEBERG = "iceberg";
+
+  public static Read read(String source) {
+
+return new AutoValue_Managed_Read.Builder()
+.setSource(
+Preconditions.checkNotNull(
+Read.TRANSFORMS.get(source.toLowerCase()),
+"An unsupported source was specified: '%s'. Please specify one 
of the following sources: %s",
+source,
+Read.TRANSFORMS.keySet()))
+.setSupportedIdentifiers(new ArrayList<>(Read.TRANSFORMS.values()))
+.build();
+  }
+
+  @AutoValue
+  public abstract static class Read extends SchemaTransform {
+public static final Map TRANSFORMS =
+ImmutableMap.builder()
+.put(ICEBERG, 
"beam:schematransform:org.apache.beam:iceberg_read:v1")
+.build();
+
+abstract String getSource();
+
+abstract @Nullable String getConfig();
+
+abstract @Nullable String getConfigUrl();
+
+abstract List getSupportedIdentifiers();
+
+abstract Builder toBuilder();
+
+@AutoValue.Builder
+abstract static class Builder {
+  abstract Builder setSource(String sourceIdentifier);
+
+  abstract Builder setConfig(@Nullable String config);

Review Comment:
   (it already should support it through LocalFileSystem so probably just test 
?)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Java] ManagedIO [beam]

2024-04-05 Thread via GitHub


chamikaramj commented on code in PR #30808:
URL: https://github.com/apache/beam/pull/30808#discussion_r1554043888


##
sdks/java/managed/src/main/java/org/apache/beam/sdk/managed/Managed.java:
##
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.managed;
+
+import com.google.auto.value.AutoValue;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.schemas.transforms.SchemaTransform;
+import org.apache.beam.sdk.schemas.utils.YamlUtils;
+import org.apache.beam.sdk.values.PCollectionRowTuple;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.annotations.VisibleForTesting;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableMap;
+
+public class Managed {
+
+  // TODO: Dynamically generate a list of supported transforms
+  public static final String ICEBERG = "iceberg";
+
+  public static Read read(String source) {
+
+return new AutoValue_Managed_Read.Builder()
+.setSource(
+Preconditions.checkNotNull(
+Read.TRANSFORMS.get(source.toLowerCase()),
+"An unsupported source was specified: '%s'. Please specify one 
of the following sources: %s",
+source,
+Read.TRANSFORMS.keySet()))
+.setSupportedIdentifiers(new ArrayList<>(Read.TRANSFORMS.values()))
+.build();
+  }
+
+  @AutoValue
+  public abstract static class Read extends SchemaTransform {
+public static final Map TRANSFORMS =
+ImmutableMap.builder()
+.put(ICEBERG, 
"beam:schematransform:org.apache.beam:iceberg_read:v1")
+.build();
+
+abstract String getSource();
+
+abstract @Nullable String getConfig();
+
+abstract @Nullable String getConfigUrl();
+
+abstract List getSupportedIdentifiers();
+
+abstract Builder toBuilder();
+
+@AutoValue.Builder
+abstract static class Builder {
+  abstract Builder setSource(String sourceIdentifier);
+
+  abstract Builder setConfig(@Nullable String config);

Review Comment:
   I suggest removing this method and making setConfigUrl() support local files 
so that anybody who has the Yaml file locally can use that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] add terraform for utility cluster. Add name override to gke [beam]

2024-04-05 Thread via GitHub


damondouglas commented on code in PR #30847:
URL: https://github.com/apache/beam/pull/30847#discussion_r1554017589


##
.test-infra/terraform/google-cloud-platform/google-kubernetes-engine/cluster.tf:
##
@@ -34,7 +34,9 @@ resource "google_container_cluster" "default" {
 enable_private_nodes= true
 enable_private_endpoint = false
   }
-  node_config {
-service_account = data.google_service_account.default.email
+  cluster_autoscaling {

Review Comment:
   What happened when you tested this terraform module then? Were you able to 
successfully deploy the kafka cluster? The autopilot terraform specification 
changes rapidly but I had some success recently with:
   ```
   resource "google_container_cluster" "default" {
 name= var.resource_name
 deletion_protection = false
 location= var.region
 enable_autopilot= true
 network = google_compute_network.default.id
 subnetwork  = google_compute_subnetwork.default.id
 private_cluster_config {
   enable_private_nodes= true
   enable_private_endpoint = false
 }
 cluster_autoscaling {
   auto_provisioning_defaults {
 service_account = google_service_account.default.email
 oauth_scopes= [
   "https://www.googleapis.com/auth/cloud-platform;
 ]
   }
 }
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Adding support for high priority queries to xlang transforms writing … [beam]

2024-04-05 Thread via GitHub


damondouglas commented on code in PR #30869:
URL: https://github.com/apache/beam/pull/30869#discussion_r1554014922


##
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerTransformRegistrar.java:
##
@@ -341,6 +346,9 @@ public PTransform, PDone> buildExternal(
   .withDatabaseId(configuration.databaseId)
   .withInstanceId(configuration.instanceId);
 
+  if (configuration.highPriority != null && configuration.highPriority) {

Review Comment:
   I agree. Zero value should just be false. Thank you Pablo for doing this :-)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump GCP-BOM to 26.36.0 [beam]

2024-04-05 Thread via GitHub


github-actions[bot] commented on PR #30868:
URL: https://github.com/apache/beam/pull/30868#issuecomment-2040274277

   Checks are failing. Will not request review until checks are succeeding. If 
you'd like to override that behavior, comment `assign set of reviewers`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Yaml API: Day Zero tutorial notebook [beam]

2024-04-05 Thread via GitHub


robertwb merged PR #27284:
URL: https://github.com/apache/beam/pull/27284


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Proposed edits for Beam YAML overview [beam]

2024-04-05 Thread via GitHub


robertwb merged PR #30842:
URL: https://github.com/apache/beam/pull/30842


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug]: Cannot read from Kafka due to short poll timeout of consumer in KafkaIO [beam]

2024-04-05 Thread via GitHub


xianhualiu commented on issue #30870:
URL: https://github.com/apache/beam/issues/30870#issuecomment-2040238145

   .take-issue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] [Bug]: Cannot read from Kafka due to short poll timeout of consumer in KafkaIO [beam]

2024-04-05 Thread via GitHub


xianhualiu opened a new issue, #30870:
URL: https://github.com/apache/beam/issues/30870

   ### What happened?
   
   The default Kafka consumer poll timeout is set to 1 second. It works fine 
when the the response can get messages from the kafka broker server within this 
1 second, such as when client accesses broker within the same region. But if 
the responding time is more than 1 second, the consumer will not retrieve any 
messages. One customer reported that throughput of processed messages was 
extremely low in cross-region read since most of the time the responding time 
takes more than 1 second.
   
   As a solution, the Kafka consumer polling timeout needs to be configurable, 
so customer can adjust it according to their needs.
   
   ### Issue Priority
   
   Priority: 2 (default / most bugs should be filed as P2)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [X] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Adding support for high priority queries to xlang transforms writing … [beam]

2024-04-05 Thread via GitHub


github-actions[bot] commented on PR #30869:
URL: https://github.com/apache/beam/pull/30869#issuecomment-2040229995

   Assigning reviewers. If you would like to opt out of this review, comment 
`assign to next reviewer`:
   
   R: @shunping for label python.
   R: @damondouglas for label java.
   R: @damondouglas for label io.
   R: @nielm for label spanner.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any 
comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review 
comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Allow lazy iteration for non-reiterables. [beam]

2024-04-05 Thread via GitHub


github-actions[bot] commented on PR #30851:
URL: https://github.com/apache/beam/pull/30851#issuecomment-2040226965

   Stopping reviewer notifications for this pull request: review requested by 
someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Allow lazy iteration for non-reiterables. [beam]

2024-04-05 Thread via GitHub


robertwb commented on PR #30851:
URL: https://github.com/apache/beam/pull/30851#issuecomment-2040223207

   R: @priyansndesai


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] python sdk: fix several bugs regarding avto <-> beam schema conversion [beam]

2024-04-05 Thread via GitHub


ahmedabu98 commented on PR #30770:
URL: https://github.com/apache/beam/pull/30770#issuecomment-2040217043

   @benkonz thanks for making the changes. will try to take a look next week


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Adding support for high priority queries to xlang transforms writing … [beam]

2024-04-05 Thread via GitHub


johnjcasey commented on code in PR #30869:
URL: https://github.com/apache/beam/pull/30869#discussion_r1553965228


##
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerTransformRegistrar.java:
##
@@ -341,6 +346,9 @@ public PTransform, PDone> buildExternal(
   .withDatabaseId(configuration.databaseId)
   .withInstanceId(configuration.instanceId);
 
+  if (configuration.highPriority != null && configuration.highPriority) {

Review Comment:
   is it worth defaulting to false just so you can avoid a null check?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [YAML] Interpret PDone as no outputs. [beam]

2024-04-05 Thread via GitHub


robertwb merged PR #30862:
URL: https://github.com/apache/beam/pull/30862


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug]: WriteToPubSub failing on Beam YAML [beam]

2024-04-05 Thread via GitHub


robertwb closed issue #30446: [Bug]: WriteToPubSub failing on Beam YAML
URL: https://github.com/apache/beam/issues/30446


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Adding support for high priority queries to xlang transforms writing … [beam]

2024-04-05 Thread via GitHub


pabloem commented on PR #30869:
URL: https://github.com/apache/beam/pull/30869#issuecomment-2040142904

   is it still @johnjcasey ? : D


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump GCP-BOM to 26.36.0 [beam]

2024-04-05 Thread via GitHub


Abacn commented on PR #30868:
URL: https://github.com/apache/beam/pull/30868#issuecomment-2040143497

   https://github.com/googleapis/java-bigtable/pull/2177 introduced an overload 
method causing
   
   ```
   > Task :sdks:java:io:google-cloud-platform:compileTestJava
   
doReturn(mockBatcher).when(mockBigtableDataClient).newBulkMutationBatcher(any());
 ^
 both method newBulkMutationBatcher(String) in BigtableDataClient and 
method newBulkMutationBatcher(TargetId) in BigtableDataClient match
   ```
   
   fixed the test


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Adding support for high priority queries to xlang transforms writing … [beam]

2024-04-05 Thread via GitHub


pabloem commented on PR #30869:
URL: https://github.com/apache/beam/pull/30869#issuecomment-2040137087

   this is needed because spanner autoscaling depends on high priority queries 
- so a spanner database can easily be overloaded by a beam pipeline and still 
not scale up


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Adding support for high priority queries to xlang transforms writing … [beam]

2024-04-05 Thread via GitHub


pabloem opened a new pull request, #30869:
URL: https://github.com/apache/beam/pull/30869

   …to spanner
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug]: Side Input Singleton View throw error when impulse period is short: PCollection with more than one element accessed as a singleton view [beam]

2024-04-05 Thread via GitHub


kennknowles commented on issue #26465:
URL: https://github.com/apache/beam/issues/26465#issuecomment-2040122260

   I believe the workaround is "use View.asIterable and take the last element 
of the iterable when consuming the side input."


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Bump GCP-BOM to 26.36.0 [beam]

2024-04-05 Thread via GitHub


Abacn opened a new pull request, #30868:
URL: https://github.com/apache/beam/pull/30868

   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] merging non-canonical environments fails [beam]

2024-04-05 Thread via GitHub


robertwb commented on PR #30864:
URL: https://github.com/apache/beam/pull/30864#issuecomment-2039987623

   That would definitely be an erroneously constructed graph. Can you give a 
concrete example of how this is crated? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Java] ManagedIO [beam]

2024-04-05 Thread via GitHub


ahmedabu98 commented on code in PR #30808:
URL: https://github.com/apache/beam/pull/30808#discussion_r1553784338


##
sdks/java/managed/src/main/java/org/apache/beam/sdk/managed/ManagedSchemaTransformProvider.java:
##
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.managed;
+
+import static 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions.checkArgument;
+
+import com.google.auto.service.AutoService;
+import com.google.auto.value.AutoValue;
+import java.io.IOException;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.ServiceLoader;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.io.FileSystems;
+import org.apache.beam.sdk.schemas.AutoValueSchema;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
+import org.apache.beam.sdk.schemas.annotations.SchemaFieldDescription;
+import org.apache.beam.sdk.schemas.transforms.SchemaTransform;
+import org.apache.beam.sdk.schemas.transforms.SchemaTransformProvider;
+import org.apache.beam.sdk.schemas.transforms.TypedSchemaTransformProvider;
+import org.apache.beam.sdk.schemas.utils.YamlUtils;
+import org.apache.beam.sdk.values.PCollectionRowTuple;
+import org.apache.beam.sdk.values.Row;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions;
+import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Strings;
+
+@AutoService(SchemaTransformProvider.class)
+class ManagedSchemaTransformProvider
+extends 
TypedSchemaTransformProvider {
+
+  @Override
+  public String identifier() {
+return "beam:schematransform:org.apache.beam:managed:v1";
+  }
+
+  private final Map schemaTransformProviders 
= new HashMap<>();
+
+  private ManagedSchemaTransformProvider(Collection identifiers) {
+try {
+  for (SchemaTransformProvider schemaTransformProvider :
+  ServiceLoader.load(SchemaTransformProvider.class)) {
+if 
(schemaTransformProviders.containsKey(schemaTransformProvider.identifier())) {
+  throw new IllegalArgumentException(
+  "Found multiple SchemaTransformProvider implementations with the 
same identifier "
+  + schemaTransformProvider.identifier());
+}
+schemaTransformProviders.put(schemaTransformProvider.identifier(), 
schemaTransformProvider);
+  }
+} catch (Exception e) {
+  throw new RuntimeException(e.getMessage());
+}
+
+schemaTransformProviders.entrySet().removeIf(e -> 
!identifiers.contains(e.getKey()));
+  }
+
+  private static @Nullable ManagedSchemaTransformProvider managedProvider = 
null;
+
+  public static ManagedSchemaTransformProvider of(Collection 
supportedIdentifiers) {
+if (managedProvider == null) {
+  managedProvider = new 
ManagedSchemaTransformProvider(supportedIdentifiers);
+}
+return managedProvider;
+  }
+
+  @DefaultSchema(AutoValueSchema.class)
+  @AutoValue
+  public abstract static class ManagedConfig {

Review Comment:
   ServiceLoader needs to instantiate all instances of SchemaTransformProvider 
including this one. This class needs to be public and have a public no-arg 
constructor



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Java] ManagedIO [beam]

2024-04-05 Thread via GitHub


ahmedabu98 commented on code in PR #30808:
URL: https://github.com/apache/beam/pull/30808#discussion_r1553785984


##
sdks/java/managed/src/test/java/org/apache/beam/sdk/managed/ManagedTest.java:
##
@@ -0,0 +1,2 @@
+package org.apache.beam.sdk.managed;public class ManagedTest {
+}

Review Comment:
   Added



##
sdks/java/managed/src/main/java/org/apache/beam/sdk/managed/Managed.java:
##
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.managed;
+
+import com.google.auto.value.AutoValue;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.schemas.transforms.SchemaTransform;
+import org.apache.beam.sdk.schemas.utils.YamlUtils;
+import org.apache.beam.sdk.values.PCollectionRowTuple;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.annotations.VisibleForTesting;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableMap;
+
+public class Managed {
+  public static final String ICEBERG = "iceberg";

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Java] ManagedIO [beam]

2024-04-05 Thread via GitHub


codecov-commenter commented on PR #30808:
URL: https://github.com/apache/beam/pull/30808#issuecomment-2039949330

   ## 
[Codecov](https://app.codecov.io/gh/apache/beam/pull/30808?dropdown=coverage=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 Report
   All modified and coverable lines are covered by tests :white_check_mark:
   > Project coverage is 68.55%. Comparing base 
[(`fc5df6f`)](https://app.codecov.io/gh/apache/beam/commit/fc5df6f261bbe6ea910ffc1de8e6c093c9751c60?dropdown=coverage=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 to head 
[(`d4258cd`)](https://app.codecov.io/gh/apache/beam/pull/30808?dropdown=coverage=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   > Report is 25 commits behind head on master.
   
   > :exclamation: Current head d4258cd differs from pull request most recent 
head 02c04e6. Consider uploading reports for the commit 02c04e6 to get more 
accurate results
   
   
   Additional details and impacted files
   
   
   ```diff
   @@  Coverage Diff  @@
   ## master   #30808   +/-   ##
   =
   - Coverage 70.73%   68.55%-2.18% 
   + Complexity 4468 2985 -1483 
   =
 Files  1256  352  -904 
 Lines14077427871   -112903 
 Branches   4306 3231 -1075 
   =
   - Hits  9958119108-80473 
   + Misses37714 7302-30412 
   + Partials   3479 1461 -2018 
   ```
   
   | 
[Flag](https://app.codecov.io/gh/apache/beam/pull/30808/flags?src=pr=flags_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | Coverage Δ | |
   |---|---|---|
   | 
[go](https://app.codecov.io/gh/apache/beam/pull/30808/flags?src=pr=flag_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | `?` | |
   | 
[java](https://app.codecov.io/gh/apache/beam/pull/30808/flags?src=pr=flag_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | `68.55% <ø> (-0.03%)` | :arrow_down: |
   | 
[python](https://app.codecov.io/gh/apache/beam/pull/30808/flags?src=pr=flag_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   
   
   
   [:umbrella: View full report in Codecov by 
Sentry](https://app.codecov.io/gh/apache/beam/pull/30808?dropdown=coverage=pr=continue_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   
   :loudspeaker: Have feedback on the report? [Share it 
here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug]: Side Input Singleton View throw error when impulse period is short: PCollection with more than one element accessed as a singleton view [beam]

2024-04-05 Thread via GitHub


IvanFroehlich commented on issue #26465:
URL: https://github.com/apache/beam/issues/26465#issuecomment-2039863167

   No, thats the issue. we have tried following meanwhile:
   
   **v2**: replace the latest globally with custom combine function
   ...
   .apply(Combine.globally(new CombineSerializableFunction())
   .apply(View.asSingleton());
   ...
 static class CombineSerializableFunction implements
 SerializableFunction, 
SideInputDestinations> {
   
   @Override
   public @UnknownKeyFor @Nullable @Initialized SideInputDestinations apply(
   Iterable input) {
 SideInputDestinations last = null;
 Iterator iterator = input.iterator();
 while (iterator.hasNext()) {
   last = iterator.next();
   log.info("SideInput: processing iteration for object {}", 
last.hashCode());
 }
 return last;
   }
 }
   
   **=> same Exception**
   
   **v3:** replace the View.asSingleton() with asSingletonView():
   ...
   .apply(Combine.globally(new 
CombineSerializableFunction()).asSingletonView());
   ...
 static class CombineSerializableFunction implements
 SerializableFunction, 
SideInputDestinations> {
   
   @Override
   public @UnknownKeyFor @Nullable @Initialized SideInputDestinations apply(
   Iterable input) {
 SideInputDestinations last = null;
 Iterator iterator = input.iterator();
 while (iterator.hasNext()) {
   last = iterator.next();
   log.info("SideInput: processing iteration for object {}", 
last.hashCode());
 }
 return last;
   }
 }
   
   => same Exception in using ParDo: 
   
 Caused by: java.lang.IllegalArgumentException: PCollection with 
more than one element accessed as a singleton view.
   
   Should we try another version?
   
   Kind Regards,
   Ivan
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



  1   2   >