Re: [PR] Bump com.gradle.common-custom-user-data-gradle-plugin from 1.12.1 to 2 [beam]

2024-04-04 Thread via GitHub


github-actions[bot] commented on PR #30865:
URL: https://github.com/apache/beam/pull/30865#issuecomment-2038982164

   Checks are failing. Will not request review until checks are succeeding. If 
you'd like to override that behavior, comment `assign set of reviewers`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump com.gradle.common-custom-user-data-gradle-plugin from 1.12.1 to 1.13 [beam]

2024-04-04 Thread via GitHub


dependabot[bot] closed pull request #30451: Bump 
com.gradle.common-custom-user-data-gradle-plugin from 1.12.1 to 1.13
URL: https://github.com/apache/beam/pull/30451


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump com.gradle.common-custom-user-data-gradle-plugin from 1.12.1 to 1.13 [beam]

2024-04-04 Thread via GitHub


dependabot[bot] commented on PR #30451:
URL: https://github.com/apache/beam/pull/30451#issuecomment-2038921935

   Superseded by #30865.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Bump com.gradle.common-custom-user-data-gradle-plugin from 1.12.1 to 2 [beam]

2024-04-04 Thread via GitHub


dependabot[bot] opened a new pull request, #30865:
URL: https://github.com/apache/beam/pull/30865

   Bumps com.gradle.common-custom-user-data-gradle-plugin from 1.12.1 to 2.
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=com.gradle.common-custom-user-data-gradle-plugin=gradle=1.12.1=2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show  ignore conditions` will show all of 
the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Tune maximum thread count for streaming dataflow worker executor dynamically. [beam]

2024-04-04 Thread via GitHub


MelodyShen commented on PR #30439:
URL: https://github.com/apache/beam/pull/30439#issuecomment-2038766063

   Hi @scwhittle thanks for reviewing the changes. I have rebased master to 
catch up with the latest and all checks passed. Would you mind merging the PR 
when available? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Create YAML Join Transform [beam]

2024-04-04 Thread via GitHub


Polber commented on PR #30734:
URL: https://github.com/apache/beam/pull/30734#issuecomment-2038579469

   @robertwb @itodotimothy6 FYI this breaks on master due to 
   * https://github.com/apache/beam/pull/30681
   
   I have only been able to repro by using any `SqlBackedProvider` with a query 
that joins 2 inputs (not just this PR). I have a fix here:
   * https://github.com/apache/beam/pull/30864
   
   but since it doesn't get hit in any of my other repro attempts, I think 
there may also be an issue with how the environments get constructed using 
these providers (but my context on the manner is limited).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] merging non-canonical environments fails [beam]

2024-04-04 Thread via GitHub


github-actions[bot] commented on PR #30864:
URL: https://github.com/apache/beam/pull/30864#issuecomment-2038577539

   Stopping reviewer notifications for this pull request: review requested by 
someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] merging non-canonical environments fails [beam]

2024-04-04 Thread via GitHub


Polber commented on PR #30864:
URL: https://github.com/apache/beam/pull/30864#issuecomment-2038576770

   R: @robertwb 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] merging non-canonical environments fails [beam]

2024-04-04 Thread via GitHub


Polber opened a new pull request, #30864:
URL: https://github.com/apache/beam/pull/30864

   There are cases where environments are specified at the transform/windowing 
level that are not included in the top-level components' environments.
   
   This is particularly present when running sql-backed provider on multiple 
inputs (i.e. when joining 2 pcollections) in Beam YAML which creates 2 default 
environments, but only declares one in the proto.components.environments 
section.
   
   Perhaps the real issue lies in how Beam YAML providers create transforms, 
but this is a nice safeguard regardless.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [do not merge] remove --pre for testing purposes [beam]

2024-04-04 Thread via GitHub


github-actions[bot] commented on PR #30859:
URL: https://github.com/apache/beam/pull/30859#issuecomment-2038487214

   Checks are failing. Will not request review until checks are succeeding. If 
you'd like to override that behavior, comment `assign set of reviewers`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Tune maximum thread count for streaming dataflow worker executor dynamically. [beam]

2024-04-04 Thread via GitHub


codecov-commenter commented on PR #30439:
URL: https://github.com/apache/beam/pull/30439#issuecomment-2038472128

   ## 
[Codecov](https://app.codecov.io/gh/apache/beam/pull/30439?dropdown=coverage=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 Report
   Attention: Patch coverage is `86.20690%` with `8 lines` in your changes are 
missing coverage. Please review.
   > Project coverage is 62.87%. Comparing base 
[(`0d41168`)](https://app.codecov.io/gh/apache/beam/commit/0d41168a0963869df037e468f80f5ab8466a99ab?dropdown=coverage=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 to head 
[(`d499074`)](https://app.codecov.io/gh/apache/beam/pull/30439?dropdown=coverage=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   > Report is 1 commits behind head on master.
   
   | 
[Files](https://app.codecov.io/gh/apache/beam/pull/30439?dropdown=coverage=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | Patch % | Lines |
   |---|---|---|
   | 
[...reaming/harness/StreamingWorkerStatusReporter.java](https://app.codecov.io/gh/apache/beam/pull/30439?src=pr=tree=runners%2Fgoogle-cloud-dataflow-java%2Fworker%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fbeam%2Frunners%2Fdataflow%2Fworker%2Fstreaming%2Fharness%2FStreamingWorkerStatusReporter.java_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#diff-cnVubmVycy9nb29nbGUtY2xvdWQtZGF0YWZsb3ctamF2YS93b3JrZXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2JlYW0vcnVubmVycy9kYXRhZmxvdy93b3JrZXIvc3RyZWFtaW5nL2hhcm5lc3MvU3RyZWFtaW5nV29ya2VyU3RhdHVzUmVwb3J0ZXIuamF2YQ==)
 | 83.87% | [3 Missing and 2 partials :warning: 
](https://app.codecov.io/gh/apache/beam/pull/30439?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 |
   | 
[...ers/dataflow/worker/util/BoundedQueueExecutor.java](https://app.codecov.io/gh/apache/beam/pull/30439?src=pr=tree=runners%2Fgoogle-cloud-dataflow-java%2Fworker%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fbeam%2Frunners%2Fdataflow%2Fworker%2Futil%2FBoundedQueueExecutor.java_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#diff-cnVubmVycy9nb29nbGUtY2xvdWQtZGF0YWZsb3ctamF2YS93b3JrZXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2JlYW0vcnVubmVycy9kYXRhZmxvdy93b3JrZXIvdXRpbC9Cb3VuZGVkUXVldWVFeGVjdXRvci5qYXZh)
 | 86.95% | [2 Missing and 1 partial :warning: 
](https://app.codecov.io/gh/apache/beam/pull/30439?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 |
   
   Additional details and impacted files
   
   
   ```diff
   @@  Coverage Diff  @@
   ## master   #30439   +/-   ##
   =
   + Coverage 56.86%   62.87%+6.00% 
   - Complexity 148514773+13288 
   =
 Files   501 2207 +1706 
 Lines 46219   153504   +107285 
 Branches   107611735+10659 
   =
   + Hits  2628396511+70228 
   - Misses1791850840+32922 
   - Partials   2018 6153 +4135 
   ```
   
   | 
[Flag](https://app.codecov.io/gh/apache/beam/pull/30439/flags?src=pr=flags_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | Coverage Δ | |
   |---|---|---|
   | 
[java](https://app.codecov.io/gh/apache/beam/pull/30439/flags?src=pr=flag_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | `65.69% <86.20%> (-3.13%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   
   
   
   [:umbrella: View full report in Codecov by 
Sentry](https://app.codecov.io/gh/apache/beam/pull/30439?dropdown=coverage=pr=continue_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   
   :loudspeaker: Have feedback on the report? [Share it 
here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Java] ManagedIO [beam]

2024-04-04 Thread via GitHub


chamikaramj commented on PR #30808:
URL: https://github.com/apache/beam/pull/30808#issuecomment-2038454155

   Also, please fix spotless/lint failures.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Use empty flags for default expansion service options. [beam]

2024-04-04 Thread via GitHub


robertwb merged PR #30858:
URL: https://github.com/apache/beam/pull/30858


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Java] ManagedIO [beam]

2024-04-04 Thread via GitHub


chamikaramj commented on code in PR #30808:
URL: https://github.com/apache/beam/pull/30808#discussion_r1552590182


##
sdks/java/managed/src/main/java/org/apache/beam/sdk/managed/Managed.java:
##
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.managed;
+
+import com.google.auto.value.AutoValue;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.schemas.transforms.SchemaTransform;
+import org.apache.beam.sdk.schemas.utils.YamlUtils;
+import org.apache.beam.sdk.values.PCollectionRowTuple;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.annotations.VisibleForTesting;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableMap;
+
+public class Managed {
+  public static final String ICEBERG = "iceberg";
+
+  public static Read read(String source) {
+
+return new AutoValue_Managed_Read.Builder()
+.setSource(
+Preconditions.checkNotNull(
+Read.TRANSFORMS.get(source.toLowerCase()),
+"An unsupported source was specified: '%s'. Please specify one 
of the following source: %s",
+source,
+Read.TRANSFORMS.keySet()))
+.setSupportedIdentifiers(new ArrayList<>(Read.TRANSFORMS.values()))
+.build();
+  }
+
+  @AutoValue
+  public abstract static class Read extends SchemaTransform {
+public static final Map TRANSFORMS =
+ImmutableMap.builder()
+.put(ICEBERG, 
"beam:schematransform:org.apache.beam:iceberg_read:v1")
+.build();
+
+abstract String getSource();
+
+abstract @Nullable String getConfig();
+
+abstract @Nullable String getConfigUrl();
+
+abstract List getSupportedIdentifiers();
+
+abstract Builder toBuilder();
+
+@AutoValue.Builder
+abstract static class Builder {
+  abstract Builder setSource(String source);
+
+  abstract Builder setConfig(@Nullable String config);
+
+  abstract Builder setConfigUrl(@Nullable String configUrl);
+
+  abstract Builder setSupportedIdentifiers(List 
supportedIdentifiers);
+
+  abstract Read build();
+}
+
+public Read withConfigUrl(String configUrl) {

Review Comment:
   Please add Java docs for `withConfig` methods.



##
sdks/java/managed/src/main/java/org/apache/beam/sdk/managed/Managed.java:
##
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.managed;
+
+import com.google.auto.value.AutoValue;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.schemas.transforms.SchemaTransform;
+import org.apache.beam.sdk.schemas.utils.YamlUtils;
+import org.apache.beam.sdk.values.PCollectionRowTuple;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.annotations.VisibleForTesting;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableMap;
+
+public class Managed {
+  public static final String ICEBERG = "iceberg";
+
+  public static Read read(String source) {
+
+return new AutoValue_Managed_Read.Builder()
+

Re: [PR] Add a lower bound on pydantic [beam]

2024-04-04 Thread via GitHub


github-actions[bot] commented on PR #30863:
URL: https://github.com/apache/beam/pull/30863#issuecomment-2038448445

   Assigning reviewers. If you would like to opt out of this review, comment 
`assign to next reviewer`:
   
   R: @damccorm for label python.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any 
comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review 
comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [YAML] Interpret PDone as no outputs. [beam]

2024-04-04 Thread via GitHub


github-actions[bot] commented on PR #30862:
URL: https://github.com/apache/beam/pull/30862#issuecomment-2038435261

   Stopping reviewer notifications for this pull request: review requested by 
someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [YAML] Interpret PDone as no outputs. [beam]

2024-04-04 Thread via GitHub


robertwb commented on PR #30862:
URL: https://github.com/apache/beam/pull/30862#issuecomment-2038434517

   R: @ffernandez92


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [do not merge] remove --pre for testing purposes [beam]

2024-04-04 Thread via GitHub


github-actions[bot] commented on PR #30859:
URL: https://github.com/apache/beam/pull/30859#issuecomment-2038414942

   ## Test Results
       7 files   -    43      7 suites   - 43   20m 52s :stopwatch: + 12m 31s
   7 573 tests +7 190  6 691 :white_check_mark: +6 310  882 :zzz: +880  0 :x: 
±0 
   7 709 runs  +7 326  6 721 :white_check_mark: +6 340  988 :zzz: +986  0 :x: 
±0 
   
   Results for commit d7cfc782. ± Comparison against base commit 0d41168a.
   
   
 This pull request removes 383 and adds 7573 tests. 
Note that renamed tests count towards both.
   
   ```
   org.apache.beam.runners.core.SimplePushbackSideInputDoFnRunnerTest ‑ 
testGarbageCollectForStatefulDoFnRunner
   org.apache.beam.runners.core.SimplePushbackSideInputDoFnRunnerTest ‑ 
testLateDroppingForStatefulDoFnRunner
   org.apache.beam.runners.direct.WriteWithShardingFactoryTest ‑ 
keyBasedOnCountFnFewElementsExtraShards
   org.apache.beam.runners.direct.WriteWithShardingFactoryTest ‑ 
keyBasedOnCountFnFewElementsThreeShards
   org.apache.beam.runners.direct.WriteWithShardingFactoryTest ‑ 
keyBasedOnCountFnManyElements
   org.apache.beam.runners.direct.WriteWithShardingFactoryTest ‑ 
keyBasedOnCountFnManyElementsExtraShards
   org.apache.beam.runners.direct.WriteWithShardingFactoryTest ‑ 
keyBasedOnCountFnWithNoElements
   org.apache.beam.runners.direct.WriteWithShardingFactoryTest ‑ 
keyBasedOnCountFnWithOneElement
   org.apache.beam.runners.direct.WriteWithShardingFactoryTest ‑ 
keyBasedOnCountFnWithTwoElements
   org.apache.beam.sdk.PipelineTest ‑ testIdentityTransform
   …
   ```
   
   ```
   apache_beam.coders.coders_property_based_test.ProperyTestingCoders ‑ 
test_float_coder
   apache_beam.coders.coders_property_based_test.ProperyTestingCoders ‑ 
test_row_coder
   apache_beam.coders.coders_property_based_test.ProperyTestingCoders ‑ 
test_string_coder
   apache_beam.coders.coders_property_based_test.TypesAreAllTested ‑ 
test_all_types_are_tested
   apache_beam.coders.coders_test.AvroCoderTest ‑ test_avro_record_coder
   apache_beam.coders.coders_test.CodersTest ‑ test_str_utf8_coder
   apache_beam.coders.coders_test.DeterministicProtoCoderTest ‑ 
test_deterministic_proto_coder
   apache_beam.coders.coders_test.DeterministicProtoCoderTest ‑ 
test_deterministic_proto_coder_determinism
   apache_beam.coders.coders_test.FallbackCoderTest ‑ test_default_fallback_path
   apache_beam.coders.coders_test.NullableCoderTest ‑ test_determinism
   …
   ```
   
   
   
 This pull request removes 2 skipped tests and adds 
882 skipped tests. Note that renamed tests count towards both.
   
   ```
   org.apache.beam.sdk.transforms.SplittableDoFnTest ‑ testLateData
   org.apache.beam.sdk.values.PDoneTest ‑ testEmptyTransform
   ```
   
   ```
   apache_beam.coders.fast_coders_test.FastCoders ‑ test_using_fast_impl
   apache_beam.coders.row_coder_test.RowCoderTest ‑ test_overflows
   apache_beam.coders.slow_coders_test.SlowCoders ‑ test_using_slow_impl
   apache_beam.coders.typecoders_test.TypeCodersTest ‑ test_list_coder
   apache_beam.dataframe.frames_test.AggregationTest ‑ test_agg_min_count
   apache_beam.dataframe.frames_test.AggregationTest ‑ test_dataframe_agg_level
   apache_beam.dataframe.frames_test.AggregationTest ‑ 
test_dataframe_agg_level_bool_only
   apache_beam.dataframe.frames_test.AggregationTest ‑ 
test_dataframe_agg_level_numeric_only
   apache_beam.dataframe.frames_test.AggregationTest ‑ 
test_dataframe_agg_multifunc_level
   apache_beam.dataframe.frames_test.AggregationTest ‑ test_series_agg_level
   …
   ```
   
   
   
[test-results]:data:application/gzip;base64,H4sIAO4xD2YC/03MTQ6DIBCG4asY1l0AijP0MoYOkpD60yCsmt69E6vU5ftM5nuLEKdxE/cGbo3YSsw1fEkux3XhVNpoFr7l/WqgPXPYChFb31v1t2d8sSHWryG4ODHJCmNKazoklWXfBWmPqrOgVaXfqkU85TK693WT1nmOmUN4oECA2viO0EDnrFdOyg69DYD2IdsAQEDi8wV66BFFDgEAAA==
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Add a lower bound on pydantic [beam]

2024-04-04 Thread via GitHub


tvalentyn opened a new pull request, #30863:
URL: https://github.com/apache/beam/pull/30863

   fixes: #30852 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Failing Test]: beam_PreCommit_Python_Coverage suite fails with ModuleNotFoundError: No module named 'pydantic._hypothesis_plugin' [beam]

2024-04-04 Thread via GitHub


tvalentyn commented on issue #30852:
URL: https://github.com/apache/beam/issues/30852#issuecomment-2038404492

   > the fact that old version of pydantic gets installed due to --pre flag may 
or may not be a factor.
   
   that _is_ the factor. The  pydantic-2.0a4 distribution has the following:
   
   (py38b) :py38b$ cat 
lib/python3.8/site-packages/pydantic-2.0a4.dist-info/entry_points.txt 
   [hypothesis]
   _ = pydantic._hypothesis_plugin


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [YAML] Interpret PDone as no outputs. [beam]

2024-04-04 Thread via GitHub


robertwb opened a new pull request, #30862:
URL: https://github.com/apache/beam/pull/30862

   This should fix https://github.com/apache/beam/issues/30446
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Proposed edits for Beam YAML overview [beam]

2024-04-04 Thread via GitHub


robertwb commented on code in PR #30842:
URL: https://github.com/apache/beam/pull/30842#discussion_r1552524970


##
website/www/site/content/en/documentation/sdks/yaml.md:
##
@@ -23,80 +23,132 @@ title: "Apache Beam YAML API"
 
 # Beam YAML API
 
-While Beam provides powerful APIs for authoring sophisticated data
-processing pipelines, it often still has too high a barrier for
-getting started and authoring simple pipelines. Even setting up the
-environment, installing the dependencies, and setting up the project
-can be an overwhelming amount of boilerplate for some (though
-https://beam.apache.org/blog/beam-starter-projects/ has gone a long
-way in making this easier).
-
-Here we provide a simple declarative syntax for describing pipelines
-that does not require coding experience or learning how to use an
-SDKany text editor will do.
-Some installation may be required to actually *execute* a pipeline, but
-we envision various services (such as Dataflow) to accept yaml pipelines
-directly obviating the need for even that in the future.
-We also anticipate the ability to generate code directly from these
-higher-level yaml descriptions, should one want to graduate to a full
-Beam SDK (and possibly the other direction as well as far as possible).
-
-Though we intend this syntax to be easily authored (and read) directly by
-humans, this may also prove a useful intermediate representation for
-tools to use as well, either as output (e.g. a pipeline authoring GUI)
-or consumption (e.g. a lineage analysis tool) and expect it to be more
-easily manipulated and semantically meaningful than the Beam protos
-themselves (which concern themselves more with execution).
-
-It should be noted that everything here is still under development, but any
-features already included are considered stable. Feedback is welcome at
-d...@apache.beam.org.
-
-## Running pipelines
-
-The Beam yaml parser is currently included as part of the Apache Beam Python 
SDK.
-This can be installed (e.g. within a virtual environment) as
+Beam YAML is a declarative syntax for describing Apache Beam pipelines by using
+YAML files. You can use Beam YAML to author and run a Beam pipeline without
+writing any code.
+
+## Overview
+
+Beam provides a powerful model for creating sophisticated data processing
+pipelines. However, getting started with Beam programming can be challenging
+because it requires writing code in one of the supported Beam SDK languages.
+You need to understand the APIs, set up a project, manage dependencies, and
+perform other programming tasks.
+
+Beam YAML makes it easier to get started with creating Beam pipelines. Instead
+of writing code, you create a YAML file using any text editor. Then you submit
+the YAML file to be executed by a runner.
+
+The Beam YAML syntax is designed to be human-readable but also suitable as an
+intermediate representation for tools. For example, a pipeline authoring GUI
+could output YAML, or a lineage analysis tool could consume the YAML pipeline
+specifications.
+
+Beam YAML is still under development, but any features already included are
+considered stable. Feedback is welcome at d...@apache.beam.org.
+
+## Prerequisites
+
+The Beam YAML parser is currently included as part of the
+[Apache Beam Python SDK](../python/). You don't need to write Python code to 
use
+Beam YAML, but you need the SDK to run pipelines locally.
+
+We recommend creating a
+[virtual 
environment](../../../get-started/quickstart/python/#create-and-activate-a-virtual-environment)
+so that all packages are installed in an isolated and self-contained
+environment. After you set up your Python environment, install the SDK as
+follows:
 
 ```
 pip install apache_beam[yaml,gcp]
 ```
 
-In addition, several of the provided transforms (such as SQL) are implemented
-in Java and their expansion will require a working Java interpeter. (The
-requisite artifacts will be automatically downloaded from the apache maven
-repositories, so no further installs will be required.)
-Docker is also currently required for local execution of these
-cross-language-requiring transforms, but not for submission to a non-local
-runner such as Flink or Dataflow.
+In addition, several of the provided transforms, such as the SQL transform, are
+implemented in Java and require a working Java interpeter. When you a run a
+pipeline with these transforms, the required artifacts are automatically
+downloaded from the Apache Maven repositories. To execute these cross-language
+transforms locally, you must have Docker installed on your local machine.

Review Comment:
   The part about Docker is no longer true since 
https://github.com/apache/beam/pull/29283 . (Are there other places we should 
be updating this as well?)



##
website/www/site/content/en/documentation/sdks/yaml.md:
##
@@ -23,80 +23,132 @@ title: "Apache Beam YAML API"
 
 # Beam YAML API
 
-While Beam provides powerful APIs for authoring sophisticated data

Re: [PR] Create YAML Join Transform [beam]

2024-04-04 Thread via GitHub


itodotimothy6 commented on code in PR #30734:
URL: https://github.com/apache/beam/pull/30734#discussion_r1552520613


##
sdks/python/apache_beam/yaml/yaml_join.py:
##
@@ -0,0 +1,161 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""This module defines the Join operation."""
+import networkx as nx

Review Comment:
   ignore above comment. I removed networkx in latest commit



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [do not merge] remove --pre for testing purposes [beam]

2024-04-04 Thread via GitHub


tvalentyn commented on PR #30859:
URL: https://github.com/apache/beam/pull/30859#issuecomment-2038220335

   Run Python_Coverage PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Correct the version spec [beam]

2024-04-04 Thread via GitHub


github-actions[bot] commented on PR #30856:
URL: https://github.com/apache/beam/pull/30856#issuecomment-2038219740

   Assigning reviewers. If you would like to opt out of this review, comment 
`assign to next reviewer`:
   
   R: @riteshghorse for label python.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any 
comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review 
comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [do not merge] remove --pre for testing purposes [beam]

2024-04-04 Thread via GitHub


tvalentyn commented on PR #30859:
URL: https://github.com/apache/beam/pull/30859#issuecomment-2038219066

   Run Python_Coverage PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Failing Test]: beam_PreCommit_Python_Coverage suite fails with ModuleNotFoundError: No module named 'pydantic._hypothesis_plugin' [beam]

2024-04-04 Thread via GitHub


tvalentyn commented on issue #30852:
URL: https://github.com/apache/beam/issues/30852#issuecomment-2038213168

   The hypothesis plugin might be provided by hypothesis itself, and we might 
need it for tests that use hypothesis. 
   But looks like Pydantic 2 doesn't work with hypothesis: 
https://github.com/pydantic/pydantic/discussions/5979 , and somehow pydantic 
(which i believe we didn't have in our dependency chain before), now  intefers 
with pytest/hypothesis. the fact that old version of pydantic gets installed 
due to `--pre` flag may or may not be a factor. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [do not merge] remove --pre for testing purposes [beam]

2024-04-04 Thread via GitHub


tvalentyn opened a new pull request, #30859:
URL: https://github.com/apache/beam/pull/30859

   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Use empty flags for default expansion service options. [beam]

2024-04-04 Thread via GitHub


github-actions[bot] commented on PR #30858:
URL: https://github.com/apache/beam/pull/30858#issuecomment-2038213399

   Stopping reviewer notifications for this pull request: review requested by 
someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Use empty flags for default expansion service options. [beam]

2024-04-04 Thread via GitHub


robertwb commented on PR #30858:
URL: https://github.com/apache/beam/pull/30858#issuecomment-2038210512

   R: @Polber 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Use empty flags for default expansion service options. [beam]

2024-04-04 Thread via GitHub


robertwb opened a new pull request, #30858:
URL: https://github.com/apache/beam/pull/30858

   The cli-invoking variant already parses and passes these in manually. 
Internal uses should not automatically pick up `sys.argv`.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Initial Iceberg Sink [beam]

2024-04-04 Thread via GitHub


chamikaramj commented on code in PR #30797:
URL: https://github.com/apache/beam/pull/30797#discussion_r1552407817


##
sdks/java/io/iceberg/src/main/java/org/apache/beam/io/iceberg/AppendFilesToTables.java:
##
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.io.iceberg;
+
+import org.apache.beam.sdk.coders.KvCoder;
+import org.apache.beam.sdk.coders.SerializableCoder;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.GroupByKey;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.SerializableFunction;
+import org.apache.beam.sdk.transforms.WithKeys;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.iceberg.AppendFiles;
+import org.apache.iceberg.Snapshot;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.checkerframework.checker.nullness.qual.MonotonicNonNull;
+
+class AppendFilesToTables
+extends PTransform, PCollection>> {
+
+  private final IcebergCatalogConfig catalogConfig;
+
+  AppendFilesToTables(IcebergCatalogConfig catalogConfig) {
+this.catalogConfig = catalogConfig;
+  }
+
+  @Override
+  public PCollection> expand(PCollection 
writtenFiles) {
+
+// Apply any sharded writes and flatten everything for catalog updates
+return writtenFiles
+.apply(
+"Key metadata updates by table",
+WithKeys.of(
+new SerializableFunction() {
+  @Override
+  public String apply(FileWriteResult input) {
+return input.getTableIdentifier().toString();
+  }
+}))
+// .setCoder(KvCoder.of(StringUtf8Coder.of(), new 
MetadataUpdate.MetadataUpdateCoder()))

Review Comment:
   Uncomment or delete.



##
sdks/java/io/iceberg/src/main/java/org/apache/beam/io/iceberg/OneTableDynamicDestinations.java:
##
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.io.iceberg;
+
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.values.Row;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.checkerframework.checker.nullness.qual.MonotonicNonNull;
+
+class OneTableDynamicDestinations implements DynamicDestinations {
+
+  private static final Schema EMPTY_SCHEMA = Schema.builder().build();
+  private static final Row EMPTY_ROW = Row.nullRow(EMPTY_SCHEMA);
+
+  // TableId represented as String for serializability
+  private final String tableIdString;
+
+  private transient @MonotonicNonNull TableIdentifier tableId;
+
+  private TableIdentifier getTableIdentifier() {
+if (tableId == null) {
+  tableId = TableIdentifier.parse(tableIdString);
+}
+return tableId;
+  }
+
+  OneTableDynamicDestinations(TableIdentifier tableId) {
+this.tableIdString = tableId.toString();
+  }
+
+  @Override
+  public Schema getMetadataSchema() {
+return EMPTY_SCHEMA;
+  }
+
+  @Override
+  public Row assignDestinationMetadata(Row data) {
+return EMPTY_ROW;
+  }
+
+  @Override
+  public IcebergDestination 

Re: [PR] Add Redistribute transform to model, Java SDK, and most active runners [beam]

2024-04-04 Thread via GitHub


kennknowles commented on code in PR #30545:
URL: https://github.com/apache/beam/pull/30545#discussion_r1541202528


##
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkBatchTransformTranslators.java:
##
@@ -423,6 +431,76 @@ public void translateNode(
 }
   }
 
+  private static class RedistributeByKeyTranslatorBatch
+  implements FlinkBatchPipelineTranslator.BatchTransformTranslator<
+  Redistribute.RedistributeByKey> {
+
+@Override
+public void translateNode(
+Redistribute.RedistributeByKey transform, 
FlinkBatchTranslationContext context) {
+  final DataSet>> inputDataSet =
+  context.getInputDataSet(context.getInput(transform));
+  // Construct an instance of CoderTypeInformation which contains the 
pipeline options.
+  // This will be used to initialized FileSystems.
+  final CoderTypeInformation>> outputType =
+  ((CoderTypeInformation>>) 
inputDataSet.getType())
+  .withPipelineOptions(context.getPipelineOptions());
+  // We insert a NOOP here to initialize the FileSystems via the above 
CoderTypeInformation.

Review Comment:
   I couldn't find anything. Guess I'll dive into the history of the line this 
is copied from and see if there was something clear. Might be obsolete too.



##
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java:
##
@@ -917,6 +919,41 @@ private  void groupByKeyAndSortValuesHelper(
   }
 });
 
+registerTransformTranslator(
+RedistributeByKey.class,

Review Comment:
   I don't understand the question? Does this help?
   
- `Redistribute.arbitrarily()` has the same composite structure as 
`Reshuffle.viaRandomKey()`
- `Redistribute.byKey()` has the same composite structure as 
`Reshuffle.of()`
- This code here is the Dataflow v1 translation for `Redistribute.byKey()` 
which is simplified from the translation of `Reshuffle.of()`
- The proposal for the future is to do even better for the `arbitrarily` 
case by having Dataflow expose a primitive rather than the existing cludge on 
top of GroupByKey.
   
   This thread didn't mention it but now I realize there's a potential problem 
because the purpose of the override was to save some data shuffled by not 
reifying the timestamps since they are available elsewhere in Dataflow-specific 
shuffle metadata. Now I'm on the fence, because I'd rather not rely on that 
always being the case, as it would be update-incompatible to change it, whereas 
reifying all metadata in a standard way is robust to changes.



##
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java:
##
@@ -301,6 +305,24 @@ private  void translateReshuffle(
 Iterables.getOnlyElement(transform.getOutputsMap().values()), 
inputDataStream.rebalance());
   }
 
+  private  void translateRedistributeByKey(
+  String id, RunnerApi.Pipeline pipeline, StreamingTranslationContext 
context) {
+RunnerApi.PTransform transform = 
pipeline.getComponents().getTransformsOrThrow(id);
+DataStream>> inputDataStream =
+
context.getDataStreamOrThrow(Iterables.getOnlyElement(transform.getInputsMap().values()));
+context.addDataStream(
+Iterables.getOnlyElement(transform.getOutputsMap().values()), 
inputDataStream.rebalance());

Review Comment:
   Yea, we could share the same implementation. I followed my rule that code 
should be shared if it is representing the same thing by logical necessity, 
otherwise not shared. In this case there are two very similar things that are 
temporarily having the same implementation. I don't care too much, could re-use 
the same lines of code for now until we choose to diverge. I have a slight 
preference for keeping them separate to make it obvious that there is no 
logical necessity that they be in sync.



##
runners/flink/flink_runner.gradle:
##
@@ -309,6 +311,8 @@ def createValidatesRunnerTask(Map m) {
 
 // Flink reshuffle override does not preserve all metadata
 excludeTestsMatching 
'org.apache.beam.sdk.transforms.ReshuffleTest.testReshufflePreservesMetadata'
+// Flink redistribute override does not preserve all metadata

Review Comment:
   Yea, I don't know what all is needed. I'll take another look and see if it 
is obvious.



##
runners/samza/src/main/java/org/apache/beam/runners/samza/translation/ReshuffleTranslator.java:
##
@@ -42,6 +42,16 @@
 public class ReshuffleTranslator
 implements TransformTranslator>, 
PCollection>>> {
 
+  private final String prefix;
+
+  ReshuffleTranslator(String prefix) {
+this.prefix = prefix;
+  }
+
+  ReshuffleTranslator() {
+this("rhfl-");

Review Comment:
   It was not! Thank you



##
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Redistribute.java:
##
@@ -0,0 +1,300 @@
+/*
+ * 

Re: [I] [Failing Test]: beam_PreCommit_Python_Coverage suite fails with ModuleNotFoundError: No module named 'pydantic._hypothesis_plugin' [beam]

2024-04-04 Thread via GitHub


tvalentyn commented on issue #30852:
URL: https://github.com/apache/beam/issues/30852#issuecomment-2038186835

   
https://stackoverflow.com/questions/71394400/how-to-block-the-hypothesis-pytest-plugin
 has some discussion how to disable it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Failing Test]: beam_PreCommit_Python_Coverage suite fails with ModuleNotFoundError: No module named 'pydantic._hypothesis_plugin' [beam]

2024-04-04 Thread via GitHub


tvalentyn commented on issue #30852:
URL: https://github.com/apache/beam/issues/30852#issuecomment-2038183904

   even though pip selects a  bizzare version for pydantic, pydantic 2, this 
pydantic-hypothesis plugin seems broken
   
   `from pydantic import _hypothesis_plugin` fails, the more correct import 
`from pydantic.v1 import _hypothesis_plugin` also fails with recent versions of 
hypthesis



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Implement LengthPrefixCoder.to_type_hint [beam]

2024-04-04 Thread via GitHub


damccorm commented on PR #30559:
URL: https://github.com/apache/beam/pull/30559#issuecomment-2038162485

   SGTM


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Implement LengthPrefixCoder.to_type_hint [beam]

2024-04-04 Thread via GitHub


damccorm closed pull request #30559: Implement LengthPrefixCoder.to_type_hint
URL: https://github.com/apache/beam/pull/30559


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Implement LengthPrefixCoder.to_type_hint [beam]

2024-04-04 Thread via GitHub


github-actions[bot] commented on PR #30857:
URL: https://github.com/apache/beam/pull/30857#issuecomment-2038161468

   Stopping reviewer notifications for this pull request: review requested by 
someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Implement LengthPrefixCoder.to_type_hint [beam]

2024-04-04 Thread via GitHub


hjtran commented on PR #30559:
URL: https://github.com/apache/beam/pull/30559#issuecomment-2038160409

   I've just created a duplicate PR without the rebase issues
   https://github.com/apache/beam/pull/30857


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] byte-buddy:1.14.12 [beam]

2024-04-04 Thread via GitHub


damccorm merged PR #30746:
URL: https://github.com/apache/beam/pull/30746


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Implement LengthPrefixCoder.to_type_hint [beam]

2024-04-04 Thread via GitHub


hjtran commented on PR #30857:
URL: https://github.com/apache/beam/pull/30857#issuecomment-2038159581

   R: @damccorm 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Implement LengthPrefixCoder.to_type_hint [beam]

2024-04-04 Thread via GitHub


hjtran opened a new pull request, #30857:
URL: https://github.com/apache/beam/pull/30857

   (Duplicate of another [PR](https://github.com/apache/beam/pull/30559) but 
without rebase issues)
   LengthPrefixCoder doesn't define to_type_hint even though it seems pretty 
well defined. I've implemented it by just inferring it from its value coder.
   
   Motivating use case - we have some debug tools that infer the types of 
pcollections based on to_type_hint. The LPCoder is used in a lot of places.
   
   Bonus/unrelated change - I've pulled out the GitHub issue referenced in the 
NotImplementedError raised by to_type_hint. Every time I run into this error, I 
check the github issue and get more confused. It doesn't seem useful to refer 
to it in the error message, but does still seem relevant, so I pulled it out 
into a comment.
   
   Testing Done
   I haven't actually run the new unit test I wrote since I'm struggling to get 
my python sdk dev environment up (I've posted on the dev@ mailing list). Hoping 
the GH Action can just do it for me :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Implement LengthPrefixCoder.to_type_hint [beam]

2024-04-04 Thread via GitHub


damccorm commented on PR #30559:
URL: https://github.com/apache/beam/pull/30559#issuecomment-2038146227

   Uh oh, looks like a bad rebase


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Failing Test]: beam_PreCommit_Python_Coverage suite fails with ModuleNotFoundError: No module named 'pydantic._hypothesis_plugin' [beam]

2024-04-04 Thread via GitHub


tvalentyn commented on issue #30852:
URL: https://github.com/apache/beam/issues/30852#issuecomment-2038144463

   < 
- pydantic [required: <3, installed: 2.0a4]
   this seems to be an incorrect installer behavior. 2.0a4 shouln't be 
installed under these constraints.
   
   Actually I misread this, it still fits the range but chose version is 
strange, there might be more constraints.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Failing Test]: beam_PreCommit_Python_Coverage suite fails with ModuleNotFoundError: No module named 'pydantic._hypothesis_plugin' [beam]

2024-04-04 Thread via GitHub


tvalentyn commented on issue #30852:
URL: https://github.com/apache/beam/issues/30852#issuecomment-2038142141

   from pipdeptree:
   
   ```
   * google-cloud-aiplatform==1.46.0
- proto-plus [required: >=1.22.0,<2.0.0dev, installed: 1.24.0.dev0]
- pydantic [required: <3, installed: 2.0a4]
   ```
   
   this seems to be an incorrect installer behavior. 2.0a4 shouln't be 
installed under these constraints.
   
   not sure how that happens.
   
   Likely what triggered the error for us was a recent release in 
https://pypi.org/project/google-cloud-aiplatform/#history , which added the 
pydantic dependency.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug]: Beam Sql is ignoring aliases fields in some situations which causes to huge data loss [beam]

2024-04-04 Thread via GitHub


kennknowles commented on issue #30498:
URL: https://github.com/apache/beam/issues/30498#issuecomment-2038130587

   First step is I want to figure out if it is a Beam bug or a Calcite bug. I 
expect it to be a Beam bug. A likely source of the problem would be something 
like 
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java
   
   This is a rule that merges aggregation and projects, which seems to be what 
happens here. We have our own version of the rule because Beam has special 
projection pushdown into IOs. Just as an example of how I am thinking about 
debugging this. I will now read that file and report back if I see anything 
obvious.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Implement LengthPrefixCoder.to_type_hint [beam]

2024-04-04 Thread via GitHub


hjtran commented on PR #30559:
URL: https://github.com/apache/beam/pull/30559#issuecomment-2038124884

   > I'll merge later after checks pass
   
   Thanks for all the reviews! I really appreciate it!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Failing Test]: beam_PreCommit_Python_Coverage suite fails with ModuleNotFoundError: No module named 'pydantic._hypothesis_plugin' [beam]

2024-04-04 Thread via GitHub


tvalentyn commented on issue #30852:
URL: https://github.com/apache/beam/issues/30852#issuecomment-2038121140

   Beam test infra installs pre-released dependencies to detect possible issues 
ahead of releases. The comand: 
   
   ```
pip install --pre "tensorflow_transform>=1.13.0,<1.14.0" 
apache-beam[gcp,test]
   ```
   
   installs pydantic==2.0a4
   
   The command
   
   ```
pip install  "tensorflow_transform>=1.13.0,<1.14.0" apache-beam[gcp,test]
   ```
   
   installs pydantic==1.10.15
   
   The tft requirement comes from:   
https://github.com/apache/beam/blob/21129a41e031c150c3f610639d71a95a3a941243/sdks/python/tox.ini#L316



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Implement LengthPrefixCoder.to_type_hint [beam]

2024-04-04 Thread via GitHub


damccorm commented on PR #30559:
URL: https://github.com/apache/beam/pull/30559#issuecomment-2038117336

   I'll merge later after checks pass


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Testing] Try fixing inference benchmark tests [beam]

2024-04-04 Thread via GitHub


riteshghorse commented on PR #30855:
URL: https://github.com/apache/beam/pull/30855#issuecomment-2038101504

   Run Inference Benchmarks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Correct the version spec [beam]

2024-04-04 Thread via GitHub


tvalentyn opened a new pull request, #30856:
URL: https://github.com/apache/beam/pull/30856

   It is incorrect to use `<=` in this case, see: 
https://cwiki.apache.org/confluence/display/BEAM/Dependency+management+guidelines+for+Beam+Python+SDK+maintainers


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Implement LengthPrefixCoder.to_type_hint [beam]

2024-04-04 Thread via GitHub


hjtran commented on code in PR #30559:
URL: https://github.com/apache/beam/pull/30559#discussion_r1552348156


##
local-env-setup.sh:
##
@@ -24,7 +24,7 @@ darwin_install_pip3_packages() {
 
 install_go_packages(){
 echo "Installing goavro"
-go get github.com/linkedin/goavro/v2
+go install github.com/linkedin/goavro@latest

Review Comment:
   No, this was accidental. Had to make this change to get my dev container to 
work. [Mailing list 
thread](https://lists.apache.org/thread/4lp8nqppd2xv8m694oyjzzjqvwtor4gl)
   
   Will revert



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Testing] Try fixing inference benchmark tests [beam]

2024-04-04 Thread via GitHub


github-actions[bot] commented on PR #30855:
URL: https://github.com/apache/beam/pull/30855#issuecomment-2038099474

   Stopping reviewer notifications for this pull request: requested by reviewer


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Testing] Try fixing inference benchmark tests [beam]

2024-04-04 Thread via GitHub


riteshghorse commented on PR #30855:
URL: https://github.com/apache/beam/pull/30855#issuecomment-2038097863

   Run Inference Benchmarks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Testing] Try fixing inference benchmark tests [beam]

2024-04-04 Thread via GitHub


riteshghorse commented on PR #30855:
URL: https://github.com/apache/beam/pull/30855#issuecomment-2038096484

   stop reviewer notifications


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Implement LengthPrefixCoder.to_type_hint [beam]

2024-04-04 Thread via GitHub


damccorm commented on code in PR #30559:
URL: https://github.com/apache/beam/pull/30559#discussion_r1552338484


##
local-env-setup.sh:
##
@@ -24,7 +24,7 @@ darwin_install_pip3_packages() {
 
 install_go_packages(){
 echo "Installing goavro"
-go get github.com/linkedin/goavro/v2
+go install github.com/linkedin/goavro@latest

Review Comment:
   Did you mean to make this change? I think we need get here not just install



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Testing] Try fixing inference benchmark tests [beam]

2024-04-04 Thread via GitHub


riteshghorse commented on PR #30855:
URL: https://github.com/apache/beam/pull/30855#issuecomment-2038084704

   Run Inference Benchmarks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [Testing] Try fixing inference benchmark tests [beam]

2024-04-04 Thread via GitHub


riteshghorse opened a new pull request, #30855:
URL: https://github.com/apache/beam/pull/30855

   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Testing] Try fixing inference benchmark tests [beam]

2024-04-04 Thread via GitHub


riteshghorse commented on PR #30855:
URL: https://github.com/apache/beam/pull/30855#issuecomment-2038055456

   Run Inference Benchmarks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix Jms drop record [beam]

2024-04-04 Thread via GitHub


Abacn commented on PR #30218:
URL: https://github.com/apache/beam/pull/30218#issuecomment-2038050053

   #30853 appears working. Basically it sets a timeout after that the 
checkpoint is forced to be finalized. This is unsafe in general, but it indeed 
released the records stuck in the session internal buffer


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Feature Request]: Support for Apache Flink 1.18 in Beam Runners [beam]

2024-04-04 Thread via GitHub


thebozzcl commented on issue #30789:
URL: https://github.com/apache/beam/issues/30789#issuecomment-2038042064

   Also, please note that the codebase in #30197 has diverged from the main 
branch quite a bit, so I'm not even sure my changes will apply once that's 
rebased.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Java] ManagedIO [beam]

2024-04-04 Thread via GitHub


chamikaramj commented on code in PR #30808:
URL: https://github.com/apache/beam/pull/30808#discussion_r1552304475


##
sdks/java/managed/src/main/java/org/apache/beam/sdk/managed/Managed.java:
##
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.managed;
+
+import com.google.auto.value.AutoValue;
+import java.util.Map;
+import java.util.regex.Pattern;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.schemas.transforms.SchemaTransform;
+import org.apache.beam.sdk.values.PCollectionRowTuple;
+
+public class Managed {
+  public static Read read() {

Review Comment:
   `p.apply(Managed.read(ManagedIO.ICEBERG).withConfig(config))`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Feature Request]: Support for Apache Flink 1.18 in Beam Runners [beam]

2024-04-04 Thread via GitHub


thebozzcl commented on issue #30789:
URL: https://github.com/apache/beam/issues/30789#issuecomment-2038040661

   I forked from that PR to add support for 1.18. It's a trivial change, too: 
https://github.com/thebozzcl/beam/commit/432643bfd70f1b8a6052c33a04774b93da5a7bae
   
   So far this is working fine in our tests, with a caveat - in its current 
status, there's some incompatibilities with the publicly released libraries 
(mostly around `SerializablePipelineOptions`). For our tests, I had to use 
custom-built versions of a few libraries based on the repo:
   * beam-runners-core-java
   * beam-runners-java-fn-execution
   * beam-runners-java-job-service
   * beam-sdks-java-core
   
   I'm hopeful that once the PR for 1.17 is merged, adding support for 1.18 
will be trivial. We're going to try using these libraries in one of our more 
complex pipelines, see if everything still works.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] [Failing Test]: Running `which java` seems to cause flakes in some YAML tests on GHA runners [beam]

2024-04-04 Thread via GitHub


tvalentyn opened a new issue, #30854:
URL: https://github.com/apache/beam/issues/30854

   ### What happened?
   
   In stacktrace below, process seems to got stuck when running 
subprocess.run(['which', 'java']. Filing to track this issue if it is common.
   
   cc: @Polber 
   
   ```
    AggregationTest.test_combine_mean_minimal_yaml 

   [gw1] linux -- Python 3.9.19 
/runner/_work/beam/beam/sdks/python/test-suites/tox/py39/build/srcs/sdks/python/target/.tox-py39/py39/bin/python
   
   self = 
   
   @mock.patch('apache_beam.Pipeline', TestPipeline)
   def test_yaml_example(self):
 with open(pipeline_spec_file, encoding="utf-8") as f:
   lines = f.readlines()
 expected_key = '# Expected:\n'
 if expected_key in lines:
   expected = lines[lines.index('# Expected:\n') + 1:]
 else:
   raise ValueError(
   f"Missing '# Expected:' tag in example file 
'{pipeline_spec_file}'")
 for i, line in enumerate(expected):
   expected[i] = line.replace('#  ', '').replace('\n', '')
 pipeline_spec = yaml.load(
 ''.join(lines), Loader=yaml_transform.SafeLineLoader)
   
 with TestEnvironment() as env:
   if custom_preprocessor:
 pipeline_spec = custom_preprocessor(pipeline_spec, expected, env)
   with beam.Pipeline(options=PipelineOptions(
   pickle_library='cloudpickle',
   **yaml_transform.SafeLineLoader.strip_metadata(pipeline_spec.get(
   'options', {} as p:
   > actual = yaml_transform.expand_pipeline(p, pipeline_spec)
   
   apache_beam/yaml/examples/examples_test.py:77: 
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ 
   
target/.tox-py39/py39/lib/python3.9/site-packages/apache_beam/yaml/yaml_transform.py:1035:
 in expand_pipeline
   return YamlTransform(
   
target/.tox-py39/py39/lib/python3.9/site-packages/apache_beam/yaml/yaml_transform.py:1006:
 in expand
   result = expand_transform(
   
target/.tox-py39/py39/lib/python3.9/site-packages/apache_beam/yaml/yaml_transform.py:455:
 in expand_transform
   return expand_composite_transform(spec, scope)
   
target/.tox-py39/py39/lib/python3.9/site-packages/apache_beam/yaml/yaml_transform.py:529:
 in expand_composite_transform
   return CompositePTransform.expand(None)
   
target/.tox-py39/py39/lib/python3.9/site-packages/apache_beam/yaml/yaml_transform.py:520:
 in expand
   inner_scope.compute_all()
   
target/.tox-py39/py39/lib/python3.9/site-packages/apache_beam/yaml/yaml_transform.py:247:
 in compute_all
   self.compute_outputs(transform_id)
   
target/.tox-py39/py39/lib/python3.9/site-packages/apache_beam/yaml/yaml_transform.py:96:
 in wrapper
   self._cache[key] = func(self, *args)
   
target/.tox-py39/py39/lib/python3.9/site-packages/apache_beam/yaml/yaml_transform.py:283:
 in compute_outputs
   return expand_transform(self._transforms_by_uuid[transform_id], self)
   
target/.tox-py39/py39/lib/python3.9/site-packages/apache_beam/yaml/yaml_transform.py:457:
 in expand_transform
   return expand_leaf_transform(spec, scope)
   
target/.tox-py39/py39/lib/python3.9/site-packages/apache_beam/yaml/yaml_transform.py:479:
 in expand_leaf_transform
   ptransform = scope.create_ptransform(spec, inputs_dict.values())
   
target/.tox-py39/py39/lib/python3.9/site-packages/apache_beam/yaml/yaml_transform.py:373:
 in create_ptransform
   provider = self.best_provider(spec, input_providers)
   else:
   env_list = None  # Use execv instead of execve.
   executable = os.fsencode(executable)
   if os.path.dirname(executable):
   executable_list = (executable,)
   else:
   # This matches the behavior of os._execvpe().
   executable_list = tuple(
   os.path.join(os.fsencode(dir), executable)
   for dir in os.get_exec_path(env))
   fds_to_keep = set(pass_fds)
   fds_to_keep.add(errpipe_write)
   self.pid = _posixsubprocess.fork_exec(
   args, executable_list,
   close_fds, tuple(sorted(map(int, fds_to_keep))),
   cwd, env_list,
   p2cread, p2cwrite, c2pread, c2pwrite,
   errread, errwrite,
   errpipe_read, errpipe_write,
   restore_signals, start_new_session,
   gid, gids, uid, umask,
   preexec_fn)
   self._child_created = True
   finally:
   # be sure the FD is closed no matter what
   os.close(errpipe_write)
   
   self._close_pipe_fds(p2cread, 

Re: [I] [Failing Test]: beam_PreCommit_Python_Coverage suite fails with ModuleNotFoundError: No module named 'pydantic._hypothesis_plugin' [beam]

2024-04-04 Thread via GitHub


tvalentyn commented on issue #30852:
URL: https://github.com/apache/beam/issues/30852#issuecomment-2037998636

   Not sure how pydantic comes in the picture here yet, but for  
'pydantic._hypothesis_plugin' to be available, it seems that we need to have 
1.0.0

Re: [PR] Implementing lull reporting at bundle level processing [beam]

2024-04-04 Thread via GitHub


dustin12 commented on code in PR #30693:
URL: https://github.com/apache/beam/pull/30693#discussion_r1552236808


##
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/ExecutionStateTracker.java:
##
@@ -335,6 +352,19 @@ protected void takeSampleOnce(long millisSinceLastSample) {
   transitionsAtLastSample = transitionsAtThisSample;
 }
 updateMillisSinceLastTransition(millisSinceLastSample, state);
+updateMillisSinceBundleStart(millisSinceLastSample);
+  }
+
+  // Override this to implement bundle level lull reporting.
+  protected void reportBundleLull(Thread trackedThread, long 
millisSinceBundleStart) {}
+
+  @SuppressWarnings("NonAtomicVolatileUpdate")
+  private void updateMillisSinceBundleStart(long millisSinceLastSample) {
+millisSinceBundleStart += millisSinceLastSample;
+if (millisSinceBundleStart > nextBundleLullReportMs) {
+  reportBundleLull(trackedThread, millisSinceBundleStart);
+  nextBundleLullReportMs += BUNDLE_LULL_REPORT_MS;
+}
   }
 

Review Comment:
   maybe a BEAM/Java developer can help here.  Is there a reason we supress 
these warnings?  Is it safe to do for some reason?  It seems to indicate a 
possible data race which we should fix?



##
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StackTraceUtil.java:
##
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.dataflow.worker;
+
+import org.apache.beam.runners.core.SimpleDoFnRunner;
+import org.apache.beam.sdk.annotations.Internal;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableSet;
+
+/** Utility methods to print the stack traces of all the threads. */
+@Internal
+public final class StackTraceUtil {
+  private static final ImmutableSet FRAMEWORK_CLASSES =
+  ImmutableSet.of(SimpleDoFnRunner.class.getName(), 
DoFnInstanceManagers.class.getName());
+
+  public static String getStackTraceForLullMessage(StackTraceElement[] 
stackTrace) {
+StringBuilder message = new StringBuilder();
+for (StackTraceElement e : stackTrace) {
+  if (FRAMEWORK_CLASSES.contains(e.getClassName())) {
+break;

Review Comment:
   where is this code from?  It doesn't seem to have moved from another file 
and I'm not sure why it sbeing added.  I'm also a bit confused as to why we 
would stop a stacktrace as soon as we got to a Framework class, it doesn't seem 
related to bundle lulls.



##
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/ExecutionStateTracker.java:
##
@@ -139,8 +140,17 @@ public String getDescription() {
*/
   private volatile long millisSinceLastTransition = 0;
 
+  /**
+   * The number of milliseconds since the {@link ExecutionStateTracker} 
initial state.
+   *
+   * This variable is updated by the Sampling thread, and read by the 
Progress Reporting thread,
+   * thus it being marked volatile.
+   */
+  private volatile long millisSinceBundleStart = 0;

Review Comment:
   why is this marked as volatile?  And thus needs the supress warning on line 
361.



##
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/DataflowExecutionContext.java:
##
@@ -310,12 +350,64 @@ public Closeable activate() {
   }
 }
 
+private String getBundleLullMessage(Thread trackedThread, Duration 
lullDuration) {
+  StringBuilder message = new StringBuilder();
+  message
+  .append("Operation ongoing in bundle for at least ")
+  .append(DURATION_FORMATTER.print(lullDuration.toPeriod()))
+  .append(" without completing")
+  .append("\n");
+  synchronized (this) {
+if (this.activeMessageMetadata != null) {
+  message.append(
+  "Current user step name: " + 
getActiveMessageMetadata().get().userStepName() + "\n");
+  message.append(
+  "Time spent in this step(millis): "
+  + (clock.currentTimeMillis() - 
getActiveMessageMetadata().get().startTime())
+  + "\n");
+}
+message.append("Processing times in each 

[PR] [Test only] Introduce checkpointtimeout [beam]

2024-04-04 Thread via GitHub


Abacn opened a new pull request, #30853:
URL: https://github.com/apache/beam/pull/30853

   Test only, attempt to fix 
https://github.com/apache/beam/pull/30218#issuecomment-2037429676
   
   However, found that if the checkpoint can expire, the messages unacked will 
get redelivered, but the new checkpoint won't finalize either, causing 
repetitive redeliver and duplicate messages.
   
   I now think the underlying issue is that finalizing checkpoint is not 
guaranteed to happen timely. This is work as expected. Expiring outstanding 
checkpoint won't resolve the stucked messages when throughput is low given the 
asynchronous ack mechanism.
   
   Thus, another option is to expose "Auto acknowledge" client option to JmsIO 
to retain the previous behavior, however, data loss is possible, as in previous 
implementation 
   
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [yaml] disable javascript mapping for python >=3.12 [beam]

2024-04-04 Thread via GitHub


Polber commented on code in PR #30843:
URL: https://github.com/apache/beam/pull/30843#discussion_r1552245985


##
sdks/python/setup.py:
##
@@ -368,7 +368,7 @@ def get_portability_package_data():
   'grpcio>=1.33.1,!=1.48.0,<2',
   'hdfs>=2.1.0,<3.0.0',
   'httplib2>=0.8,<0.23.0',
-  'js2py>=0.74,<1',
+  'js2py>=0.74,<1; python_version<"3.12"',

Review Comment:
   Oh sorry I totally misunderstood what you were asking, thanks for adding 
that!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] ignore -- end of options [beam]

2024-04-04 Thread via GitHub


Polber closed pull request #30845: ignore -- end of options
URL: https://github.com/apache/beam/pull/30845


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Java] ManagedIO [beam]

2024-04-04 Thread via GitHub


robertwb commented on code in PR #30808:
URL: https://github.com/apache/beam/pull/30808#discussion_r1552206457


##
sdks/java/managed/src/main/java/org/apache/beam/sdk/managed/Managed.java:
##
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.managed;
+
+import com.google.auto.value.AutoValue;
+import java.util.Map;
+import java.util.regex.Pattern;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.schemas.transforms.SchemaTransform;
+import org.apache.beam.sdk.values.PCollectionRowTuple;
+
+public class Managed {
+  public static Read read() {

Review Comment:
   Yes, I'm totally fine with a static list for the initial release. The key 
point is getting the API stable and generating the right protos. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Update python transform catalog [beam]

2024-04-04 Thread via GitHub


damccorm commented on code in PR #30788:
URL: https://github.com/apache/beam/pull/30788#discussion_r1552164898


##
sdks/python/apache_beam/examples/snippets/transforms/aggregation/batchelements_test.py:
##
@@ -0,0 +1,49 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import unittest
+
+import mock
+from apache_beam.testing.test_pipeline import TestPipeline
+from apache_beam.testing.util import assert_that, equal_to
+
+from . import batchelements
+
+
+def check_batches(actual):
+  expected = [['', '凌', ''], ['', '凌', ''], ['', '凌', '', '']]

Review Comment:
   I'm a little bit skeptical of this test since the batches produced here are 
an implementation detail of how this is broken into bundles.
   
   Could we maybe instead check:
   
   1) That this does batching at all (returns lists, not strings)
   2) doesn't drop elements (if we pull them out of lists, and then count by 
element they return the correct counts)



##
sdks/python/apache_beam/examples/snippets/transforms/aggregation/tolist_test.py:
##
@@ -0,0 +1,47 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import mock
+import unittest
+
+from apache_beam.testing.test_pipeline import TestPipeline
+from apache_beam.testing.util import assert_that, equal_to
+
+from . import tolist
+
+
+def identity(x):
+  return x
+
+
+@mock.patch('apache_beam.Pipeline', TestPipeline)
+# pylint: disable=line-too-long
+@mock.patch(
+'apache_beam.examples.snippets.transforms.aggregation.tolist.print',
+identity)
+# pylint: enable=line-too-long
+class BatchElementsTest(unittest.TestCase):
+  def test_tolist(self):
+def check(result):
+  assert_that(result, equal_to([['', '凌', '', '']]))

Review Comment:
   Similarly, ordering isn't guaranteed here; could we try to test that it is a 
list and it contains all the correct elements?



##
website/www/site/content/en/documentation/transforms/python/overview.md:
##
@@ -48,13 +48,13 @@ limitations under the License.
 
 
   TransformDescription
-  ApproximateQuantilesNot available. See https://issues.apache.org/jira/browse/BEAM-6694;>BEAM-6694 for 
updates.
-  ApproximateUniqueNot available. See https://issues.apache.org/jira/browse/BEAM-6693;>BEAM-6693 for 
updates.
+  ApproximateQuantilesGiven
 a distribution, find the approximate N-tiles.
+  ApproximateUniqueGiven
 a pcollection, return the estimated number of unique elements.
+  BatchElementsGiven
 a pcollection, return the estimated number of unique elements.

Review Comment:
   Thanks for catching these. Could we (a) also add ToList and (b) add these 
transforms to 
https://github.com/apache/beam/blob/21129a41e031c150c3f610639d71a95a3a941243/website/www/site/layouts/partials/section-menu/en/documentation.html#L324



##
sdks/python/apache_beam/examples/snippets/transforms/aggregation/batchelements_test.py:
##
@@ -0,0 +1,49 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in 

Re: [PR] Initial Iceberg Sink [beam]

2024-04-04 Thread via GitHub


kennknowles commented on PR #30797:
URL: https://github.com/apache/beam/pull/30797#issuecomment-2037849642

   > OK I have done a whole massive revision and tested it a little bit more.
   > 
   > The only piece that I have not revised is the `IcebergCatalogConfig` which 
gets turned into an `org.apache.iceberg.catalog.Catalog` on the client and each 
worker separately. I think your suggestion was to try to use just a big 
key-value map for all the config values. I am fine with that. I don't really 
know enough about it yet. All my deep dives into iceberg Java libraries was for 
other pieces.
   
   It looks like this might work: 
https://github.com/tabular-io/iceberg-kafka-connect/blob/5ab5c538efab9ccf3cde166f36ba34189eed7187/kafka-connect/src/main/java/io/tabular/iceberg/connect/IcebergSinkConfig.java#L256


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Initial Iceberg Sink [beam]

2024-04-04 Thread via GitHub


kennknowles commented on PR #30797:
URL: https://github.com/apache/beam/pull/30797#issuecomment-2037846183

   OK I have done a whole massive revision and tested it a little bit more.
   
   The only piece that I have not revised is the `IcebergCatalogConfig` which 
gets turned into an `org.apache.iceberg.catalog.Catalog` on the client and each 
worker separately. I think your suggestion was to try to use just a big 
key-value map for all the config values. I am fine with that. I don't really 
know enough about it yet. All my deep dives into iceberg Java libraries was for 
other pieces.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Initial Iceberg Sink [beam]

2024-04-04 Thread via GitHub


kennknowles commented on code in PR #30797:
URL: https://github.com/apache/beam/pull/30797#discussion_r1552179077


##
sdks/java/io/iceberg/src/main/java/org/apache/beam/io/iceberg/WriteToDestinations.java:
##
@@ -0,0 +1,242 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.io.iceberg;
+
+import java.util.Collections;
+import java.util.UUID;
+import org.apache.beam.io.iceberg.WriteBundlesToFiles.Result;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.coders.IterableCoder;
+import org.apache.beam.sdk.coders.KvCoder;
+import org.apache.beam.sdk.coders.SerializableCoder;
+import org.apache.beam.sdk.coders.ShardedKeyCoder;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.Flatten;
+import org.apache.beam.sdk.transforms.GroupByKey;
+import org.apache.beam.sdk.transforms.MapElements;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.SimpleFunction;
+import org.apache.beam.sdk.transforms.View;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionList;
+import org.apache.beam.sdk.values.PCollectionTuple;
+import org.apache.beam.sdk.values.PCollectionView;
+import org.apache.beam.sdk.values.ShardedKey;
+import org.apache.beam.sdk.values.TupleTag;
+import org.apache.beam.sdk.values.TupleTagList;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.annotations.VisibleForTesting;
+import 
org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableList;
+import org.apache.iceberg.AppendFiles;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.Snapshot;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.TableIdentifier;
+
+class WriteToDestinations
+extends PTransform<
+PCollection>, 
IcebergWriteResult> {
+
+  @VisibleForTesting static final int DEFAULT_MAX_WRITERS_PER_BUNDLE = 20;
+  @VisibleForTesting static final int DEFAULT_MAX_FILES_PER_PARTITION = 10_000;
+  @VisibleForTesting static final long DEFAULT_MAX_BYTES_PER_PARTITION = 10L * 
(1L << 40); // 10TB
+  static final long DEFAULT_MAX_BYTES_PER_FILE = (1L << 40); // 1TB
+  static final int DEFAULT_NUM_FILE_SHARDS = 0;
+  static final int FILE_TRIGGERING_RECORD_COUNT = 50_000;
+
+  private final Coder destinationCoder;
+
+  private final RecordWriterFactory 
recordWriterFactory;
+  private final TableFactory tableFactory;
+
+  WriteToDestinations(
+  Coder destinationCoder,
+  RecordWriterFactory recordWriterFactory,
+  TableFactory tableFactory) {
+this.destinationCoder = destinationCoder;
+this.recordWriterFactory = recordWriterFactory;
+this.tableFactory = tableFactory;
+  }
+
+  private PCollectionView createJobIdPrefixView(Pipeline p) {
+
+final String jobName = p.getOptions().getJobName();
+
+return p.apply("JobIdCreationRoot_", Create.of((Void) null))
+.apply(
+"CreateJobId",
+ParDo.of(
+new DoFn() {
+  @ProcessElement
+  public void process(ProcessContext c) {
+c.output(jobName + "-" + UUID.randomUUID().toString());
+  }
+}))
+.apply("JobIdSideInput", View.asSingleton());
+  }
+
+  @Override
+  public IcebergWriteResult expand(
+  PCollection> input) {
+
+final PCollectionView fileView = 
createJobIdPrefixView(input.getPipeline());
+// We always do the equivalent of a dynamically sharded file creation
+TupleTag> writtenFilesTag = new 
TupleTag<>("writtenFiles");
+TupleTag, ElementT>> successfulWritesTag =
+new TupleTag<>("successfulWrites");
+TupleTag, ElementT>> failedWritesTag =
+new TupleTag<>("failedWrites");
+TupleTag> snapshotsTag = new 
TupleTag<>("snapshots");
+
+final Coder elementCoder =
+((KvCoder) 

Re: [PR] [Python] Add a couple quality-of-life improvemenets to `testing.util.assert_that` [beam]

2024-04-04 Thread via GitHub


robertwb commented on code in PR #30771:
URL: https://github.com/apache/beam/pull/30771#discussion_r1552178049


##
sdks/python/apache_beam/testing/util.py:
##
@@ -261,6 +261,19 @@ def assert_that(
   """
   assert isinstance(actual, pvalue.PCollection), (
   '%s is not a supported type for Beam assert' % type(actual))
+  pipeline = actual.pipeline
+  if getattr(pipeline, 'result', None) is not None:
+# The pipeline was already run. The user most likely called assert_that
+# after the pipeleline context.
+raise RuntimeError(
+'assert_that must be used within a beam.Pipeline context')
+
+  # If label is already in use, just append a number to it.

Review Comment:
   Maybe add something about not needing stability of names across pipeline 
update in a testing context?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Initial Iceberg Sink [beam]

2024-04-04 Thread via GitHub


codecov-commenter commented on PR #30797:
URL: https://github.com/apache/beam/pull/30797#issuecomment-2037829820

   ## 
[Codecov](https://app.codecov.io/gh/apache/beam/pull/30797?dropdown=coverage=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 Report
   All modified and coverable lines are covered by tests :white_check_mark:
   > Project coverage is 0.00%. Comparing base 
[(`61eee6d`)](https://app.codecov.io/gh/apache/beam/commit/61eee6dd672800ed88bd0851a235e7b13ee10847?dropdown=coverage=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 to head 
[(`8faea3f`)](https://app.codecov.io/gh/apache/beam/pull/30797?dropdown=coverage=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   > Report is 1 commits behind head on master.
   
   > :exclamation: Current head 8faea3f differs from pull request most recent 
head 5af12aa. Consider uploading reports for the commit 5af12aa to get more 
accurate results
   
   
   Additional details and impacted files
   
   
   ```diff
   @@  Coverage Diff  @@
   ## master   #30797   +/-   ##
   =
   - Coverage 70.74%0   -70.75% 
   =
 Files  12560 -1256 
 Lines1407690   -140769 
 Branches   43070 -4307 
   =
   - Hits  995920-99592 
   + Misses377000-37700 
   + Partials   34770 -3477 
   ```
   
   | 
[Flag](https://app.codecov.io/gh/apache/beam/pull/30797/flags?src=pr=flags_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | Coverage Δ | |
   |---|---|---|
   | 
[go](https://app.codecov.io/gh/apache/beam/pull/30797/flags?src=pr=flag_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | `?` | |
   | 
[java](https://app.codecov.io/gh/apache/beam/pull/30797/flags?src=pr=flag_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | `?` | |
   | 
[python](https://app.codecov.io/gh/apache/beam/pull/30797/flags?src=pr=flag_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   
   
   
   [:umbrella: View full report in Codecov by 
Sentry](https://app.codecov.io/gh/apache/beam/pull/30797?dropdown=coverage=pr=continue_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   
   :loudspeaker: Have feedback on the report? [Share it 
here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] [Failing Test]: beam_PreCommit_Python_Coverage suite fails with ModuleNotFoundError: No module named 'pydantic._hypothesis_plugin' [beam]

2024-04-04 Thread via GitHub


tvalentyn opened a new issue, #30852:
URL: https://github.com/apache/beam/issues/30852

   ### What happened?
   
   The 'coverage' suite runs some Beam unit tests in environments with 
different versions a particular dependency, for example we test severalversions 
of pyarrow or tft. The py38-tft-113 suite currently fails, likely due to a 
incompatible dependencies in  tox environment:

   ```
   = test session starts 
==
   Plugin: terminalreporter, Hook: pytest_sessionfinish
   ModuleNotFoundError: No module named 'pydantic._hypothesis_plugin'
   For more information see 
https://pluggy.readthedocs.io/en/stable/api_reference.html#pluggy.PluggyTeardownRaisedWarning
 config.hook.pytest_sessionfinish(
   Traceback (most recent call last):
 File 
"/runner/_work/beam/beam/sdks/python/test-suites/tox/py38/build/srcs/sdks/python/target/.tox-py38-tft-113/py38-tft-113/bin/pytest",
 line 10, in 
   sys.exit(console_main())
 File 
"/runner/_work/beam/beam/sdks/python/test-suites/tox/py38/build/srcs/sdks/python/target/.tox-py38-tft-113/py38-tft-113/lib/python3.8/site-packages/_pytest/config/__init__.py",
 line 192, in console_main
   code = main()
 File 
"/runner/_work/beam/beam/sdks/python/test-suites/tox/py38/build/srcs/sdks/python/target/.tox-py38-tft-113/py38-tft-113/lib/python3.8/site-packages/_pytest/config/__init__.py",
 line 169, in main
   ret: Union[ExitCode, int] = config.hook.pytest_cmdline_main(
 File 
"/runner/_work/beam/beam/sdks/python/test-suites/tox/py38/build/srcs/sdks/python/target/.tox-py38-tft-113/py38-tft-113/lib/python3.8/site-packages/pluggy/_hooks.py",
 line 501, in __call__
   return self._hookexec(self.name, self._hookimpls.copy(), kwargs, 
firstresult)
 File 
"/runner/_work/beam/beam/sdks/python/test-suites/tox/py38/build/srcs/sdks/python/target/.tox-py38-tft-113/py38-tft-113/lib/python3.8/site-packages/pluggy/_manager.py",
 line 119, in _hookexec
   return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
 File 
"/runner/_work/beam/beam/sdks/python/test-suites/tox/py38/build/srcs/sdks/python/target/.tox-py38-tft-113/py38-tft-113/lib/python3.8/site-packages/pluggy/_callers.py",
 line 138, in _multicall
   raise exception.with_traceback(exception.__traceback__)
 File 
"/runner/_work/beam/beam/sdks/python/test-suites/tox/py38/build/srcs/sdks/python/target/.tox-py38-tft-113/py38-tft-113/lib/python3.8/site-packages/pluggy/_callers.py",
 line 102, in _multicall
   res = hook_impl.function(*args)
 File 
"/runner/_work/beam/beam/sdks/python/test-suites/tox/py38/build/srcs/sdks/python/target/.tox-py38-tft-113/py38-tft-113/lib/python3.8/site-packages/_pytest/main.py",
 line 318, in pytest_cmdline_main
   return wrap_session(config, _main)
 File 
"/runner/_work/beam/beam/sdks/python/test-suites/tox/py38/build/srcs/sdks/python/target/.tox-py38-tft-113/py38-tft-113/lib/python3.8/site-packages/_pytest/main.py",
 line 306, in wrap_session
   config.hook.pytest_sessionfinish(
 File 
"/runner/_work/beam/beam/sdks/python/test-suites/tox/py38/build/srcs/sdks/python/target/.tox-py38-tft-113/py38-tft-113/lib/python3.8/site-packages/pluggy/_hooks.py",
 line 501, in __call__
   return self._hookexec(self.name, self._hookimpls.copy(), kwargs, 
firstresult)
 File 
"/runner/_work/beam/beam/sdks/python/test-suites/tox/py38/build/srcs/sdks/python/target/.tox-py38-tft-113/py38-tft-113/lib/python3.8/site-packages/pluggy/_manager.py",
 line 119, in _hookexec
   return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
 File 
"/runner/_work/beam/beam/sdks/python/test-suites/tox/py38/build/srcs/sdks/python/target/.tox-py38-tft-113/py38-tft-113/lib/python3.8/site-packages/pluggy/_callers.py",
 line 155, in _multicall
   teardown[0].send(outcome)
 File 
"/runner/_work/beam/beam/sdks/python/test-suites/tox/py38/build/srcs/sdks/python/target/.tox-py38-tft-113/py38-tft-113/lib/python3.8/site-packages/_pytest/terminal.py",
 line 867, in pytest_sessionfinish
   self.config.hook.pytest_terminal_summary(
 File 
"/runner/_work/beam/beam/sdks/python/test-suites/tox/py38/build/srcs/sdks/python/target/.tox-py38-tft-113/py38-tft-113/lib/python3.8/site-packages/pluggy/_hooks.py",
 line 501, in __call__
   return self._hookexec(self.name, self._hookimpls.copy(), kwargs, 
firstresult)
 File 
"/runner/_work/beam/beam/sdks/python/test-suites/tox/py38/build/srcs/sdks/python/target/.tox-py38-tft-113/py38-tft-113/lib/python3.8/site-packages/pluggy/_manager.py",
 line 119, in _hookexec
   return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
 File 
"/runner/_work/beam/beam/sdks/python/test-suites/tox/py38/build/srcs/sdks/python/target/.tox-py38-tft-113/py38-tft-113/lib/python3.8/site-packages/pluggy/_callers.py",
 line 181, in _multicall
   return outcome.get_result()
 File 

Re: [I] [Failing Test]: beam_PreCommit_Python_Coverage suite fails with ModuleNotFoundError: No module named 'pydantic._hypothesis_plugin' [beam]

2024-04-04 Thread via GitHub


tvalentyn commented on issue #30852:
URL: https://github.com/apache/beam/issues/30852#issuecomment-2037827468

   Sample error: 
https://github.com/apache/beam/actions/runs/8548288262/job/23421776246?pr=30843


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Python] Add a couple quality-of-life improvemenets to `testing.util.assert_that` [beam]

2024-04-04 Thread via GitHub


tvalentyn commented on PR #30771:
URL: https://github.com/apache/beam/pull/30771#issuecomment-2037828423

   I think that error is unrelated to this change, I see it on other PRs as 
well. Filed: https://github.com/apache/beam/issues/30852


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Allow lazy iteration for non-reiterables. [beam]

2024-04-04 Thread via GitHub


github-actions[bot] commented on PR #30851:
URL: https://github.com/apache/beam/pull/30851#issuecomment-2037799521

   Checks are failing. Will not request review until checks are succeeding. If 
you'd like to override that behavior, comment `assign set of reviewers`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix Jms drop record [beam]

2024-04-04 Thread via GitHub


Abacn commented on PR #30218:
URL: https://github.com/apache/beam/pull/30218#issuecomment-2037783134

   Yeah, it seems authors of this IO connector were aware of prefetch in 
implementation affects how the message gets delivered.
   
   Here I hava a proposal,
   
   introduce a "checkPointTimeout" option for JmsIO.read, default to 0 (never 
timeout), and if works as
   
   when the last advance return true and current advance return false passed 
the checkPointTimeout, then we revoke the lastly made jmscheckpoint, that is 
close the session without acknowledge messages in it
   
   This will cause some duplicates, but it may unblock the mesdages sit in the 
internal buffer of the previous session


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix Jms drop record [beam]

2024-04-04 Thread via GitHub


ppawel commented on PR #30218:
URL: https://github.com/apache/beam/pull/30218#issuecomment-2037745694

   > We have to recreate the consumer/session to properly handle checkpoints, 
otherwise there were data losses. This is due to the limitation that message 
acknowledge in Jms is per session. All messages that delivered by the time of 
acknowledging a message within a session will be marked as acknowledged.
   
   I've never looked so deep into JMS spec, to be honest it is a bit strange if 
it in fact works like this with regards to acks (that it's enough to ack one 
single message). I think this could also be subject to testing between 
implementations, I can check it at some point with my case (Solace broker) but 
first need to deal somehow with this deadlock situation.
   
   > Agree with the analysis and I think the cause is same here: 
https://github.com/apache/beam/pull/30218/files#diff-a63812b51f93708cc60430f314b496ae1110425c6a8ae4c85e59573cfb8f0938R204-R207
   
   OK but as I understand, this is only for the direct runner, in a runner like 
Dataflow, finalizing the checkpoint might or might not happen at some point and 
"fix" the deadlock but the root cause will still be there.
   
   > Is there a way to release the messages in internal buffer but do not 
acknowledge the receive message within the same session?
   
   I don't think it's possible at JMS API level to do this, and even at the 
implementation level I don't see any easy access to those internal 
queues/buffers in both clients. There are some parameters to control how big is 
the buffer etc. but I think JmsIO should ideally work regardless of such 
parameters. After all, those buffers are there for a reason (performance).
   
   I see the comment in 
`org.apache.beam.sdk.io.jms.JmsIOTest#testCheckpointMark` regarding testing 
without "prefetch" - I think it would be good to also test some scenarios 
including prefetch and this internal buffer. I can't promise anything due to 
time constraints but I might try to implement a test case that reproduces my 
issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [yaml] disable javascript mapping for python >=3.12 [beam]

2024-04-04 Thread via GitHub


tvalentyn commented on code in PR #30843:
URL: https://github.com/apache/beam/pull/30843#discussion_r1552087003


##
sdks/python/setup.py:
##
@@ -368,7 +368,7 @@ def get_portability_package_data():
   'grpcio>=1.33.1,!=1.48.0,<2',
   'hdfs>=2.1.0,<3.0.0',
   'httplib2>=0.8,<0.23.0',
-  'js2py>=0.74,<1',
+  'js2py>=0.74,<1; python_version<"3.12"',

Review Comment:
   ```suggestion
 # https://github.com/PiotrDabkowski/Js2Py/issues/317
 'js2py>=0.74,<1; python_version<"3.12"',
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix Jms drop record [beam]

2024-04-04 Thread via GitHub


Abacn commented on PR #30218:
URL: https://github.com/apache/beam/pull/30218#issuecomment-2037690938

   We have to recreate the consumer/session to properly handle checkpoints, 
otherwise there were data losses. This is due to the limitation that message 
acknowledge in Jms is per session. All messages that delivered by the time of 
acknowledging a message within a session will be marked as acknowledged.
   
   Agree with the analysis and I think the cause is same here: 
https://github.com/apache/beam/pull/30218/files#diff-a63812b51f93708cc60430f314b496ae1110425c6a8ae4c85e59573cfb8f0938R204-R207
   
   The yet-finalized checkpoint hold an active session which may contain 
messages in its internal buffer. Those message won't get released until a 
checkpoint being finalized.
   
   I tested my PR with IBM MQ and it had no issue. So this is still 
implementation related, though.
   
   Is there a way to release the messages in internal buffer but do not 
acknowledge the receive message within the same session?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Java] ManagedIO [beam]

2024-04-04 Thread via GitHub


chamikaramj commented on code in PR #30808:
URL: https://github.com/apache/beam/pull/30808#discussion_r1552055943


##
sdks/java/managed/src/main/java/org/apache/beam/sdk/managed/Managed.java:
##
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.managed;
+
+import com.google.auto.value.AutoValue;
+import java.util.Map;
+import java.util.regex.Pattern;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.schemas.transforms.SchemaTransform;
+import org.apache.beam.sdk.values.PCollectionRowTuple;
+
+public class Managed {
+  public static Read read() {
+return new 
AutoValue_Managed_Read.Builder().setPattern(Read.PATTERN).build();
+  }
+
+  @AutoValue
+  public abstract static class Read extends SchemaTransform {
+protected static final Pattern PATTERN =

Review Comment:
   Sounds good. We can explicitly list instead of trying to do a pattern 
matching.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Java] ManagedIO [beam]

2024-04-04 Thread via GitHub


chamikaramj commented on code in PR #30808:
URL: https://github.com/apache/beam/pull/30808#discussion_r1552052753


##
sdks/java/managed/src/main/java/org/apache/beam/sdk/managed/Managed.java:
##
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.managed;
+
+import com.google.auto.value.AutoValue;
+import java.util.Map;
+import java.util.regex.Pattern;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.schemas.transforms.SchemaTransform;
+import org.apache.beam.sdk.values.PCollectionRowTuple;
+
+public class Managed {
+  public static Read read() {

Review Comment:
   Robert, are you OK with keeping this as a static list for the initial PR and 
making it dynamic (may be via codegen) in the future ?
   
   I think a dynamic will list only be helpful if we can drop new jars with 
SchemaTransforms that we would like to manage to an older Beam version. At 
least initially, we will be very aware about the set of transforms we support 
(and the list will be small).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix Jms drop record [beam]

2024-04-04 Thread via GitHub


ppawel commented on PR #30218:
URL: https://github.com/apache/beam/pull/30218#issuecomment-2037595552

   @Abacn I am using GCP Dataflow but the test case is with DirectRunner.
   
   I am now digging deeper into this and this is easily reproducible when you 
force the data source splits to 1 in `UnboundedJmsSource#split`. This means 
that you will have only one connection/session/consumer and there will 
basically be a deadlock between the first (original) consumer and the one 
created when doing a checkpoint.
   
   I temporarily removed the code that creates the new consumer and also 
removed closing the consumer and session from `JmsCheckpointMark` and now it 
works fine as expected. So I think this is most likely related to 
closing/opening the session/consumer.
   
   > In the case of 50 outstanding messages, are these messages already 
acknowledged in the broker side (which means data loss)? Or there are still 
unacknowledged but do not send to other session by the broker ?
   
   They are marked as "not acknowledged" by the broker but as "received" by one 
consumer. The other consumer does not get those messages.
   
   I guess the broker would send those messages to the second consumer when the 
first one is closed but closing only happens when finalizing the checkpoint and 
that doesn't happen because only the new consumer (which does not have any 
messages) is being called in `advance` so in the end nothing is being done, it 
just spins around calling `receive`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Allow lazy iteration for non-reiterables. [beam]

2024-04-04 Thread via GitHub


robertwb opened a new pull request, #30851:
URL: https://github.com/apache/beam/pull/30851

   In particular Runner v2 does not produce Reiterables, which resulted in the 
entire stream being read into memory. In this case we can leverage the fact 
that the first 100MB will be cached and quick to reiterate over.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] add terraform for utility cluster. Add name override to gke [beam]

2024-04-04 Thread via GitHub


damondouglas commented on code in PR #30847:
URL: https://github.com/apache/beam/pull/30847#discussion_r1551939749


##
.test-infra/terraform/google-cloud-platform/google-kubernetes-engine/cluster.tf:
##
@@ -34,7 +34,9 @@ resource "google_container_cluster" "default" {
 enable_private_nodes= true
 enable_private_endpoint = false
   }
-  node_config {
-service_account = data.google_service_account.default.email
+  cluster_autoscaling {

Review Comment:
   Thank you for changing this! Could we also add the oauth scope for Google 
cloud platform? https://www.googleapis.com/auth/cloud-platform



##
.test-infra/terraform/google-cloud-platform/google-kubernetes-engine/outputs.tf:
##
@@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+output kubernetes_api_endpoint {
+  value = google_container_cluster.default.endpoint
+}
+
+output cluster_ca_certificate {

Review Comment:
   Thank you for adding outputs :-). Could you tell me what this output is 
needed for?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix Jms drop record [beam]

2024-04-04 Thread via GitHub


Abacn commented on PR #30218:
URL: https://github.com/apache/beam/pull/30218#issuecomment-2037519755

   Also, which runner are you using? I may noticed this issue and noted in 
https://github.com/apache/beam/pull/30218/files#diff-a63812b51f93708cc60430f314b496ae1110425c6a8ae4c85e59573cfb8f0938R204


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix Jms drop record [beam]

2024-04-04 Thread via GitHub


Abacn commented on PR #30218:
URL: https://github.com/apache/beam/pull/30218#issuecomment-2037494326

   Hi @ppawel thanks for reporting this.
   
   In the case of 50 outstanding messages, are these messages already 
acknowledged in the broker side (which means data loss)? Or there are still 
unacknowledged but do not send to other session by the broker ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] [Bug]: ClassCastException with unknown fields in STORAGE_WRITE_API [beam]

2024-04-04 Thread via GitHub


lucasnoetzold opened a new issue, #30850:
URL: https://github.com/apache/beam/issues/30850

   ### What happened?
   
   We're gettig a `ClassCastException` when trying to write data to a BigQuery 
table with the STORAGE_WRITE_API. This only happens with unknown fields on the 
payload, both `ignoreUnknownValues` and `withAutoSchemaUpdate` options are 
enabled and the field being written is a REPEATED RECORD.
   
   **Conditions:**
   - BigQuery table that contains a "RECORD" type field
   - the record field must be on mode "repeated"
   - `.ignoreUnknownValues()` option enabled
   - `.withAutoSchemaUpdate(true)` option enabled
   - payload that contains any unknown field inside the RECORD field that does 
exist
   
   **Expected Behavior:** unknown fields are ignored and the known payload is 
sent to BigQuery.
   
   ---
   
   I've managed to reproduce the error with this minimal code:
   
   ```java
   var nested = List.of(Map.of("nested_field_that_exists", "something"));
   var row = new TableRow().set("root_field", nested);
   
   var pipeline = Pipeline.create();
   
   pipeline.apply(Create.of(row))
.apply(BigQueryIO.write()
.ignoreUnknownValues()
.withAutoSchemaUpdate(true)
.withFormatFunction(identity())
.withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API)
.to(new TableReference()
.setProjectId("the_project_id")
.setDatasetId("the_dataset_id")
.setTableId("test_table")));
   
   pipeline.run();
   ```
   
   The BigQuery table schema used is the following:
   
![image](https://github.com/apache/beam/assets/27974653/491bb62f-a7d7-462f-86ee-1468a14e8220)
   
   Looking at the stack we can see that it happens when it's handling the 
unknown fields. Since the unknown field is nested, and the field descriptor is 
inherited from the root field, it expects it to be a list (since the root field 
is at mode REPEATED):
   
![image](https://github.com/apache/beam/assets/27974653/dd0fcd24-2fe8-4760-bede-5258d7ef037e)
   
   ### Issue Priority
   
   Priority: 2 (default / most bugs should be filed as P2)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [X] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix Jms drop record [beam]

2024-04-04 Thread via GitHub


ppawel commented on PR #30218:
URL: https://github.com/apache/beam/pull/30218#issuecomment-2037429676

   @Abacn I just upgraded to Beam 2.55.0 in my project and my streaming 
pipeline that uses JmsIO gets stuck during consuming messages, it was working 
fine with Beam 2.53.0 and earlier. I tracked this down to changes in this PR 
related to managing JMS resources (consumers, sessions etc.)
   
   I have a test case in my project that simply publishes around 60 messages to 
a queue and then a Beam pipeline is executed to consume and process those 
messages. This test case was passing before but now it works like this:
   
   1. 60 messages are published to a queue in Solace (message broker we use).
   2. Pipeline is started, first consumer is created in JmsIO and it fetches 
all 60 messages into the internal buffer of the consumer.
   3. Beam pipeline consumes 10 messages (`advance` is called 10 times in 
JmsIO).
   4. 10 messages are acked, checkpoint is made and `recreateSession` is called 
in JmsIO (this is the new code introduced in this PR).
   5. New consumer is created, old one is closed.
   6. New consumer does not get the messages anymore - the broker seems to 
think that the first consumer is just late with acknowledging the remaining 50 
messages.
   7. Pipeline hangs forever, `advance` is called all the time but brings no 
new messages. 50 messages remain sitting in the queue not consumed.
   
   Does this scenario make sense or do I miss something in how JmsIO should 
work?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [DO NOT MERGE] Flink 1.17 [beam]

2024-04-04 Thread via GitHub


Abacn commented on PR #30197:
URL: https://github.com/apache/beam/pull/30197#issuecomment-2037426140

   Thanks, that's awesome!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Feature Request]: Support for Apache Flink 1.18 in Beam Runners [beam]

2024-04-04 Thread via GitHub


Abacn commented on issue #30789:
URL: https://github.com/apache/beam/issues/30789#issuecomment-2037424496

   As of timeline, as an open source project the Flink runner component has 
been community driven. Any contribution is welcome.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [DO NOT MERGE] Flink 1.17 [beam]

2024-04-04 Thread via GitHub


je-ik commented on PR #30197:
URL: https://github.com/apache/beam/pull/30197#issuecomment-2037419229

   Hi, unfortunately I did not have time to finish it, yet. But target is 
currently release 2.56.0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Feature Request]: Support for Apache Flink 1.18 in Beam Runners [beam]

2024-04-04 Thread via GitHub


Abacn commented on issue #30789:
URL: https://github.com/apache/beam/issues/30789#issuecomment-2037417183

   Hi all, looks like this is highly wanted. Could Flink runner user help 
testing #30197 for their use cases?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [DO NOT MERGE] Flink 1.17 [beam]

2024-04-04 Thread via GitHub


Abacn commented on PR #30197:
URL: https://github.com/apache/beam/pull/30197#issuecomment-2037409172

   Hi, as https://lists.apache.org/thread/s8o8jc2k2kb41q5g0v0xmoyszg1gdcst gets 
resolved by #30403, is this PR ready to be merged?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] python sdk: fix several bugs regarding avto <-> beam schema conversion [beam]

2024-04-04 Thread via GitHub


benkonz commented on PR #30770:
URL: https://github.com/apache/beam/pull/30770#issuecomment-2037391308

   @ahmedabu98 could you take another look at this PR? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] update gcloud to 471 in runner image [beam]

2024-04-04 Thread via GitHub


damccorm merged PR #30846:
URL: https://github.com/apache/beam/pull/30846


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



  1   2   >