Re: [PR] Bump github.com/aws/aws-sdk-go-v2/credentials from 1.17.4 to 1.17.8 in /sdks [beam]
github-actions[bot] commented on PR #30669: URL: https://github.com/apache/beam/pull/30669#issuecomment-2005807488 Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`: R: @lostluck for label go. Available commands: - `stop reviewer notifications` - opt out of the automated review tooling - `remind me after tests pass` - tag the comment author after tests pass - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers) The PR bot will only process comments in the main thread (not review comments). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.42.2 to 1.53.0 in /sdks [beam]
github-actions[bot] commented on PR #30670: URL: https://github.com/apache/beam/pull/30670#issuecomment-2005807451 Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment `assign set of reviewers` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump github.com/aws/aws-sdk-go-v2/feature/s3/manager from 1.13.8 to 1.16.12 in /sdks [beam]
github-actions[bot] commented on PR #30671: URL: https://github.com/apache/beam/pull/30671#issuecomment-2005807401 Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment `assign set of reviewers` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump github.com/aws/aws-sdk-go-v2/feature/s3/manager from 1.13.8 to 1.16.11 in /sdks [beam]
dependabot[bot] commented on PR #30660: URL: https://github.com/apache/beam/pull/30660#issuecomment-2005764487 Superseded by #30671. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Bump github.com/aws/aws-sdk-go-v2/feature/s3/manager from 1.13.8 to 1.16.12 in /sdks [beam]
dependabot[bot] opened a new pull request, #30671: URL: https://github.com/apache/beam/pull/30671 Bumps [github.com/aws/aws-sdk-go-v2/feature/s3/manager](https://github.com/aws/aws-sdk-go-v2) from 1.13.8 to 1.16.12. Commits https://github.com/aws/aws-sdk-go-v2/commit/d1091d0a12dd9d5894ad3961214a6bea5def3062;>d1091d0 Release 2022-08-29 https://github.com/aws/aws-sdk-go-v2/commit/a1140b1b7e70a06b9b49493f095853fc891d3599;>a1140b1 Regenerated Clients https://github.com/aws/aws-sdk-go-v2/commit/8f3045a981b27bde412ecd3143bbea128b34285e;>8f3045a Update SDK's smithy-go dependency to v1.13.0 https://github.com/aws/aws-sdk-go-v2/commit/462d0469260d76219599fb16d025d9db50ea6c56;>462d046 Update endpoints model https://github.com/aws/aws-sdk-go-v2/commit/7aba71b74a65085bc9d1ffa60b893c90011b792d;>7aba71b Update API model https://github.com/aws/aws-sdk-go-v2/commit/1609fe847a5af31ae8a4a306b779e7c13bd45bb6;>1609fe8 credentials/ssocreds: Add SSOTokenProvider for Bearer Token auth (https://redirect.github.com/aws/aws-sdk-go-v2/issues/1818;>#1818) https://github.com/aws/aws-sdk-go-v2/commit/8e755b49d93958105eefdd8bbc3f2500e2c98738;>8e755b4 Release 2022-08-26 https://github.com/aws/aws-sdk-go-v2/commit/d33e0c036c685d4e7d1305216a474e541047aea5;>d33e0c0 Regenerated Clients https://github.com/aws/aws-sdk-go-v2/commit/129270893ae61b7eef49470a157344f23f722453;>1292708 Update API model https://github.com/aws/aws-sdk-go-v2/commit/7d6d53e3e9422ffcfe80d842a351d33bb8dfde70;>7d6d53e add GitHub code owner for auto assigned reviewers (https://redirect.github.com/aws/aws-sdk-go-v2/issues/1817;>#1817) Additional commits viewable in https://github.com/aws/aws-sdk-go-v2/compare/service/mq/v1.13.8...v1.16.12;>compare view [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/aws/aws-sdk-go-v2/feature/s3/manager=go_modules=1.13.8=1.16.12)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- Dependabot commands and options You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump github.com/aws/aws-sdk-go-v2/feature/s3/manager from 1.13.8 to 1.16.11 in /sdks [beam]
dependabot[bot] closed pull request #30660: Bump github.com/aws/aws-sdk-go-v2/feature/s3/manager from 1.13.8 to 1.16.11 in /sdks URL: https://github.com/apache/beam/pull/30660 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.42.2 to 1.52.1 in /sdks [beam]
dependabot[bot] commented on PR #30661: URL: https://github.com/apache/beam/pull/30661#issuecomment-2005761902 Superseded by #30670. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.42.2 to 1.52.1 in /sdks [beam]
dependabot[bot] closed pull request #30661: Bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.42.2 to 1.52.1 in /sdks URL: https://github.com/apache/beam/pull/30661 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.42.2 to 1.53.0 in /sdks [beam]
dependabot[bot] opened a new pull request, #30670: URL: https://github.com/apache/beam/pull/30670 Bumps [github.com/aws/aws-sdk-go-v2/service/s3](https://github.com/aws/aws-sdk-go-v2) from 1.42.2 to 1.53.0. Commits https://github.com/aws/aws-sdk-go-v2/commit/427c682853151e2495237f55ba2c2f54f362aa02;>427c682 Release 2024-03-18 https://github.com/aws/aws-sdk-go-v2/commit/93e06ffe0b3a6aaa8c14675b219902838aaebb1e;>93e06ff Regenerated Clients https://github.com/aws/aws-sdk-go-v2/commit/58be623729776dcfc511221ce9bab950b62486a5;>58be623 Update endpoints model https://github.com/aws/aws-sdk-go-v2/commit/9927740a2766f85abefb89fffe2b323f05e96b5d;>9927740 Update API model https://github.com/aws/aws-sdk-go-v2/commit/e5b7766b7304a4f8f832a015f88af415321e9fc4;>e5b7766 add ratelimit.None (https://redirect.github.com/aws/aws-sdk-go-v2/issues/2562;>#2562) https://github.com/aws/aws-sdk-go-v2/commit/10c44870fbdabbbac25ffe1a28115efe4fd560d9;>10c4487 Release 2024-03-15 https://github.com/aws/aws-sdk-go-v2/commit/db0b1ab98d1a7fdbb645a5c1d6729da81059a92d;>db0b1ab Regenerated Clients https://github.com/aws/aws-sdk-go-v2/commit/92c0060eae7ecf9476f022adcc5719b2313f5ccc;>92c0060 Update endpoints model https://github.com/aws/aws-sdk-go-v2/commit/e5bb86e3c8d9eea7c779c0466ac492a580937e96;>e5bb86e Update API model https://github.com/aws/aws-sdk-go-v2/commit/49b368e9d7a38a2373c833be135270e5390c2b41;>49b368e Release 2024-03-14 Additional commits viewable in https://github.com/aws/aws-sdk-go-v2/compare/service/s3/v1.42.2...service/s3/v1.53.0;>compare view [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/aws/aws-sdk-go-v2/service/s3=go_modules=1.42.2=1.53.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- Dependabot commands and options You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump github.com/aws/aws-sdk-go-v2/credentials from 1.17.4 to 1.17.8 in /sdks [beam]
codecov[bot] commented on PR #30669: URL: https://github.com/apache/beam/pull/30669#issuecomment-2005760177 ## [Codecov](https://app.codecov.io/gh/apache/beam/pull/30669?dropdown=coverage=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) Report All modified and coverable lines are covered by tests :white_check_mark: > Project coverage is 38.53%. Comparing base [(`c3b3fa6`)](https://app.codecov.io/gh/apache/beam/commit/c3b3fa62c3a323e8da15a18aeba4a43b628efd24?dropdown=coverage=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) to head [(`6566d41`)](https://app.codecov.io/gh/apache/beam/pull/30669?dropdown=coverage=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache). Additional details and impacted files ```diff @@Coverage Diff @@ ## master #30669 +/- ## == - Coverage 38.53% 38.53% -0.01% == Files 698 698 Lines 102361 102361 == - Hits3944439441 -3 - Misses 6128561287 +2 - Partials 1632 1633 +1 ``` | [Flag](https://app.codecov.io/gh/apache/beam/pull/30669/flags?src=pr=flags_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) | Coverage Δ | | |---|---|---| | [go](https://app.codecov.io/gh/apache/beam/pull/30669/flags?src=pr=flag_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) | `54.32% <ø> (-0.01%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more. [:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/beam/pull/30669?dropdown=coverage=pr=continue_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache). :loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Bump github.com/aws/aws-sdk-go-v2/credentials from 1.17.4 to 1.17.8 in /sdks [beam]
dependabot[bot] opened a new pull request, #30669: URL: https://github.com/apache/beam/pull/30669 Bumps [github.com/aws/aws-sdk-go-v2/credentials](https://github.com/aws/aws-sdk-go-v2) from 1.17.4 to 1.17.8. Changelog Sourced from https://github.com/aws/aws-sdk-go-v2/blob/v1.17.8/CHANGELOG.md;>github.com/aws/aws-sdk-go-v2/credentials's changelog. Release (2023-04-07) General Highlights Dependency Update: Updated to the latest SDK module versions Module Highlights github.com/aws/aws-sdk-go-v2/service/dlm: https://github.com/aws/aws-sdk-go-v2/blob/v1.17.8/service/dlm/CHANGELOG.md#v1150-2023-04-07;>v1.15.0 Announcement: This release includes breaking changes for the timestamp trait on the data lifecycle management client. Feature: Updated timestamp format for GetLifecyclePolicy API Bug Fix: Correct timestamp type for data lifecycle manager. github.com/aws/aws-sdk-go-v2/service/docdb: https://github.com/aws/aws-sdk-go-v2/blob/v1.17.8/service/docdb/CHANGELOG.md#v1210-2023-04-07;>v1.21.0 Feature: This release adds a new parameter 'DBClusterParameterGroupName' to 'RestoreDBClusterFromSnapshot' API to associate the name of the DB cluster parameter group while performing restore. github.com/aws/aws-sdk-go-v2/service/fsx: https://github.com/aws/aws-sdk-go-v2/blob/v1.17.8/service/fsx/CHANGELOG.md#v1288-2023-04-07;>v1.28.8 Documentation: Amazon FSx for Lustre now supports creating data repository associations on Persistent_1 and Scratch_2 file systems. github.com/aws/aws-sdk-go-v2/service/lambda: https://github.com/aws/aws-sdk-go-v2/blob/v1.17.8/service/lambda/CHANGELOG.md#v1310-2023-04-07;>v1.31.0 Feature: This release adds a new Lambda InvokeWithResponseStream API to support streaming Lambda function responses. The release also adds a new InvokeMode parameter to Function Url APIs to control whether the response will be streamed or buffered. github.com/aws/aws-sdk-go-v2/service/quicksight: https://github.com/aws/aws-sdk-go-v2/blob/v1.17.8/service/quicksight/CHANGELOG.md#v1340-2023-04-07;>v1.34.0 Feature: This release has two changes: adding the OR condition to tag-based RLS rules in CreateDataSet and UpdateDataSet; adding RefreshSchedule and Incremental RefreshProperties operations for users to programmatically configure SPICE dataset ingestions. github.com/aws/aws-sdk-go-v2/service/redshiftdata: https://github.com/aws/aws-sdk-go-v2/blob/v1.17.8/service/redshiftdata/CHANGELOG.md#v1193-2023-04-07;>v1.19.3 Documentation: Update documentation of API descriptions as needed in support of temporary credentials with IAM identity. github.com/aws/aws-sdk-go-v2/service/servicecatalog: https://github.com/aws/aws-sdk-go-v2/blob/v1.17.8/service/servicecatalog/CHANGELOG.md#v1181-2023-04-07;>v1.18.1 Documentation: Updates description for property Release (2023-04-06) Module Highlights github.com/aws/aws-sdk-go-v2/service/cloudformation: https://github.com/aws/aws-sdk-go-v2/blob/v1.17.8/service/cloudformation/CHANGELOG.md#v1270-2023-04-06;>v1.27.0 Feature: Including UPDATE_COMPLETE as a failed status for DeleteStack waiter. github.com/aws/aws-sdk-go-v2/service/greengrassv2: https://github.com/aws/aws-sdk-go-v2/blob/v1.17.8/service/greengrassv2/CHANGELOG.md#v1220-2023-04-06;>v1.22.0 Feature: Add support for SUCCEEDED value in coreDeviceExecutionStatus field. Documentation updates for Greengrass V2. github.com/aws/aws-sdk-go-v2/service/proton: https://github.com/aws/aws-sdk-go-v2/blob/v1.17.8/service/proton/CHANGELOG.md#v1210-2023-04-06;>v1.21.0 Feature: This release adds support for the AWS Proton service sync feature. Service sync enables managing an AWS Proton service (creating and updating instances) and all of it's corresponding service instances from a Git repository. github.com/aws/aws-sdk-go-v2/service/rds: https://github.com/aws/aws-sdk-go-v2/blob/v1.17.8/service/rds/CHANGELOG.md#v1421-2023-04-06;>v1.42.1 Documentation: Adds and updates the SDK examples Release (2023-04-05) Module Highlights github.com/aws/aws-sdk-go-v2/service/configservice: https://github.com/aws/aws-sdk-go-v2/blob/v1.17.8/service/configservice/CHANGELOG.md#v1310-2023-04-05;>v1.31.0 Feature: This release adds resourceType enums for types released in March 2023. github.com/aws/aws-sdk-go-v2/service/ecs: https://github.com/aws/aws-sdk-go-v2/blob/v1.17.8/service/ecs/CHANGELOG.md#v1243-2023-04-05;>v1.24.3 Documentation: This is a document only updated to add information about Amazon Elastic Inference (EI). github.com/aws/aws-sdk-go-v2/service/identitystore: https://github.com/aws/aws-sdk-go-v2/blob/v1.17.8/service/identitystore/CHANGELOG.md#v1167-2023-04-05;>v1.16.7 Documentation: Documentation updates for Identity
Re: [PR] [Python] Check feature store existence at construction time [beam]
github-actions[bot] commented on PR #30668: URL: https://github.com/apache/beam/pull/30668#issuecomment-2005691045 Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`: R: @liferoad for label python. R: @damccorm for label build. Available commands: - `stop reviewer notifications` - opt out of the automated review tooling - `remind me after tests pass` - tag the comment author after tests pass - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers) The PR bot will only process comments in the main thread (not review comments). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] [Python] Check feature store existence at construction time [beam]
riteshghorse opened a new pull request, #30668: URL: https://github.com/apache/beam/pull/30668 Check existence of feature store for enrichment handler at construction time and raise an error irrespective of the exception_level. Fixes #30386 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #` instead. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier). To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md) GitHub Actions Tests Status (on master branch) [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule) [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule) [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule) [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule) See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] add ExternalTransformProvider example [beam]
liferoad commented on code in PR #30666: URL: https://github.com/apache/beam/pull/30666#discussion_r1529488929 ## examples/multi-language/python/wordcount_external.py: ## @@ -0,0 +1,102 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import logging + +import apache_beam as beam +from apache_beam.io import ReadFromText +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.transforms.external_transform_provider import ExternalTransformProvider +from apache_beam.typehints.row_type import RowTypeConstraint +"""A Python multi-language pipeline that counts words. + +This pipeline reads an input text file then extracts and counts the words using Java SDK SchemaTransforms provided in +`ExtractWordsProvider`, `JavaCountProvider`, and `WriteWordsProvider`. Wrappers for these transforms are dynamically +provided in Python via the `ExternalTransformProvider` API. + +Example commands for executing this program: + +DirectRunner: +$ python wordcount_external.py --runner DirectRunner --input --output --expansion_service_port + +DataflowRunner: +$ python wordcount_external.py \ + --runner DataflowRunner \ + --temp_location $TEMP_LOCATION \ + --project $GCP_PROJECT \ + --region $GCP_REGION \ + --job_name $JOB_NAME \ + --num_workers $NUM_WORKERS \ + --input "gs://dataflow-samples/shakespeare/kinglear.txt" \ + --output "gs://$GCS_BUCKET/wordcount_external/output" \ + --expansion_service_port Review Comment: do we have the good doc to run this expansion service? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Implement ordered list state for FnApi. [beam]
acrites commented on code in PR #30317: URL: https://github.com/apache/beam/pull/30317#discussion_r1529368187 ## sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/OrderedListUserStateTest.java: ## @@ -0,0 +1,597 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.fn.harness.state; + +import static java.util.Arrays.asList; +import static org.hamcrest.MatcherAssert.assertThat; +import static org.hamcrest.Matchers.emptyIterable; +import static org.hamcrest.core.Is.is; +import static org.junit.Assert.assertArrayEquals; +import static org.junit.Assert.assertThrows; + +import java.io.IOException; +import java.util.Collections; +import org.apache.beam.fn.harness.Caches; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.OrderedListRange; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.util.ByteStringOutputStream; +import org.apache.beam.sdk.values.TimestampedValue; +import org.apache.beam.sdk.values.TimestampedValue.TimestampedValueCoder; +import org.apache.beam.vendor.grpc.v1p60p1.com.google.protobuf.ByteString; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.Iterables; +import org.joda.time.Instant; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +@RunWith(JUnit4.class) +public class OrderedListUserStateTest { + private static final TimestampedValue A1 = + TimestampedValue.of("A1", Instant.ofEpochMilli(1)); + private static final TimestampedValue B1 = + TimestampedValue.of("B1", Instant.ofEpochMilli(1)); + private static final TimestampedValue A2 = + TimestampedValue.of("A2", Instant.ofEpochMilli(2)); + private static final TimestampedValue B2 = + TimestampedValue.of("B2", Instant.ofEpochMilli(2)); + private static final TimestampedValue A3 = + TimestampedValue.of("A3", Instant.ofEpochMilli(3)); + private static final TimestampedValue A4 = + TimestampedValue.of("A4", Instant.ofEpochMilli(4)); + + private final String pTransformId = "pTransformId"; + private final String stateId = "stateId"; + private final String encodedWindow = "encodedWindow"; + + @Test + public void testNoPersistedValues() throws Exception { +FakeBeamFnStateClient fakeClient = new FakeBeamFnStateClient(Collections.emptyMap()); +OrderedListUserState userState = +new OrderedListUserState<>( +Caches.noop(), +fakeClient, +"instructionId", +createOrderedListStateKey("A"), +StringUtf8Coder.of()); +assertThat(userState.read(), is(emptyIterable())); + } + + @Test + public void testRead() throws Exception { +FakeBeamFnStateClient fakeClient = +new FakeBeamFnStateClient( +TimestampedValueCoder.of(StringUtf8Coder.of()), +ImmutableMap.of(createOrderedListStateKey("A", 1), asList(A1, B1))); +OrderedListUserState userState = +new OrderedListUserState<>( +Caches.noop(), +fakeClient, +"instructionId", +createOrderedListStateKey("A"), +StringUtf8Coder.of()); + +assertArrayEquals( +asList(A1, B1).toArray(), Iterables.toArray(userState.read(), TimestampedValue.class)); +userState.asyncClose(); +assertThrows(IllegalStateException.class, () -> userState.read()); + } + + @Test + public void testReadRange() throws Exception { +FakeBeamFnStateClient fakeClient = +new FakeBeamFnStateClient( +TimestampedValueCoder.of(StringUtf8Coder.of()), +ImmutableMap.of( +createOrderedListStateKey("A", 1), asList(A1, B1), +createOrderedListStateKey("A", 4), Collections.singletonList(A4), +createOrderedListStateKey("A", 2), Collections.singletonList(A2))); + +OrderedListUserState userState = +new OrderedListUserState<>( +Caches.noop(), +fakeClient, +"instructionId", +createOrderedListStateKey("A"), +
Re: [PR] Implement ordered list state for FnApi. [beam]
acrites commented on code in PR #30317: URL: https://github.com/apache/beam/pull/30317#discussion_r1529360968 ## sdks/java/harness/src/test/java/org/apache/beam/fn/harness/state/OrderedListUserStateTest.java: ## @@ -0,0 +1,597 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.fn.harness.state; + +import static java.util.Arrays.asList; +import static org.hamcrest.MatcherAssert.assertThat; +import static org.hamcrest.Matchers.emptyIterable; +import static org.hamcrest.core.Is.is; +import static org.junit.Assert.assertArrayEquals; +import static org.junit.Assert.assertThrows; + +import java.io.IOException; +import java.util.Collections; +import org.apache.beam.fn.harness.Caches; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.OrderedListRange; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.util.ByteStringOutputStream; +import org.apache.beam.sdk.values.TimestampedValue; +import org.apache.beam.sdk.values.TimestampedValue.TimestampedValueCoder; +import org.apache.beam.vendor.grpc.v1p60p1.com.google.protobuf.ByteString; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableMap; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.Iterables; +import org.joda.time.Instant; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +@RunWith(JUnit4.class) +public class OrderedListUserStateTest { + private static final TimestampedValue A1 = + TimestampedValue.of("A1", Instant.ofEpochMilli(1)); + private static final TimestampedValue B1 = + TimestampedValue.of("B1", Instant.ofEpochMilli(1)); + private static final TimestampedValue A2 = + TimestampedValue.of("A2", Instant.ofEpochMilli(2)); + private static final TimestampedValue B2 = + TimestampedValue.of("B2", Instant.ofEpochMilli(2)); + private static final TimestampedValue A3 = + TimestampedValue.of("A3", Instant.ofEpochMilli(3)); + private static final TimestampedValue A4 = + TimestampedValue.of("A4", Instant.ofEpochMilli(4)); + + private final String pTransformId = "pTransformId"; + private final String stateId = "stateId"; + private final String encodedWindow = "encodedWindow"; + + @Test + public void testNoPersistedValues() throws Exception { +FakeBeamFnStateClient fakeClient = new FakeBeamFnStateClient(Collections.emptyMap()); +OrderedListUserState userState = +new OrderedListUserState<>( +Caches.noop(), +fakeClient, +"instructionId", +createOrderedListStateKey("A"), +StringUtf8Coder.of()); +assertThat(userState.read(), is(emptyIterable())); + } + + @Test + public void testRead() throws Exception { +FakeBeamFnStateClient fakeClient = +new FakeBeamFnStateClient( +TimestampedValueCoder.of(StringUtf8Coder.of()), +ImmutableMap.of(createOrderedListStateKey("A", 1), asList(A1, B1))); +OrderedListUserState userState = +new OrderedListUserState<>( +Caches.noop(), +fakeClient, +"instructionId", +createOrderedListStateKey("A"), +StringUtf8Coder.of()); + +assertArrayEquals( +asList(A1, B1).toArray(), Iterables.toArray(userState.read(), TimestampedValue.class)); +userState.asyncClose(); +assertThrows(IllegalStateException.class, () -> userState.read()); + } + + @Test + public void testReadRange() throws Exception { +FakeBeamFnStateClient fakeClient = +new FakeBeamFnStateClient( +TimestampedValueCoder.of(StringUtf8Coder.of()), +ImmutableMap.of( +createOrderedListStateKey("A", 1), asList(A1, B1), +createOrderedListStateKey("A", 4), Collections.singletonList(A4), +createOrderedListStateKey("A", 2), Collections.singletonList(A2))); + +OrderedListUserState userState = +new OrderedListUserState<>( +Caches.noop(), +fakeClient, +"instructionId", +createOrderedListStateKey("A"), +
Re: [PR] Yaml API: Day Zero tutorial notebook [beam]
bzablocki commented on PR #27284: URL: https://github.com/apache/beam/pull/27284#issuecomment-2005133600 @Polber I converted saving the file to the built-in Jupyter's '%%writefile'. PR is ready for the final review/merge. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Yaml API: Day Zero tutorial notebook [beam]
bzablocki commented on code in PR #27284: URL: https://github.com/apache/beam/pull/27284#discussion_r1529348432 ## examples/notebooks/get-started/try-apache-beam-yaml.ipynb: ## @@ -0,0 +1,671 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { +"colab_type": "text", +"id": "view-in-github" + }, + "source": [ +"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/get-started/try-apache-beam-yaml.ipynb\; target=\"_parent\">https://colab.research.google.com/assets/colab-badge.svg\; alt=\"Open In Colab\"/>\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { +"cellView": "form" + }, + "outputs": [], + "source": [ +"#@title ## Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n", +"\n", +"# Licensed to the Apache Software Foundation (ASF) under one\n", +"# or more contributor license agreements. See the NOTICE file\n", +"# distributed with this work for additional information\n", +"# regarding copyright ownership. The ASF licenses this file\n", +"# to you under the Apache License, Version 2.0 (the\n", +"# \"License\"); you may not use this file except in compliance\n", +"# with the License. You may obtain a copy of the License at\n", +"#\n", +"# http://www.apache.org/licenses/LICENSE-2.0\n;, +"#\n", +"# Unless required by applicable law or agreed to in writing,\n", +"# software distributed under the License is distributed on an\n", +"# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n", +"# KIND, either express or implied. See the License for the\n", +"# specific language governing permissions and limitations\n", +"# under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { +"colab_type": "text", +"id": "lNKIMlEDZ_Vw" + }, + "source": [ +"# Try Apache Beam - YAML\n", +"\n", +"While Beam provides powerful APIs for authoring sophisticated data processing pipelines, it still has a high barrier for getting started and authoring simple pipelines. Even setting up the environment, installing the dependencies, and setting up the project can be a challenge.\n", +"\n", +"Here we provide a simple declarative syntax for describing pipelines that does not require coding experience or learning how to use an SDKany text editor will do. Some installation may be required to actually *execute* a pipeline, but we envision various services (such as Dataflow) to accept yaml pipelines directly obviating the need for even that in the future. We also anticipate the ability to generate code directly from these higher-level yaml descriptions, should one want to graduate to a full Beam SDK (and possibly the other direction as well as far as possible).\n", +"\n", +"It should be noted that everything here is still under development, but any features already included are considered stable. Feedback is welcome at d...@apache.beam.org.\n", +"\n", +"In this notebook, you set up your development environment and write a simple pipeline using YAML. Then you run it locally, using the [DirectRunner](https://beam.apache.org/documentation/runners/direct/). You can explore other runners with the [Beam Capability Matrix](https://beam.apache.org/documentation/runners/capability-matrix/).\n", +"\n", +"To navigate through different sections, use the table of contents. From **View** drop-down list, select **Table of contents**.\n", +"\n", +"To run a code cell, click the **Run cell** button at the top left of the cell, or select it and press **`Shift+Enter`**. Try modifying a code cell and re-running it to see what happens.\n", +"\n", +"To learn more about Colab, see [Welcome to Colaboratory!](https://colab.sandbox.google.com/notebooks/welcome.ipynb)." + ] + }, + { + "cell_type": "markdown", + "metadata": { +"colab_type": "text", +"id": "Fz6KSQ13_3Rr" + }, + "source": [ +"# Setup\n", +"\n", +"First, you need to set up your environment. The following code installs `apache-beam` and creates directories for your data, pipelines and results." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { +"colab": { + "base_uri": "https://localhost:8080/;, + "height": 170 +}, +"colab_type": "code", +"id": "GOOk81Jj_yUy", +"outputId": "d283dfb2-4f51-4fec-816b-f57b0cb9b71c" + }, + "outputs": [], + "source": [ +"def save_to_file(content, file_name):\n", +" with open(file_name, 'w') as f:\n", +"f.write(content)\n", +"\n", Review Comment: Decided to go with the '%%writefile' -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:
Re: [PR] Implement ordered list state for FnApi. [beam]
acrites commented on code in PR #30317: URL: https://github.com/apache/beam/pull/30317#discussion_r1529334529 ## sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/OrderedListUserState.java: ## @@ -0,0 +1,240 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.fn.harness.state; + +import static org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions.checkState; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collection; +import java.util.Comparator; +import java.util.Map.Entry; +import java.util.NavigableMap; +import java.util.concurrent.CompletableFuture; +import org.apache.beam.fn.harness.Cache; +import org.apache.beam.fn.harness.Caches; +import org.apache.beam.fn.harness.state.StateFetchingIterators.CachingStateIterable; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateClearRequest; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateRequest; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateResponse; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.LengthPrefixCoder; +import org.apache.beam.sdk.fn.stream.PrefetchableIterable; +import org.apache.beam.sdk.fn.stream.PrefetchableIterables; +import org.apache.beam.sdk.util.ByteStringOutputStream; +import org.apache.beam.sdk.values.TimestampedValue; +import org.apache.beam.sdk.values.TimestampedValue.TimestampedValueCoder; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.BoundType; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.Iterables; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.Maps; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.Range; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.TreeRangeSet; +import org.joda.time.Instant; + +/** + * An implementation of a bag user state that utilizes the Beam Fn State API to fetch, clear and + * persist values. + * + * Calling {@link #asyncClose()} schedules any required persistence changes. This object should + * no longer be used after it is closed. + * + * TODO: Move to an async persist model where persistence is signalled based upon cache memory + * pressure and its need to flush. + */ +public class OrderedListUserState { + private final BeamFnStateClient beamFnStateClient; + private final StateRequest request; + private final Coder valueCoder; + private final TimestampedValueCoder timestampedValueCoder; + // Pending updates to persistent storage + private NavigableMap> pendingAdds = Maps.newTreeMap(); + private TreeRangeSet pendingRemoves = TreeRangeSet.create(); + + private boolean isCleared = false; + private boolean isClosed = false; + + public OrderedListUserState( + Cache cache, + BeamFnStateClient beamFnStateClient, + String instructionId, + StateKey stateKey, + Coder valueCoder) { +checkArgument( +stateKey.hasOrderedListUserState(), +"Expected OrderedListUserState StateKey but received %s.", +stateKey); +this.beamFnStateClient = beamFnStateClient; +this.valueCoder = valueCoder; +this.timestampedValueCoder = TimestampedValueCoder.of(this.valueCoder); +this.request = + StateRequest.newBuilder().setInstructionId(instructionId).setStateKey(stateKey).build(); + } + + public void add(TimestampedValue value) { +checkState( +!isClosed, +"OrderedList user state is no longer usable because it is closed for %s", +request.getStateKey()); +Instant timestamp = value.getTimestamp(); +pendingAdds.putIfAbsent(timestamp, new ArrayList<>()); +pendingAdds.get(timestamp).add(value.getValue()); + } + + public Iterable> readRange(Instant minTimestamp, Instant limitTimestamp) { +checkState( +!isClosed, +"OrderedList user state is no longer usable
Re: [PR] Implement ordered list state for FnApi. [beam]
acrites commented on code in PR #30317: URL: https://github.com/apache/beam/pull/30317#discussion_r1529329576 ## sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/OrderedListUserState.java: ## @@ -0,0 +1,240 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.fn.harness.state; + +import static org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions.checkState; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collection; +import java.util.Comparator; +import java.util.Map.Entry; +import java.util.NavigableMap; +import java.util.concurrent.CompletableFuture; +import org.apache.beam.fn.harness.Cache; +import org.apache.beam.fn.harness.Caches; +import org.apache.beam.fn.harness.state.StateFetchingIterators.CachingStateIterable; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateClearRequest; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateRequest; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateResponse; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.LengthPrefixCoder; +import org.apache.beam.sdk.fn.stream.PrefetchableIterable; +import org.apache.beam.sdk.fn.stream.PrefetchableIterables; +import org.apache.beam.sdk.util.ByteStringOutputStream; +import org.apache.beam.sdk.values.TimestampedValue; +import org.apache.beam.sdk.values.TimestampedValue.TimestampedValueCoder; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.BoundType; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.Iterables; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.Maps; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.Range; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.TreeRangeSet; +import org.joda.time.Instant; + +/** + * An implementation of a bag user state that utilizes the Beam Fn State API to fetch, clear and + * persist values. + * + * Calling {@link #asyncClose()} schedules any required persistence changes. This object should + * no longer be used after it is closed. + * + * TODO: Move to an async persist model where persistence is signalled based upon cache memory + * pressure and its need to flush. + */ +public class OrderedListUserState { + private final BeamFnStateClient beamFnStateClient; + private final StateRequest request; + private final Coder valueCoder; + private final TimestampedValueCoder timestampedValueCoder; + // Pending updates to persistent storage + private NavigableMap> pendingAdds = Maps.newTreeMap(); + private TreeRangeSet pendingRemoves = TreeRangeSet.create(); + + private boolean isCleared = false; + private boolean isClosed = false; + + public OrderedListUserState( + Cache cache, + BeamFnStateClient beamFnStateClient, + String instructionId, + StateKey stateKey, + Coder valueCoder) { +checkArgument( +stateKey.hasOrderedListUserState(), +"Expected OrderedListUserState StateKey but received %s.", +stateKey); +this.beamFnStateClient = beamFnStateClient; +this.valueCoder = valueCoder; +this.timestampedValueCoder = TimestampedValueCoder.of(this.valueCoder); +this.request = + StateRequest.newBuilder().setInstructionId(instructionId).setStateKey(stateKey).build(); + } + + public void add(TimestampedValue value) { +checkState( +!isClosed, +"OrderedList user state is no longer usable because it is closed for %s", +request.getStateKey()); +Instant timestamp = value.getTimestamp(); +pendingAdds.putIfAbsent(timestamp, new ArrayList<>()); +pendingAdds.get(timestamp).add(value.getValue()); + } + + public Iterable> readRange(Instant minTimestamp, Instant limitTimestamp) { +checkState( +!isClosed, +"OrderedList user state is no longer usable
Re: [PR] Implement ordered list state for FnApi. [beam]
acrites commented on code in PR #30317: URL: https://github.com/apache/beam/pull/30317#discussion_r1529328943 ## sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/OrderedListUserState.java: ## @@ -0,0 +1,240 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.fn.harness.state; + +import static org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions.checkState; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collection; +import java.util.Comparator; +import java.util.Map.Entry; +import java.util.NavigableMap; +import java.util.concurrent.CompletableFuture; +import org.apache.beam.fn.harness.Cache; +import org.apache.beam.fn.harness.Caches; +import org.apache.beam.fn.harness.state.StateFetchingIterators.CachingStateIterable; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateClearRequest; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateRequest; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateResponse; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.LengthPrefixCoder; +import org.apache.beam.sdk.fn.stream.PrefetchableIterable; +import org.apache.beam.sdk.fn.stream.PrefetchableIterables; +import org.apache.beam.sdk.util.ByteStringOutputStream; +import org.apache.beam.sdk.values.TimestampedValue; +import org.apache.beam.sdk.values.TimestampedValue.TimestampedValueCoder; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.BoundType; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.Iterables; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.Maps; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.Range; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.TreeRangeSet; +import org.joda.time.Instant; + +/** + * An implementation of a bag user state that utilizes the Beam Fn State API to fetch, clear and + * persist values. + * + * Calling {@link #asyncClose()} schedules any required persistence changes. This object should + * no longer be used after it is closed. + * + * TODO: Move to an async persist model where persistence is signalled based upon cache memory + * pressure and its need to flush. + */ +public class OrderedListUserState { + private final BeamFnStateClient beamFnStateClient; + private final StateRequest request; + private final Coder valueCoder; + private final TimestampedValueCoder timestampedValueCoder; + // Pending updates to persistent storage + private NavigableMap> pendingAdds = Maps.newTreeMap(); + private TreeRangeSet pendingRemoves = TreeRangeSet.create(); + + private boolean isCleared = false; + private boolean isClosed = false; + + public OrderedListUserState( + Cache cache, + BeamFnStateClient beamFnStateClient, + String instructionId, + StateKey stateKey, + Coder valueCoder) { +checkArgument( +stateKey.hasOrderedListUserState(), +"Expected OrderedListUserState StateKey but received %s.", +stateKey); +this.beamFnStateClient = beamFnStateClient; +this.valueCoder = valueCoder; +this.timestampedValueCoder = TimestampedValueCoder.of(this.valueCoder); +this.request = + StateRequest.newBuilder().setInstructionId(instructionId).setStateKey(stateKey).build(); + } + + public void add(TimestampedValue value) { +checkState( +!isClosed, +"OrderedList user state is no longer usable because it is closed for %s", +request.getStateKey()); +Instant timestamp = value.getTimestamp(); +pendingAdds.putIfAbsent(timestamp, new ArrayList<>()); +pendingAdds.get(timestamp).add(value.getValue()); + } + + public Iterable> readRange(Instant minTimestamp, Instant limitTimestamp) { +checkState( +!isClosed, +"OrderedList user state is no longer usable
Re: [PR] use github hosted runners [beam]
faizaanmadhani closed pull request #30667: use github hosted runners URL: https://github.com/apache/beam/pull/30667 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Implement ordered list state for FnApi. [beam]
acrites commented on code in PR #30317: URL: https://github.com/apache/beam/pull/30317#discussion_r1529318766 ## sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/OrderedListUserState.java: ## @@ -0,0 +1,240 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.fn.harness.state; + +import static org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions.checkState; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collection; +import java.util.Comparator; +import java.util.Map.Entry; +import java.util.NavigableMap; +import java.util.concurrent.CompletableFuture; +import org.apache.beam.fn.harness.Cache; +import org.apache.beam.fn.harness.Caches; +import org.apache.beam.fn.harness.state.StateFetchingIterators.CachingStateIterable; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateClearRequest; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateRequest; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateResponse; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.LengthPrefixCoder; +import org.apache.beam.sdk.fn.stream.PrefetchableIterable; +import org.apache.beam.sdk.fn.stream.PrefetchableIterables; +import org.apache.beam.sdk.util.ByteStringOutputStream; +import org.apache.beam.sdk.values.TimestampedValue; +import org.apache.beam.sdk.values.TimestampedValue.TimestampedValueCoder; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.BoundType; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.Iterables; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.Maps; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.Range; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.TreeRangeSet; +import org.joda.time.Instant; + +/** + * An implementation of a bag user state that utilizes the Beam Fn State API to fetch, clear and + * persist values. + * + * Calling {@link #asyncClose()} schedules any required persistence changes. This object should + * no longer be used after it is closed. + * + * TODO: Move to an async persist model where persistence is signalled based upon cache memory + * pressure and its need to flush. + */ +public class OrderedListUserState { + private final BeamFnStateClient beamFnStateClient; + private final StateRequest request; + private final Coder valueCoder; + private final TimestampedValueCoder timestampedValueCoder; + // Pending updates to persistent storage + private NavigableMap> pendingAdds = Maps.newTreeMap(); + private TreeRangeSet pendingRemoves = TreeRangeSet.create(); + + private boolean isCleared = false; + private boolean isClosed = false; + + public OrderedListUserState( + Cache cache, + BeamFnStateClient beamFnStateClient, + String instructionId, + StateKey stateKey, + Coder valueCoder) { +checkArgument( +stateKey.hasOrderedListUserState(), +"Expected OrderedListUserState StateKey but received %s.", +stateKey); +this.beamFnStateClient = beamFnStateClient; +this.valueCoder = valueCoder; +this.timestampedValueCoder = TimestampedValueCoder.of(this.valueCoder); +this.request = + StateRequest.newBuilder().setInstructionId(instructionId).setStateKey(stateKey).build(); + } + + public void add(TimestampedValue value) { +checkState( +!isClosed, +"OrderedList user state is no longer usable because it is closed for %s", +request.getStateKey()); +Instant timestamp = value.getTimestamp(); +pendingAdds.putIfAbsent(timestamp, new ArrayList<>()); +pendingAdds.get(timestamp).add(value.getValue()); + } + + public Iterable> readRange(Instant minTimestamp, Instant limitTimestamp) { +checkState( +!isClosed, +"OrderedList user state is no longer usable
[PR] use github hosted runners [beam]
faizaanmadhani opened a new pull request, #30667: URL: https://github.com/apache/beam/pull/30667 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #` instead. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier). To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md) GitHub Actions Tests Status (on master branch) [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule) [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule) [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule) [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule) See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Release-2.55] Cherry-pick #30648 into release branch [beam]
Abacn merged PR #30664: URL: https://github.com/apache/beam/pull/30664 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Release-2.55] Cherry-pick #30648 into release branch [beam]
Abacn commented on PR #30664: URL: https://github.com/apache/beam/pull/30664#issuecomment-2005038510 Different tests failing on retry: ``` testRampupThrottler (org.apache.beam.sdk.io.gcp.datastore.RampupThrottlingFnTest) failed ``` merging for now -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Release-2.55] Cherry pick #30654 into release branch [beam]
Abacn merged PR #30665: URL: https://github.com/apache/beam/pull/30665 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Implement ordered list state for FnApi. [beam]
acrites commented on code in PR #30317: URL: https://github.com/apache/beam/pull/30317#discussion_r1529260758 ## sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/OrderedListUserState.java: ## @@ -0,0 +1,240 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.fn.harness.state; + +import static org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions.checkState; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collection; +import java.util.Comparator; +import java.util.Map.Entry; +import java.util.NavigableMap; +import java.util.concurrent.CompletableFuture; +import org.apache.beam.fn.harness.Cache; +import org.apache.beam.fn.harness.Caches; +import org.apache.beam.fn.harness.state.StateFetchingIterators.CachingStateIterable; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateClearRequest; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateKey; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateRequest; +import org.apache.beam.model.fnexecution.v1.BeamFnApi.StateResponse; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.LengthPrefixCoder; +import org.apache.beam.sdk.fn.stream.PrefetchableIterable; +import org.apache.beam.sdk.fn.stream.PrefetchableIterables; +import org.apache.beam.sdk.util.ByteStringOutputStream; +import org.apache.beam.sdk.values.TimestampedValue; +import org.apache.beam.sdk.values.TimestampedValue.TimestampedValueCoder; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.BoundType; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.Iterables; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.Maps; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.Range; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.TreeRangeSet; +import org.joda.time.Instant; + +/** + * An implementation of a bag user state that utilizes the Beam Fn State API to fetch, clear and Review Comment: ordered list -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Implement ordered list state for FnApi. [beam]
acrites commented on code in PR #30317: URL: https://github.com/apache/beam/pull/30317#discussion_r1529260242 ## model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto: ## @@ -1075,6 +1103,12 @@ message StateClearRequest {} // A response to clear state. message StateClearResponse {} +// A message describes a sort key range [start, end). Review Comment: Maybe add a comment about these values needing to be within `[BoundedWindow.TIMESTAMP_MIN_VALUE, BoundedWindow.TIMESTAMP_MAX_VALUE)`? Should these be the default values for these fields? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Implement ordered list state for FnApi. [beam]
acrites commented on code in PR #30317: URL: https://github.com/apache/beam/pull/30317#discussion_r1529258289 ## sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/FnApiStateAccessor.java: ## @@ -600,8 +603,74 @@ public MultimapState bindMultimap( @Override public OrderedListState bindOrderedList( String id, StateSpec> spec, Coder elemCoder) { -throw new UnsupportedOperationException( -"TODO: Add support for a sorted-list state to the Fn API."); +return (OrderedListState) +stateKeyObjectCache.computeIfAbsent( +createOrderedListUserStateKey(id), +new Function() { + @Override + public Object apply(StateKey key) { +return new OrderedListState() { + private final OrderedListUserState impl = + createOrderedListUserState(key, elemCoder); + + @Override + public void clear() { Review Comment: Do we want a separate notion of "clear"? i.e. one that deletes *all* state associated with this OrderedList and not just `clearRange(min, max)`, which would only delete all the elements from the OrderedList? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] add a way for channels to be closed manually [beam]
scwhittle commented on PR #30425: URL: https://github.com/apache/beam/pull/30425#issuecomment-2004876597 Run Java PreCommit -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] add a way for channels to be closed manually [beam]
scwhittle commented on PR #30425: URL: https://github.com/apache/beam/pull/30425#issuecomment-2004876092 > Task :runners:google-cloud-dataflow-java:worker:test org.apache.beam.runners.dataflow.worker.windmill.client.grpc.stubs.IsolationChannelTest > testAwaitTermination FAILED org.mockito.exceptions.verification.TooFewActualInvocations at IsolationChannelTest.java:401 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] add ExternalTransformProvider example [beam]
ahmedabu98 commented on code in PR #30666: URL: https://github.com/apache/beam/pull/30666#discussion_r1529190740 ## examples/multi-language/python/wordcount_external.py: ## @@ -0,0 +1,106 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import logging + +import apache_beam as beam +from apache_beam.io import ReadFromText +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.transforms.external_transform_provider import ExternalTransformProvider +from apache_beam.typehints.row_type import RowTypeConstraint +"""A Python multi-language pipeline that counts words. + +This pipeline reads an input text file then extracts and counts the words using Java SDK SchemaTransforms provided in +`ExtractWordsProvider`, `JavaCountProvider`, and `WriteWordsProvider`. Wrappers for these transforms are dynamically +provided in Python via the `ExternalTransformProvider` API. + +Before running this program, make sure the expansion service is up and running. You can do so with the command: +$ ./gradlew examples:multi-language:runExpansionService -PexpansionPort= Review Comment: Thanks for the catch, this was an old comment. Removed it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Properly handle timestamp prefixing of unkown window types. [beam]
Abacn commented on PR #30587: URL: https://github.com/apache/beam/pull/30587#issuecomment-2004812901 It seems this PR breaks Samza and Spark VR PostCommit, https://github.com/apache/beam/actions/workflows/beam_PostCommit_Python_ValidatesRunner_Samza.yml?query=event%3Aschedule and https://github.com/apache/beam/actions/workflows/beam_PostCommit_Python_ValidatesRunner_Spark.yml?query=event%3Aschedule though Flink and Dataflow VR PostCommit remains green Error message ``` test_custom_window_type (apache_beam.runners.portability.spark_runner_test.SparkRunnerTest) failed E RuntimeError: Pipeline test_custom_window_type_1710372598.7745032_87e88bc1-0f78-497a-b23a-02d9a534d8ac failed in state FAILED: java.lang.IllegalArgumentException: Unknown or unsupported WindowFn: beam:window_fn:pickled_python:v1 ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Release-2.55] Cherry-pick #30648 into release branch [beam]
Abacn commented on PR #30664: URL: https://github.com/apache/beam/pull/30664#issuecomment-2004736460 known flaky test: ``` testInvalidRecordReceived (org.apache.beam.sdk.io.gcp.spanner.changestreams.SpannerChangeStreamErrorTest) failed Expected: (an instance of com.google.cloud.spanner.SpannerException and exception with message a string containing "DEADLINE_EXCEEDED") but: an instance of com.google.cloud.spanner.SpannerException but: was <7>> is a java.lang.AssertionError Stacktrace was: java.lang.AssertionError: Expected: <0> but: was <7> ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] add ExternalTransformProvider example [beam]
liferoad commented on code in PR #30666: URL: https://github.com/apache/beam/pull/30666#discussion_r1529112663 ## examples/multi-language/python/wordcount_external.py: ## @@ -0,0 +1,106 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import logging + +import apache_beam as beam +from apache_beam.io import ReadFromText +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.transforms.external_transform_provider import ExternalTransformProvider +from apache_beam.typehints.row_type import RowTypeConstraint +"""A Python multi-language pipeline that counts words. + +This pipeline reads an input text file then extracts and counts the words using Java SDK SchemaTransforms provided in +`ExtractWordsProvider`, `JavaCountProvider`, and `WriteWordsProvider`. Wrappers for these transforms are dynamically +provided in Python via the `ExternalTransformProvider` API. + +Before running this program, make sure the expansion service is up and running. You can do so with the command: +$ ./gradlew examples:multi-language:runExpansionService -PexpansionPort= Review Comment: we should provide the command without using gradelw since the users who just install the beam do not use this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] add ExternalTransformProvider example [beam]
github-actions[bot] commented on PR #30666: URL: https://github.com/apache/beam/pull/30666#issuecomment-2004719546 Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] add ExternalTransformProvider example [beam]
ahmedabu98 commented on PR #30666: URL: https://github.com/apache/beam/pull/30666#issuecomment-2004717123 R: @chamikaramj R: @robertwb R: @liferoad -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Release-2.55] Cherry-pick #30648 into release branch [beam]
Abacn commented on PR #30664: URL: https://github.com/apache/beam/pull/30664#issuecomment-2004711024 Flaky test in dataflow-worker: testDoFnActiveMessageMetadataReportedOnHeartbeat failed indexOutofBound ``` testDoFnActiveMessageMetadataReportedOnHeartbeat[1: [streamingEngine=true]] (org.apache.beam.runners.dataflow.worker.StreamingDataflowWorkerTest) failed runners/google-cloud-dataflow-java/worker/build/test-results/test/TEST-org.apache.beam.runners.dataflow.worker.StreamingDataflowWorkerTest.xml [took 2s] java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at java.util.ArrayList.rangeCheck(ArrayList.java:659) at java.util.ArrayList.get(ArrayList.java:435) at java.util.Collections$UnmodifiableList.get(Collections.java:1311) at org.apache.beam.runners.dataflow.worker.windmill.Windmill$ComputationHeartbeatRequest.getHeartbeatRequests(Windmill.java:62820) at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorkerTest.testDoFnActiveMessageMetadataReportedOnHeartbeat(StreamingDataflowWorkerTest.java:3681) ... ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] add ExternalTransformProvider example [beam]
ahmedabu98 opened a new pull request, #30666: URL: https://github.com/apache/beam/pull/30666 Adding an example for creating SchemaTransforms and using them with the ExternalTransformProvider API -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Release-2.55] Cherry pick #30654 into release branch [beam]
github-actions[bot] commented on PR #30665: URL: https://github.com/apache/beam/pull/30665#issuecomment-2004683901 Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Release-2.55] Cherry pick #30654 into release branch [beam]
Abacn commented on PR #30665: URL: https://github.com/apache/beam/pull/30665#issuecomment-2004681376 R: @dmitryor -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] [Release-2.55] Cherry pick #30654 into release branch [beam]
Abacn opened a new pull request, #30665: URL: https://github.com/apache/beam/pull/30665 * Update WindmillBag.java Include byte size of the stateKey on the BagState weight used to estimate and limit the total state cache size * Update WindmillValue.java Include stateKey size in the byte size of a WidnmillValue * Update WindmillWatermarkHold.java Include keyState size in the WatermarkHold estimated byte size * Fix formatting issue * Fix expected cache item weights in WindmillStateInternalsTest **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #` instead. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier). To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md) GitHub Actions Tests Status (on master branch) [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule) [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule) [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule) [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule) See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Include byte size of stateKey in estimated weight of WindmillBag, WindmillValue, and WindmillWatermarkHold [beam]
Abacn merged PR #30654: URL: https://github.com/apache/beam/pull/30654 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Tune maximum thread count for streaming dataflow worker executor dynamically. [beam]
MelodyShen commented on code in PR #30439: URL: https://github.com/apache/beam/pull/30439#discussion_r1529071437 ## runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/BoundedQueueExecutorTest.java: ## @@ -0,0 +1,196 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.dataflow.worker.util; + +import static org.hamcrest.Matchers.greaterThan; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertThat; + +import java.util.concurrent.CountDownLatch; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicBoolean; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.util.concurrent.ThreadFactoryBuilder; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.Timeout; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Unit tests for {@link org.apache.beam.runners.dataflow.worker.util.BoundedQueueExecutor}. */ +@RunWith(JUnit4.class) +// TODO(https://github.com/apache/beam/issues/21230): Remove when new version of errorprone is +// released (2.11.0) +@SuppressWarnings("unused") +public class BoundedQueueExecutorTest { + @Rule public transient Timeout globalTimeout = Timeout.seconds(300); + private static final long MAXIMUM_BYTES_OUTSTANDING = 1000; + private static final int DEFAULT_MAX_THREADS = 2; + private static final int DEFAULT_THREAD_EXPIRATION_SEC = 60; + + private BoundedQueueExecutor executor; + + private Runnable createSleepProcessWorkFn(CountDownLatch latch, AtomicBoolean stop) { +Runnable runnable = +() -> { + try { +Thread.sleep(1000); + } catch (InterruptedException e) { +return; + } + latch.countDown(); + int count = 0; + while (!stop.get()) { +count += 1; Review Comment: Sure change to use countdownlatch. ## runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/util/BoundedQueueExecutorTest.java: ## @@ -0,0 +1,196 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.dataflow.worker.util; + +import static org.hamcrest.Matchers.greaterThan; +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertThat; + +import java.util.concurrent.CountDownLatch; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicBoolean; +import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.util.concurrent.ThreadFactoryBuilder; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.Timeout; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Unit tests for {@link org.apache.beam.runners.dataflow.worker.util.BoundedQueueExecutor}. */ +@RunWith(JUnit4.class) +// TODO(https://github.com/apache/beam/issues/21230): Remove when new version of errorprone is +// released (2.11.0) +@SuppressWarnings("unused") +public class BoundedQueueExecutorTest { + @Rule public transient Timeout globalTimeout = Timeout.seconds(300); + private static final long MAXIMUM_BYTES_OUTSTANDING = 1000; + private static final int DEFAULT_MAX_THREADS = 2; + private static final int DEFAULT_THREAD_EXPIRATION_SEC = 60; + + private BoundedQueueExecutor
Re: [PR] Tune maximum thread count for streaming dataflow worker executor dynamically. [beam]
MelodyShen commented on code in PR #30439: URL: https://github.com/apache/beam/pull/30439#discussion_r1529071176 ## runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BoundedQueueExecutor.java: ## @@ -149,18 +191,22 @@ public String summaryHtml() { builder.append(executor.getMaximumPoolSize()); builder.append("/n"); + builder.append("Maximum Threads: "); + builder.append(maximumThreadCount()); + builder.append("/n"); + builder.append("Active Threads: "); builder.append(executor.getActiveCount()); builder.append("/n"); builder.append("Work Queue Size: "); - builder.append(elementsOutstanding); + builder.append(elementsOutstanding()); Review Comment: Ah I didn't notice that. Reverted and added a test case for it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Include byte size of stateKey in estimated weight of WindmillBag, WindmillValue, and WindmillWatermarkHold [beam]
Abacn commented on PR #30654: URL: https://github.com/apache/beam/pull/30654#issuecomment-2004669097 ``` org.apache.beam.sdk.util.UnboundedScheduledExecutorServiceTest > testThreadsAreAddedOnlyAsNeededWithContention FAILED java.lang.AssertionError at UnboundedScheduledExecutorServiceTest.java:556 ``` unrelated flaky test -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] add a way for channels to be closed manually [beam]
m-trieu commented on PR #30425: URL: https://github.com/apache/beam/pull/30425#issuecomment-2004659417 Run Java Precommit -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Release-2.55] Cherry-pick #30648 into release branch [beam]
Abacn commented on PR #30664: URL: https://github.com/apache/beam/pull/30664#issuecomment-2004620576 R: @arunpandianp -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Support ValueProvider for _CustomBigQueryStorageSource [beam]
Abacn commented on PR #30662: URL: https://github.com/apache/beam/pull/30662#issuecomment-2004618077 PreCommit Python Coverage stucks executing `:sdks:python:test-suites:tox:py38:testPy38tft-113` trying again -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Release-2.55] Cherry-pick #30648 into release branch [beam]
github-actions[bot] commented on PR #30664: URL: https://github.com/apache/beam/pull/30664#issuecomment-2004614036 Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Release-2.55] Cherry-pick #30648 into release branch [beam]
Abacn commented on PR #30664: URL: https://github.com/apache/beam/pull/30664#issuecomment-2004611577 R: @arvindram03 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] [Release-2.55] Cherry-pick #30648 into release branch [beam]
Abacn opened a new pull request, #30664: URL: https://github.com/apache/beam/pull/30664 This reverts commit ffe2dba532028cdbbb5bca9c374f0a2d756ee8bf. **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #` instead. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier). To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md) GitHub Actions Tests Status (on master branch) [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule) [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule) [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule) [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule) See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Include byte size of stateKey in estimated weight of WindmillBag, WindmillValue, and WindmillWatermarkHold [beam]
dmitryor commented on PR #30654: URL: https://github.com/apache/beam/pull/30654#issuecomment-2004594496 R: @scwhittle fixed the test, thank you for pointing to it, still learning workflows for Beam on GitHub. The test is inherently fragile, but I'm not sure how to improve it without major rework. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Support ValueProvider for _CustomBigQueryStorageSource [beam]
robertwb commented on PR #30662: URL: https://github.com/apache/beam/pull/30662#issuecomment-2004578186 Sure, and I'm not opposing this change now that's it's been implemented and approved, but the preferred workaround is to use flex templates instead. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Support ValueProvider for _CustomBigQueryStorageSource [beam]
liferoad commented on PR #30662: URL: https://github.com/apache/beam/pull/30662#issuecomment-2004566895 > FWIW, now that we have flex templates, I don't think we should be propagating `ValueProvider`s more places than they already are. (IMHO, it'd be preferable to finally deprecate and remove them.) I agree but some existing customers have reported this issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Use 50 workers for 2GB 10 bytes combine test [beam]
liferoad commented on PR #30655: URL: https://github.com/apache/beam/pull/30655#issuecomment-2004565119 > is 2GB the total # of bytes processed? Yes. The problem is `beam.CombineGlobally(beam.combiners.TopCombineFn(self.top_count)).without_defaults()` does the sorting on all (k, v) pairs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Support ValueProvider for _CustomBigQueryStorageSource [beam]
robertwb commented on PR #30662: URL: https://github.com/apache/beam/pull/30662#issuecomment-2004547641 FWIW, now that we have flex templates, I don't think we should be propagating `ValueProvider`s more places than they already are. (IMHO, it'd be preferable to finally deprecate and remove them.) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] only record per state change processing times for streaming pipelines [beam]
github-actions[bot] commented on PR #30653: URL: https://github.com/apache/beam/pull/30653#issuecomment-2004538155 Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`: R: @jrmccluskey added as fallback since no labels match configuration Available commands: - `stop reviewer notifications` - opt out of the automated review tooling - `remind me after tests pass` - tag the comment author after tests pass - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers) The PR bot will only process comments in the main thread (not review comments). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Add typehint support for collections.abc.Set types [beam]
jrmccluskey commented on PR #29137: URL: https://github.com/apache/beam/pull/29137#issuecomment-2004522838 +1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump google.golang.org/protobuf from 1.32.0 to 1.33.0 in /sdks [beam]
jrmccluskey merged PR #30629: URL: https://github.com/apache/beam/pull/30629 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Update Python Dependencies from BRANCH weekly_update_python_dependencies_1710635460 [beam]
Abacn closed pull request #30656: Update Python Dependencies from BRANCH weekly_update_python_dependencies_1710635460 URL: https://github.com/apache/beam/pull/30656 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Use 50 workers for 2GB 10 bytes combine test [beam]
Abacn commented on PR #30655: URL: https://github.com/apache/beam/pull/30655#issuecomment-2004248331 +1, needs 50 workers to proceed 2 GB data sounds not quite right in general. There must be some underlying issue that actually need to be resolved. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Support ValueProvider for _CustomBigQueryStorageSource [beam]
github-actions[bot] commented on PR #30662: URL: https://github.com/apache/beam/pull/30662#issuecomment-2004234732 Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`: R: @jrmccluskey for label python. R: @Abacn for label io. Available commands: - `stop reviewer notifications` - opt out of the automated review tooling - `remind me after tests pass` - tag the comment author after tests pass - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers) The PR bot will only process comments in the main thread (not review comments). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Use 50 workers for 2GB 10 bytes combine test [beam]
tvalentyn commented on PR #30655: URL: https://github.com/apache/beam/pull/30655#issuecomment-2004179094 is 2GB the total # of bytes processed? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Use 50 workers for 2GB 10 bytes combine test [beam]
tvalentyn commented on PR #30655: URL: https://github.com/apache/beam/pull/30655#issuecomment-2004174262 Do you think there is some inefficiency? do we need 10x workers for this test in streaming mode or batch and streaming tests are fundamentally different? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Add typehint support for collections.abc.Set types [beam]
tvalentyn commented on PR #29137: URL: https://github.com/apache/beam/pull/29137#issuecomment-2004162693 Thanks a lot @alimuldal ! Please open a separate issue so that it is easier to find and reference this conversation and cc @jrmccluskey who is our typehint expert. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] Beam errors out when using PortableRunner (Flink Runner) – `Cannot run program "docker"` [beam]
harrymyburgh opened a new issue, #30663: URL: https://github.com/apache/beam/issues/30663 ### What happened? I am trying to deploy a Beam job (Python Beam) that runs on a PortableRunner (Flink Runner) in my Kubernetes cluster. I have not experienced issues prior with Beam using the Flink Runner. However, today I tried to set up Beam to be a consumer from Apache Kafka using `ReadFromKafka` from `apache_beam.io.kafka`. My Flink Cluster is managed by the Apache Flink Kubernetes Operator. My Beam jobs are managed by a Beam Flink Job Manager, which posts Beam jobs to the Flink master. The Job Manager uses the image `apache/beam_flink1.16_job_server:2.54.0`. My Flink Task Managers each contain a sidecar for a Beam worker pool, which is spun up using the image `apache/beam_python3.11_sdk:2.54.0` and the arg `--worker_pool`. When I start my beam job, I get the following error on the job manager logs: ``` Caused by: java.io.IOException: Cannot run program "docker": error=2, No such file or directory ``` These are my Beam pipeline options: ``` --job_name=beam_example_pipeline --runner=PortableRunner --job_endpoint=beam-flink-job-server:8099 --artifact_endpoint=beam-flink-job-server:8098 --environment_type=EXTERNAL --environment_config=localhost:5 --parallelism=1 --streaming ``` [Some resources I've found](https://lists.apache.org/thread/4qr4dlg5h8kplq728cfwl1vcqfqv3zf6) suggest that the Kafka transform has its own environment type which is set to (and overrides any environment you set?) `--environment_type=DOCKER`, which is what causes the issues. However, I could be wrong, so please say so if I am. All of this taking place on a Kubernetes cluster, where, to my knowledge, Docker in Docker is not recommended. I do not want to use a PROCESS environment_type, I require EXTERNAL. How can I resolve this issue? Is this a bug with Beam? ### Issue Priority Priority: 2 (default / most bugs should be filed as P2) ### Issue Components - [X] Component: Python SDK - [X] Component: Java SDK - [ ] Component: Go SDK - [ ] Component: Typescript SDK - [X] Component: IO connector - [ ] Component: Beam YAML - [ ] Component: Beam examples - [ ] Component: Beam playground - [ ] Component: Beam katas - [ ] Component: Website - [ ] Component: Spark Runner - [X] Component: Flink Runner - [ ] Component: Samza Runner - [ ] Component: Twister2 Runner - [ ] Component: Hazelcast Jet Runner - [ ] Component: Google Cloud Dataflow Runner -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Support ValueProvider for _CustomBigQueryStorageSource [beam]
liferoad opened a new pull request, #30662: URL: https://github.com/apache/beam/pull/30662 Following https://stackoverflow.com/questions/67441316/attributeerror-runtimevalueprovider-object-has-no-attribute-projectid, we need to support ValueProvider for _CustomBigQueryStorageSource Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #` instead. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier). To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md) GitHub Actions Tests Status (on master branch) [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule) [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule) [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule) [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule) See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Initial Commit for AvroPayloadSerializer [beam]
github-actions[bot] commented on PR #22348: URL: https://github.com/apache/beam/pull/22348#issuecomment-2003806700 This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the d...@beam.apache.org list. Thank you for your contributions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Revert "Implementing lull reporting at bundle level processing" [beam]
scwhittle merged PR #30648: URL: https://github.com/apache/beam/pull/30648 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Revert "Implementing lull reporting at bundle level processing" [beam]
scwhittle closed pull request #30648: Revert "Implementing lull reporting at bundle level processing" URL: https://github.com/apache/beam/pull/30648 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Include byte size of stateKey in estimated weight of WindmillBag, WindmillValue, and WindmillWatermarkHold [beam]
scwhittle commented on PR #30654: URL: https://github.com/apache/beam/pull/30654#issuecomment-2003324742 From precommit logs it appears the tests need to be updated ``` org.apache.beam.runners.dataflow.worker.windmill.state.WindmillStateInternalsTest > testCachedCombining FAILED java.lang.AssertionError at WindmillStateInternalsTest.java:3188 > Task :runners:google-cloud-dataflow-java:worker:test org.apache.beam.runners.dataflow.worker.windmill.state.WindmillStateInternalsTest > testCachedBag FAILED java.lang.AssertionError at WindmillStateInternalsTest.java:3086 org.apache.beam.runners.dataflow.worker.windmill.state.WindmillStateInternalsTest > testCachedWatermarkHold FAILED java.lang.AssertionError at WindmillStateInternalsTest.java:3148 org.apache.beam.runners.dataflow.worker.windmill.state.WindmillStateInternalsTest > testCachedValue FAILED java.lang.AssertionError at WindmillStateInternalsTest.java:3046 ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump github.com/aws/aws-sdk-go-v2/feature/s3/manager from 1.13.8 to 1.16.11 in /sdks [beam]
github-actions[bot] commented on PR #30660: URL: https://github.com/apache/beam/pull/30660#issuecomment-2002997858 Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment `assign set of reviewers` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org