Re: [PR] Bump github.com/testcontainers/testcontainers-go from 0.26.0 to 0.30.0 in /sdks [beam]
github-actions[bot] commented on PR #30901: URL: https://github.com/apache/beam/pull/30901#issuecomment-2044180860 Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment `assign set of reviewers` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Bump google.golang.org/grpc from 1.62.1 to 1.63.2 in /sdks [beam]
dependabot[bot] opened a new pull request, #30900: URL: https://github.com/apache/beam/pull/30900 Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.62.1 to 1.63.2. Release notes Sourced from https://github.com/grpc/grpc-go/releases;>google.golang.org/grpc's releases. Release 1.63.2 Bugs Fix the user agent string Release 1.63.1 Bugs grpc: fixed subchannel log messages to properly reference the parent channel (https://redirect.github.com/grpc/grpc-go/issues/7101;>#7101) Special thanks: https://github.com/daniel-weisse;>@daniel-weisse API Changes grpc: remove Deprecated tag from Dial and DialContext; these will be deprecated in v1.64 instead (https://redirect.github.com/grpc/grpc-go/issues/7103;>#7103) Release 1.63.0 Behavior Changes grpc: Return canonical target string from resolver.Address.String() (experimental) (https://redirect.github.com/grpc/grpc-go/issues/6923;>#6923) client server: when using write buffer pooling, use input value for buffer size instead of size*2 (https://redirect.github.com/grpc/grpc-go/issues/6983;>#6983) Special Thanks: https://github.com/raghav-stripe;>@raghav-stripe New Features grpc: add ClientConn.CanonicalTarget() to return the canonical target string. (https://redirect.github.com/grpc/grpc-go/issues/7006;>#7006) xds: implement LRS named metrics support (https://github.com/grpc/proposal/blob/master/A64-lrs-custom-metrics.md;>gRFC A64) (https://redirect.github.com/grpc/grpc-go/issues/7027;>#7027) Special Thanks: https://github.com/danielzhaotongliu;>@danielzhaotongliu grpc: introduce grpc.NewClient to allow users to create new clients in idle mode and with dns as the default resolver (https://redirect.github.com/grpc/grpc-go/issues/7010;>#7010) Special Thanks: https://github.com/bruuuce;>@bruuuce API Changes grpc: stabilize experimental method ClientConn.Target() (https://redirect.github.com/grpc/grpc-go/issues/7006;>#7006) Bug Fixes xds: fix an issue that would cause the client to send an empty list of resources for LDS/CDS upon reconnecting with the management server (https://redirect.github.com/grpc/grpc-go/issues/7026;>#7026) server: Fix some errors returned by a server when using a grpc.Server as an http.Handler with the Go stdlib HTTP server (https://redirect.github.com/grpc/grpc-go/issues/6989;>#6989) resolver/dns: add SetResolvingTimeout to allow configuring the DNS resolver's global timeout (https://redirect.github.com/grpc/grpc-go/issues/6917;>#6917) Special Thanks: https://github.com/and1truong;>@and1truong Set the security level of Windows named pipes to NoSecurity (https://redirect.github.com/grpc/grpc-go/issues/6956;>#6956) Special Thanks: https://github.com/irsl;>@irsl Release 1.62.2 Dependencies Update http2 library to address vulnerability https://www.kb.cert.org/vuls/id/421644;>CVE-2023-45288 Commits https://github.com/grpc/grpc-go/commit/d32e66ce27447a0a217464a36fdd3935801c0453;>d32e66c Change version to 1.63.2 (https://redirect.github.com/grpc/grpc-go/issues/7104;>#7104) https://github.com/grpc/grpc-go/commit/92f6dd0c1083430b830e51aaca0db371b06bc99e;>92f6dd0 channelz: pass parent pointer instead of parent ID to RegisterSubChannel (https://redirect.github.com/grpc/grpc-go/issues/7101;>#7101) https://github.com/grpc/grpc-go/commit/0f6ef0fbe51aa33d05a91d0fa87b28113b83f5a9;>0f6ef0f grpc: un-deprecate Dial and DialContext https://github.com/grpc/grpc-go/commit/58dc74987513afa14f8d2ad1643f4b6f192d6ee8;>58dc749 Change version to 1.63.1-dev (https://redirect.github.com/grpc/grpc-go/issues/7051;>#7051) https://github.com/grpc/grpc-go/commit/c68f4566b9cacdb11c42a6eb14ee66a33d9b7c12;>c68f456 Change version to 1.63.0 (https://redirect.github.com/grpc/grpc-go/issues/7050;>#7050) https://github.com/grpc/grpc-go/commit/6369167ae33538aca051225fe5b5e07c2b022eb5;>6369167 *: update http2 dependency (https://redirect.github.com/grpc/grpc-go/issues/7082;>#7082) https://github.com/grpc/grpc-go/commit/88547614e7427f7ce87a5f5485d897ae09044641;>8854761 cherry-pick: channelz: fix race accessing channelMap without lock (https://redirect.github.com/grpc/grpc-go/issues/7079;>#7079) (https://redirect.github.com/grpc/grpc-go/issues/7;>#7... https://github.com/grpc/grpc-go/commit/e62770d76fc9508a2266d8bb8c6f2967e0dbc6fc;>e62770d channelz: add LocalAddr to listen sockets and test (https://redirect.github.com/grpc/grpc-go/issues/7062;>#7062) (https://redirect.github.com/grpc/grpc-go/issues/7063;>#7063) https://github.com/grpc/grpc-go/commit/4ffccf1a5f97417eeb000d521f6415b6eb33bb5f;>4ffccf1 googlec2p: use xdstp style template for client LDS resource name (https://redirect.github.com/grpc/grpc-go/issues/7048;>#7048)
Re: [PR] Bump github.com/testcontainers/testcontainers-go from 0.26.0 to 0.29.1 in /sdks [beam]
dependabot[bot] closed pull request #30557: Bump github.com/testcontainers/testcontainers-go from 0.26.0 to 0.29.1 in /sdks URL: https://github.com/apache/beam/pull/30557 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] The PostCommit XVR JavaUsingPython Dataflow job is flaky [beam]
github-actions[bot] opened a new issue, #30899: URL: https://github.com/apache/beam/issues/30899 The PostCommit XVR JavaUsingPython Dataflow is failing over 50% of the time Please visit https://github.com/apache/beam/actions/workflows/beam_PostCommit_XVR_JavaUsingPython_Dataflow.yml?query=is%3Afailure+branch%3Amaster to see the logs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump GCP-BOM to 26.36.0 [beam]
Abacn merged PR #30868: URL: https://github.com/apache/beam/pull/30868 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Bump github.com/testcontainers/testcontainers-go from 0.26.0 to 0.30.0 in /sdks [beam]
dependabot[bot] opened a new pull request, #30901: URL: https://github.com/apache/beam/pull/30901 Bumps [github.com/testcontainers/testcontainers-go](https://github.com/testcontainers/testcontainers-go) from 0.26.0 to 0.30.0. Release notes Sourced from https://github.com/testcontainers/testcontainers-go/releases;>github.com/testcontainers/testcontainers-go's releases. v0.30.0 What's Changed Features feat(k6):Add remote test scripts (https://redirect.github.com/testcontainers/testcontainers-go/issues/2350;>#2350) https://github.com/bearrito;>@bearrito feat: optimizes file copies to and from containers (https://redirect.github.com/testcontainers/testcontainers-go/issues/2450;>#2450) https://github.com/codefromthecrypt;>@codefromthecrypt feat(exitcode): Add exit code sugar method (https://redirect.github.com/testcontainers/testcontainers-go/issues/2342;>#2342) https://github.com/bearrito;>@bearrito feat: add module to support InfluxDB v1.x (https://redirect.github.com/testcontainers/testcontainers-go/issues/1703;>#1703) https://github.com/JJCinAZ;>@JJCinAZ feat: authenticate docker on PullImage (https://redirect.github.com/testcontainers/testcontainers-go/issues/2446;>#2446) https://github.com/codefromthecrypt;>@codefromthecrypt feat: add distribution-registry module (https://redirect.github.com/testcontainers/testcontainers-go/issues/2341;>#2341) https://github.com/mdelapenya;>@mdelapenya feat: support passing io.Reader as ContainerFile (https://redirect.github.com/testcontainers/testcontainers-go/issues/2401;>#2401) https://github.com/mdelapenya;>@mdelapenya feat(MustConn): Add MustConnectionString on (some) dbs (https://redirect.github.com/testcontainers/testcontainers-go/issues/2343;>#2343) https://github.com/bearrito;>@bearrito feat: support for waiting for response headers (https://redirect.github.com/testcontainers/testcontainers-go/issues/2349;>#2349) https://github.com/mdelapenya;>@mdelapenya Add method for getting Weaviate's gRPC port (https://redirect.github.com/testcontainers/testcontainers-go/issues/2339;>#2339) https://github.com/antas-marcin;>@antas-marcin feat: add openfga module (https://redirect.github.com/testcontainers/testcontainers-go/issues/2332;>#2332) https://github.com/mdelapenya;>@mdelapenya Bug Fixes Fix: HTTP wait strategy does not take query params into account (https://redirect.github.com/testcontainers/testcontainers-go/issues/2466;>#2466) https://github.com/benja-M-1;>@benja-M-1 fix: logging deadlock (https://redirect.github.com/testcontainers/testcontainers-go/issues/2346;>#2346) https://github.com/stevenh;>@stevenh fix(exec): updates the Multiplexed opt to combine stdout and stderr (https://redirect.github.com/testcontainers/testcontainers-go/issues/2452;>#2452) https://github.com/gustavosbarreto;>@gustavosbarreto bug:Fix AMQPS url (https://redirect.github.com/testcontainers/testcontainers-go/issues/2462;>#2462) https://github.com/bearrito;>@bearrito Added error handling for context.Canceled in log reading code (https://redirect.github.com/testcontainers/testcontainers-go/issues/2268;>#2268) https://github.com/prateekdwivedi;>@prateekdwivedi fix: consul race on HTTP port (https://redirect.github.com/testcontainers/testcontainers-go/issues/2336;>#2336) https://github.com/codefromthecrypt;>@codefromthecrypt Documentation docs: fix wrong copypaste in Weaviate docs (https://redirect.github.com/testcontainers/testcontainers-go/issues/2338;>#2338) https://github.com/mdelapenya;>@mdelapenya 粒 Housekeeping Upgrade neo4j module to use features from v0.29.1 of testcontainers-go (https://redirect.github.com/testcontainers/testcontainers-go/issues/2463;>#2463) https://github.com/danielorbach;>@danielorbach chore: use docker compose (v2) instead of docker-compose (v1) (https://redirect.github.com/testcontainers/testcontainers-go/issues/2464;>#2464) https://github.com/mdelapenya;>@mdelapenya refactor: Add Weaviate modules tests (https://redirect.github.com/testcontainers/testcontainers-go/issues/2447;>#2447) https://github.com/antas-marcin;>@antas-marcin docs: Fix typo in ci-test-go.yml (https://redirect.github.com/testcontainers/testcontainers-go/issues/2394;>#2394) https://github.com/uh-zz;>@uh-zz redpanda: set entrypoint to the custom entrypoint file (https://redirect.github.com/testcontainers/testcontainers-go/issues/2347;>#2347) https://github.com/bojand;>@bojand Move the container and config tests into a test package (https://redirect.github.com/testcontainers/testcontainers-go/issues/2242;>#2242) https://github.com/Minivera;>@Minivera chore: use WithEnv option in localstack module (https://redirect.github.com/testcontainers/testcontainers-go/issues/2337;>#2337) https://github.com/mdelapenya;>@mdelapenya chore: check that the new version is not empty
Re: [PR] Bump github.com/testcontainers/testcontainers-go from 0.26.0 to 0.29.1 in /sdks [beam]
dependabot[bot] commented on PR #30557: URL: https://github.com/apache/beam/pull/30557#issuecomment-2044149633 Superseded by #30901. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Patch release website changes [beam]
damccorm commented on PR #30839: URL: https://github.com/apache/beam/pull/30839#issuecomment-2042721488 R: @Abacn -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Patch release website changes [beam]
github-actions[bot] commented on PR #30839: URL: https://github.com/apache/beam/pull/30839#issuecomment-2042724168 Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Patch release website changes [beam]
damccorm commented on PR #30839: URL: https://github.com/apache/beam/pull/30839#issuecomment-2042741503 R: @liferoad since I think Yi is out today -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Bug]: Beam Sql is ignoring aliases fields in some situations which causes to huge data loss [beam]
brachipa commented on issue #30498: URL: https://github.com/apache/beam/issues/30498#issuecomment-2042822014 I also tried run calcite (same version as beam uses) unit test with the same query and it works fine, with my fix and without my fix. I think it sounds like an issue in beam added in org.apache.calcite.test.JdbcTest ``` @Test void testSimple() { final String sql = "select \"name\" as n1, count(*) as c\n" + "from \"hr\".\"emps\" group by \"name\""; CalciteAssert.that() .with(CalciteAssert.Config.REGULAR) .query(sql) .returns("N1=Theodore; C=1\nN1=Eric; C=1\nN1=Bill; C=1\nN1=Sebastian; C=1\n"); } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]
liferoad commented on code in PR #30877: URL: https://github.com/apache/beam/pull/30877#discussion_r1556091869 ## CHANGES.md: ## @@ -80,6 +80,7 @@ ## Bugfixes * Fixed locking issue when shutting down inactive bundle processors. Symptoms of this issue include slowness or stuckness in long-running jobs (Python) ([#30679](https://github.com/apache/beam/pull/30679)). +* Fixed kafka polling issue due to short consumer polling timeout time. Changes to make the consumer polling timeout configurable for KafkaIO.Read with new command: KafkaIO.read().withConsumerPollingTimeout(Duration duration). Default timeout has been increased from 1 second to 2 seconds( [#30870](https://github.com/apache/beam/pull/30877)). Review Comment: Can we move this to Breaking Changes? We can reword these like: Default consumer polling timeout for `KafkaIO.Read` was increased from 1 second to 2 seconds. Use `KafkaIO.read().withConsumerPollingTimeout(Duration duration)` to configure this timeout value when necessary. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py311 [beam]
dependabot[bot] opened a new pull request, #30891: URL: https://github.com/apache/beam/pull/30891 Bumps [pymongo](https://github.com/mongodb/mongo-python-driver) from 4.6.2 to 4.6.3. Changelog Sourced from https://github.com/mongodb/mongo-python-driver/blob/master/doc/changelog.rst;>pymongo's changelog. Changelog Changes in Version 4.7 PyMongo 4.7 brings a number of improvements including: Added the :class:pymongo.hello.Hello.connection_id, :attr:pymongo.monitoring.CommandStartedEvent.server_connection_id, :attr:pymongo.monitoring.CommandSucceededEvent.server_connection_id, and :attr:pymongo.monitoring.CommandFailedEvent.server_connection_id properties. Fixed a bug where inflating a :class:~bson.raw_bson.RawBSONDocument containing a :class:~bson.code.Code would cause an error. Significantly improved the performance of encoding BSON documents to JSON. Support for named KMS providers for client side field level encryption. Previously supported KMS providers were only: aws, azure, gcp, kmip, and local. The KMS provider is now expanded to support name suffixes (e.g. local:myname). Named KMS providers enables more than one of each KMS provider type to be configured. See the docstring for :class:~pymongo.encryption_options.AutoEncryptionOpts. Note that named KMS providers requires pymongocrypt =1.9 and libmongocrypt =1.9. :meth:~pymongo.encryption.ClientEncryption.encrypt and :meth:~pymongo.encryption.ClientEncryption.encrypt_expression now allow key_id to be passed in as a :class:uuid.UUID. Fixed a bug where :class:~bson.int64.Int64 instances could not always be encoded by orjson_. The following now works:: import orjson from bson import json_util orjson.dumps({'a': Int64(1)}, default=json_util.default, option=orjson.OPT_PASSTHROUGH_SUBCLASS) .. _orjson: https://github.com/ijl/orjson;>https://github.com/ijl/orjson Fixed a bug appearing in Python 3.12 where RuntimeError: can't create new thread at interpreter shutdown could be written to stderr when a MongoClient's thread starts as the python interpreter is shutting down. Added a warning when connecting to DocumentDB and CosmosDB clusters. For more information regarding feature compatibility and support please visit mongodb.com/supportability/documentdb https://mongodb.com/supportability/documentdb;_ and mongodb.com/supportability/cosmosdb https://mongodb.com/supportability/cosmosdb;_. Added the :attr:pymongo.monitoring.ConnectionCheckedOutEvent.duration, :attr:pymongo.monitoring.ConnectionCheckOutFailedEvent.duration, and :attr:pymongo.monitoring.ConnectionReadyEvent.duration properties. Added the type and kwargs arguments to :class:~pymongo.operations.SearchIndexModel to enable creating vector search indexes in MongoDB Atlas. Fixed a bug where read_concern and write_concern were improperly added to :meth:~pymongo.collection.Collection.list_search_indexes queries. Unavoidable breaking changes ... (truncated) Commits https://github.com/mongodb/mongo-python-driver/commit/8da192f9ca2d4f6464897b22b3029c227043f0cb;>8da192f BUMP 4.6.3 https://github.com/mongodb/mongo-python-driver/commit/56b6b6dbc267d365d97c037082369dabf37405d2;>56b6b6d PYTHON-4305 Fix bson size check (https://redirect.github.com/mongodb/mongo-python-driver/issues/1564;>#1564) https://github.com/mongodb/mongo-python-driver/commit/449d0f316cbcdea59d8b69b5e4fc34ac07949dc6;>449d0f3 BUMP to 4.6.3.dev0 See full diff in https://github.com/mongodb/mongo-python-driver/compare/4.6.2...4.6.3;>compare view [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pymongo=pip=4.6.2=4.6.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- Dependabot commands and options You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop
Re: [PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py311 [beam]
github-actions[bot] commented on PR #30891: URL: https://github.com/apache/beam/pull/30891#issuecomment-2043302472 Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`: R: @riteshghorse for label python. Available commands: - `stop reviewer notifications` - opt out of the automated review tooling - `remind me after tests pass` - tag the comment author after tests pass - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers) The PR bot will only process comments in the main thread (not review comments). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py38 [beam]
github-actions[bot] commented on PR #30890: URL: https://github.com/apache/beam/pull/30890#issuecomment-2043302615 Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`: R: @damccorm for label python. Available commands: - `stop reviewer notifications` - opt out of the automated review tooling - `remind me after tests pass` - tag the comment author after tests pass - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers) The PR bot will only process comments in the main thread (not review comments). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py39 [beam]
github-actions[bot] commented on PR #30888: URL: https://github.com/apache/beam/pull/30888#issuecomment-2043302869 Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`: R: @tvalentyn for label python. Available commands: - `stop reviewer notifications` - opt out of the automated review tooling - `remind me after tests pass` - tag the comment author after tests pass - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers) The PR bot will only process comments in the main thread (not review comments). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Support FLOAT32 type in Spanner [beam]
github-actions[bot] commented on PR #30893: URL: https://github.com/apache/beam/pull/30893#issuecomment-2043375457 Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment `assign set of reviewers` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py38 [beam]
damccorm closed pull request #30890: Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py38 URL: https://github.com/apache/beam/pull/30890 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py38 [beam]
dependabot[bot] commented on PR #30890: URL: https://github.com/apache/beam/pull/30890#issuecomment-2043419704 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting `@dependabot ignore this major version` or `@dependabot ignore this minor version`. You can also ignore all major, minor, or patch releases for a dependency by adding an [`ignore` condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore) with the desired `update_types` to your config file. If you change your mind, just re-open this PR and I'll resolve any conflicts on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]
jbsabbagh commented on code in PR #30877: URL: https://github.com/apache/beam/pull/30877#discussion_r1556092498 ## sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java: ## @@ -518,8 +525,11 @@ private ConsumerRecords poll( return rawRecords; } elapsed = sw.elapsed(); - if (elapsed.toMillis() >= KAFKA_POLL_TIMEOUT.toMillis()) { + if (elapsed.toMillis() >= consumerPollingTimeout.toMillis()) { // timeout is over +LOG.warn( +"No messages retrieved with polling timeout {} seconds. Consider increasing the consumer polling timeout using withConsumerPollingTimeout method.", +consumerPollingTimeout.getSeconds()); Review Comment: Whoops - sorry I missed this. I think this warning should be emitted only when `rawRecords` is empty. There are still cases where the timeout is over and `rawRecords` is not empty ```suggestion if rawRecords.isEmpty() { LOG.warn( "No messages retrieved with polling timeout {} seconds. Consider increasing the consumer polling timeout using withConsumerPollingTimeout method.", consumerPollingTimeout.getSeconds()); } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]
jbsabbagh commented on code in PR #30877: URL: https://github.com/apache/beam/pull/30877#discussion_r1556092498 ## sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java: ## @@ -518,8 +525,11 @@ private ConsumerRecords poll( return rawRecords; } elapsed = sw.elapsed(); - if (elapsed.toMillis() >= KAFKA_POLL_TIMEOUT.toMillis()) { + if (elapsed.toMillis() >= consumerPollingTimeout.toMillis()) { // timeout is over +LOG.warn( +"No messages retrieved with polling timeout {} seconds. Consider increasing the consumer polling timeout using withConsumerPollingTimeout method.", +consumerPollingTimeout.getSeconds()); Review Comment: Whoops - sorry I missed this. I think this warning should be emitted only when rawRecords is empty. There are still cases where the timeout is over and `rawRecords` is not empty ```suggestion if rawRecords.isEmpty() { LOG.warn( "No messages retrieved with polling timeout {} seconds. Consider increasing the consumer polling timeout using withConsumerPollingTimeout method.", consumerPollingTimeout.getSeconds()); } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Support FLOAT32 type in Spanner [beam]
arawind opened a new pull request, #30893: URL: https://github.com/apache/beam/pull/30893 Adds support for the newly added FLOAT32 type in Google Cloud Spanner. Since FLOAT was already mapped to FLOAT64, this change modifies the existing mapping to FLOAT -> FLOAT32 instead. Addresses #30700. [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule) [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule) [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule) [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Bug]: Beam Sql is ignoring aliases fields in some situations which causes to huge data loss [beam]
kennknowles commented on issue #30498: URL: https://github.com/apache/beam/issues/30498#issuecomment-2043220798 Wow nice detective work. If we still see it in Beam but not Calcite then it must be one of our optimization rules. I would look at the debug trace for when it goes wrong. Or with a short test like that, maybe just brute force removing one rule at a time. Or remove all the optimization rules and add them back one at a time. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py310 [beam]
github-actions[bot] commented on PR #30889: URL: https://github.com/apache/beam/pull/30889#issuecomment-2043302750 Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`: R: @shunping for label python. Available commands: - `stop reviewer notifications` - opt out of the automated review tooling - `remind me after tests pass` - tag the comment author after tests pass - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers) The PR bot will only process comments in the main thread (not review comments). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Add PubSubIO Stress test [beam]
github-actions[bot] commented on PR #30886: URL: https://github.com/apache/beam/pull/30886#issuecomment-2043317594 Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Initial Iceberg Sink [beam]
kennknowles commented on code in PR #30797: URL: https://github.com/apache/beam/pull/30797#discussion_r1552651757 ## sdks/java/io/iceberg/src/main/java/org/apache/beam/io/iceberg/IcebergIO.java: ## @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.io.iceberg; + +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; + +public class IcebergIO { + + public static WriteRows writeToDynamicDestinations( Review Comment: We could do that. I was thinking that we might make convenience methods later like `writeToTable(catalog, table_id)` so I made this name extra long. We can always add others. I tend to prefer different method names rather than overloading. ## sdks/java/io/iceberg/src/main/java/org/apache/beam/io/iceberg/AppendFilesToTables.java: ## @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.io.iceberg; + +import org.apache.beam.sdk.coders.KvCoder; +import org.apache.beam.sdk.coders.SerializableCoder; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.GroupByKey; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.transforms.WithKeys; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.iceberg.AppendFiles; +import org.apache.iceberg.Snapshot; +import org.apache.iceberg.Table; +import org.apache.iceberg.catalog.Catalog; +import org.apache.iceberg.catalog.TableIdentifier; +import org.checkerframework.checker.nullness.qual.MonotonicNonNull; + +class AppendFilesToTables +extends PTransform, PCollection>> { + + private final IcebergCatalogConfig catalogConfig; + + AppendFilesToTables(IcebergCatalogConfig catalogConfig) { +this.catalogConfig = catalogConfig; + } + + @Override + public PCollection> expand(PCollection writtenFiles) { + +// Apply any sharded writes and flatten everything for catalog updates +return writtenFiles +.apply( +"Key metadata updates by table", +WithKeys.of( +new SerializableFunction() { + @Override + public String apply(FileWriteResult input) { +return input.getTableIdentifier().toString(); + } +})) +// .setCoder(KvCoder.of(StringUtf8Coder.of(), new MetadataUpdate.MetadataUpdateCoder())) Review Comment: Done ## sdks/java/io/iceberg/src/main/java/org/apache/beam/io/iceberg/FileWriteResult.java: ## @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to
Re: [PR] Optimize histograms and metric names in BigQuerySinkMetrics [beam]
codecov-commenter commented on PR #30796: URL: https://github.com/apache/beam/pull/30796#issuecomment-2043422958 ## [Codecov](https://app.codecov.io/gh/apache/beam/pull/30796?dropdown=coverage=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) Report Attention: Patch coverage is `87.5%` with `2 lines` in your changes are missing coverage. Please review. > Project coverage is 70.86%. Comparing base [(`e894d8c`)](https://app.codecov.io/gh/apache/beam/commit/e894d8ca7fe10ed4771ce9d999b48361c25220ec?dropdown=coverage=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) to head [(`ebf509e`)](https://app.codecov.io/gh/apache/beam/pull/30796?dropdown=coverage=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache). > Report is 104 commits behind head on master. > :exclamation: Current head ebf509e differs from pull request most recent head e3f3012. Consider uploading reports for the commit e3f3012 to get more accurate results | [Files](https://app.codecov.io/gh/apache/beam/pull/30796?dropdown=coverage=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) | Patch % | Lines | |---|---|---| | [.../beam/sdk/io/gcp/bigquery/BigQuerySinkMetrics.java](https://app.codecov.io/gh/apache/beam/pull/30796?src=pr=tree=sdks%2Fjava%2Fio%2Fgoogle-cloud-platform%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fbeam%2Fsdk%2Fio%2Fgcp%2Fbigquery%2FBigQuerySinkMetrics.java_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#diff-c2Rrcy9qYXZhL2lvL2dvb2dsZS1jbG91ZC1wbGF0Zm9ybS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvYmVhbS9zZGsvaW8vZ2NwL2JpZ3F1ZXJ5L0JpZ1F1ZXJ5U2lua01ldHJpY3MuamF2YQ==) | 86.66% | [1 Missing and 1 partial :warning: ](https://app.codecov.io/gh/apache/beam/pull/30796?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) | Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #30796 +/- ## - Coverage 71.47% 70.86% -0.61% - Complexity0 2990+2990 Files 710 1062 +352 Lines104815 132697 +27882 Branches 0 3230+3230 + Hits 7491594034 +19119 - Misses2826835570+7302 - Partials 1632 3093+1461 ``` | [Flag](https://app.codecov.io/gh/apache/beam/pull/30796/flags?src=pr=flags_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) | Coverage Δ | | |---|---|---| | [java](https://app.codecov.io/gh/apache/beam/pull/30796/flags?src=pr=flag_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) | `68.57% <87.50%> (?)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more. [:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/beam/pull/30796?dropdown=coverage=pr=continue_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache). :loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]
jbsabbagh commented on code in PR #30877: URL: https://github.com/apache/beam/pull/30877#discussion_r1556092498 ## sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java: ## @@ -518,8 +525,11 @@ private ConsumerRecords poll( return rawRecords; } elapsed = sw.elapsed(); - if (elapsed.toMillis() >= KAFKA_POLL_TIMEOUT.toMillis()) { + if (elapsed.toMillis() >= consumerPollingTimeout.toMillis()) { // timeout is over +LOG.warn( +"No messages retrieved with polling timeout {} seconds. Consider increasing the consumer polling timeout using withConsumerPollingTimeout method.", +consumerPollingTimeout.getSeconds()); Review Comment: Whoops - sorry I missed this. I think this warning should be emitted only when `rawRecords` is empty. There are still cases where the timeout is over and `rawRecords` is not empty ```suggestion if (rawRecords.isEmpty()) { LOG.warn( "No messages retrieved with polling timeout {} seconds. Consider increasing the consumer polling timeout using withConsumerPollingTimeout method.", consumerPollingTimeout.getSeconds()); } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py39 [beam]
dependabot[bot] opened a new pull request, #30888: URL: https://github.com/apache/beam/pull/30888 Bumps [pymongo](https://github.com/mongodb/mongo-python-driver) from 4.6.2 to 4.6.3. Changelog Sourced from https://github.com/mongodb/mongo-python-driver/blob/master/doc/changelog.rst;>pymongo's changelog. Changelog Changes in Version 4.7 PyMongo 4.7 brings a number of improvements including: Added the :class:pymongo.hello.Hello.connection_id, :attr:pymongo.monitoring.CommandStartedEvent.server_connection_id, :attr:pymongo.monitoring.CommandSucceededEvent.server_connection_id, and :attr:pymongo.monitoring.CommandFailedEvent.server_connection_id properties. Fixed a bug where inflating a :class:~bson.raw_bson.RawBSONDocument containing a :class:~bson.code.Code would cause an error. Significantly improved the performance of encoding BSON documents to JSON. Support for named KMS providers for client side field level encryption. Previously supported KMS providers were only: aws, azure, gcp, kmip, and local. The KMS provider is now expanded to support name suffixes (e.g. local:myname). Named KMS providers enables more than one of each KMS provider type to be configured. See the docstring for :class:~pymongo.encryption_options.AutoEncryptionOpts. Note that named KMS providers requires pymongocrypt =1.9 and libmongocrypt =1.9. :meth:~pymongo.encryption.ClientEncryption.encrypt and :meth:~pymongo.encryption.ClientEncryption.encrypt_expression now allow key_id to be passed in as a :class:uuid.UUID. Fixed a bug where :class:~bson.int64.Int64 instances could not always be encoded by orjson_. The following now works:: import orjson from bson import json_util orjson.dumps({'a': Int64(1)}, default=json_util.default, option=orjson.OPT_PASSTHROUGH_SUBCLASS) .. _orjson: https://github.com/ijl/orjson;>https://github.com/ijl/orjson Fixed a bug appearing in Python 3.12 where RuntimeError: can't create new thread at interpreter shutdown could be written to stderr when a MongoClient's thread starts as the python interpreter is shutting down. Added a warning when connecting to DocumentDB and CosmosDB clusters. For more information regarding feature compatibility and support please visit mongodb.com/supportability/documentdb https://mongodb.com/supportability/documentdb;_ and mongodb.com/supportability/cosmosdb https://mongodb.com/supportability/cosmosdb;_. Added the :attr:pymongo.monitoring.ConnectionCheckedOutEvent.duration, :attr:pymongo.monitoring.ConnectionCheckOutFailedEvent.duration, and :attr:pymongo.monitoring.ConnectionReadyEvent.duration properties. Added the type and kwargs arguments to :class:~pymongo.operations.SearchIndexModel to enable creating vector search indexes in MongoDB Atlas. Fixed a bug where read_concern and write_concern were improperly added to :meth:~pymongo.collection.Collection.list_search_indexes queries. Unavoidable breaking changes ... (truncated) Commits https://github.com/mongodb/mongo-python-driver/commit/8da192f9ca2d4f6464897b22b3029c227043f0cb;>8da192f BUMP 4.6.3 https://github.com/mongodb/mongo-python-driver/commit/56b6b6dbc267d365d97c037082369dabf37405d2;>56b6b6d PYTHON-4305 Fix bson size check (https://redirect.github.com/mongodb/mongo-python-driver/issues/1564;>#1564) https://github.com/mongodb/mongo-python-driver/commit/449d0f316cbcdea59d8b69b5e4fc34ac07949dc6;>449d0f3 BUMP to 4.6.3.dev0 See full diff in https://github.com/mongodb/mongo-python-driver/compare/4.6.2...4.6.3;>compare view [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pymongo=pip=4.6.2=4.6.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- Dependabot commands and options You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop
[PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py310 [beam]
dependabot[bot] opened a new pull request, #30889: URL: https://github.com/apache/beam/pull/30889 Bumps [pymongo](https://github.com/mongodb/mongo-python-driver) from 4.6.2 to 4.6.3. Changelog Sourced from https://github.com/mongodb/mongo-python-driver/blob/master/doc/changelog.rst;>pymongo's changelog. Changelog Changes in Version 4.7 PyMongo 4.7 brings a number of improvements including: Added the :class:pymongo.hello.Hello.connection_id, :attr:pymongo.monitoring.CommandStartedEvent.server_connection_id, :attr:pymongo.monitoring.CommandSucceededEvent.server_connection_id, and :attr:pymongo.monitoring.CommandFailedEvent.server_connection_id properties. Fixed a bug where inflating a :class:~bson.raw_bson.RawBSONDocument containing a :class:~bson.code.Code would cause an error. Significantly improved the performance of encoding BSON documents to JSON. Support for named KMS providers for client side field level encryption. Previously supported KMS providers were only: aws, azure, gcp, kmip, and local. The KMS provider is now expanded to support name suffixes (e.g. local:myname). Named KMS providers enables more than one of each KMS provider type to be configured. See the docstring for :class:~pymongo.encryption_options.AutoEncryptionOpts. Note that named KMS providers requires pymongocrypt =1.9 and libmongocrypt =1.9. :meth:~pymongo.encryption.ClientEncryption.encrypt and :meth:~pymongo.encryption.ClientEncryption.encrypt_expression now allow key_id to be passed in as a :class:uuid.UUID. Fixed a bug where :class:~bson.int64.Int64 instances could not always be encoded by orjson_. The following now works:: import orjson from bson import json_util orjson.dumps({'a': Int64(1)}, default=json_util.default, option=orjson.OPT_PASSTHROUGH_SUBCLASS) .. _orjson: https://github.com/ijl/orjson;>https://github.com/ijl/orjson Fixed a bug appearing in Python 3.12 where RuntimeError: can't create new thread at interpreter shutdown could be written to stderr when a MongoClient's thread starts as the python interpreter is shutting down. Added a warning when connecting to DocumentDB and CosmosDB clusters. For more information regarding feature compatibility and support please visit mongodb.com/supportability/documentdb https://mongodb.com/supportability/documentdb;_ and mongodb.com/supportability/cosmosdb https://mongodb.com/supportability/cosmosdb;_. Added the :attr:pymongo.monitoring.ConnectionCheckedOutEvent.duration, :attr:pymongo.monitoring.ConnectionCheckOutFailedEvent.duration, and :attr:pymongo.monitoring.ConnectionReadyEvent.duration properties. Added the type and kwargs arguments to :class:~pymongo.operations.SearchIndexModel to enable creating vector search indexes in MongoDB Atlas. Fixed a bug where read_concern and write_concern were improperly added to :meth:~pymongo.collection.Collection.list_search_indexes queries. Unavoidable breaking changes ... (truncated) Commits https://github.com/mongodb/mongo-python-driver/commit/8da192f9ca2d4f6464897b22b3029c227043f0cb;>8da192f BUMP 4.6.3 https://github.com/mongodb/mongo-python-driver/commit/56b6b6dbc267d365d97c037082369dabf37405d2;>56b6b6d PYTHON-4305 Fix bson size check (https://redirect.github.com/mongodb/mongo-python-driver/issues/1564;>#1564) https://github.com/mongodb/mongo-python-driver/commit/449d0f316cbcdea59d8b69b5e4fc34ac07949dc6;>449d0f3 BUMP to 4.6.3.dev0 See full diff in https://github.com/mongodb/mongo-python-driver/compare/4.6.2...4.6.3;>compare view [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pymongo=pip=4.6.2=4.6.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- Dependabot commands and options You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop
[PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py38 [beam]
dependabot[bot] opened a new pull request, #30890: URL: https://github.com/apache/beam/pull/30890 Bumps [pymongo](https://github.com/mongodb/mongo-python-driver) from 4.6.2 to 4.6.3. Changelog Sourced from https://github.com/mongodb/mongo-python-driver/blob/master/doc/changelog.rst;>pymongo's changelog. Changelog Changes in Version 4.7 PyMongo 4.7 brings a number of improvements including: Added the :class:pymongo.hello.Hello.connection_id, :attr:pymongo.monitoring.CommandStartedEvent.server_connection_id, :attr:pymongo.monitoring.CommandSucceededEvent.server_connection_id, and :attr:pymongo.monitoring.CommandFailedEvent.server_connection_id properties. Fixed a bug where inflating a :class:~bson.raw_bson.RawBSONDocument containing a :class:~bson.code.Code would cause an error. Significantly improved the performance of encoding BSON documents to JSON. Support for named KMS providers for client side field level encryption. Previously supported KMS providers were only: aws, azure, gcp, kmip, and local. The KMS provider is now expanded to support name suffixes (e.g. local:myname). Named KMS providers enables more than one of each KMS provider type to be configured. See the docstring for :class:~pymongo.encryption_options.AutoEncryptionOpts. Note that named KMS providers requires pymongocrypt =1.9 and libmongocrypt =1.9. :meth:~pymongo.encryption.ClientEncryption.encrypt and :meth:~pymongo.encryption.ClientEncryption.encrypt_expression now allow key_id to be passed in as a :class:uuid.UUID. Fixed a bug where :class:~bson.int64.Int64 instances could not always be encoded by orjson_. The following now works:: import orjson from bson import json_util orjson.dumps({'a': Int64(1)}, default=json_util.default, option=orjson.OPT_PASSTHROUGH_SUBCLASS) .. _orjson: https://github.com/ijl/orjson;>https://github.com/ijl/orjson Fixed a bug appearing in Python 3.12 where RuntimeError: can't create new thread at interpreter shutdown could be written to stderr when a MongoClient's thread starts as the python interpreter is shutting down. Added a warning when connecting to DocumentDB and CosmosDB clusters. For more information regarding feature compatibility and support please visit mongodb.com/supportability/documentdb https://mongodb.com/supportability/documentdb;_ and mongodb.com/supportability/cosmosdb https://mongodb.com/supportability/cosmosdb;_. Added the :attr:pymongo.monitoring.ConnectionCheckedOutEvent.duration, :attr:pymongo.monitoring.ConnectionCheckOutFailedEvent.duration, and :attr:pymongo.monitoring.ConnectionReadyEvent.duration properties. Added the type and kwargs arguments to :class:~pymongo.operations.SearchIndexModel to enable creating vector search indexes in MongoDB Atlas. Fixed a bug where read_concern and write_concern were improperly added to :meth:~pymongo.collection.Collection.list_search_indexes queries. Unavoidable breaking changes ... (truncated) Commits https://github.com/mongodb/mongo-python-driver/commit/8da192f9ca2d4f6464897b22b3029c227043f0cb;>8da192f BUMP 4.6.3 https://github.com/mongodb/mongo-python-driver/commit/56b6b6dbc267d365d97c037082369dabf37405d2;>56b6b6d PYTHON-4305 Fix bson size check (https://redirect.github.com/mongodb/mongo-python-driver/issues/1564;>#1564) https://github.com/mongodb/mongo-python-driver/commit/449d0f316cbcdea59d8b69b5e4fc34ac07949dc6;>449d0f3 BUMP to 4.6.3.dev0 See full diff in https://github.com/mongodb/mongo-python-driver/compare/4.6.2...4.6.3;>compare view [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pymongo=pip=4.6.2=4.6.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- Dependabot commands and options You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop
Re: [PR] Create YAML Join Transform [beam]
Polber commented on code in PR #30734: URL: https://github.com/apache/beam/pull/30734#discussion_r1556253586 ## sdks/python/apache_beam/yaml/yaml_join_test.py: ## @@ -0,0 +1,210 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import logging +import unittest + +import apache_beam as beam +from apache_beam.testing.util import assert_that +from apache_beam.testing.util import equal_to +from apache_beam.yaml.yaml_transform import YamlTransform + + +class ToRow(beam.PTransform): + def expand(self, pcoll): +return pcoll | beam.Map(lambda row: beam.Row(**row._asdict())) + + +FRUITS = [ +beam.Row(id=1, name='raspberry'), +beam.Row(id=2, name='blackberry'), +] + +QUANTITIES = [ +beam.Row(name='raspberry', quantity=1), +beam.Row(name='blackberry', quantity=2), +beam.Row(name='blueberry', quantity=3), +] + +CATEGORIES = [ +beam.Row(name='raspberry', category='juicy'), +beam.Row(name='blackberry', category='dry'), +beam.Row(name='blueberry', category='dry'), +beam.Row(name='blueberry', category='juicy'), +] + + +class YamlJoinTest(unittest.TestCase): Review Comment: Try adding the following annotation to skip this suite in pre-commits. It looks like the pre-commits do not have snapshot jars for the gradle providers. ```suggestion @unittest.skipIf( TestPipeline().get_pipeline_options().view_as(StandardOptions).runner is None, 'Do not run this test on precommit suites.') class YamlJoinTest(unittest.TestCase): ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Duet AI Prompts - Documentation Lookup Without Links [beam]
dariabezkorovaina commented on PR #30873: URL: https://github.com/apache/beam/pull/30873#issuecomment-2043233424 Hi @damccorm, this is the final PR in these Duet AI prompts series: we've created the "no links" versions of all the "documentation lookup" prompts, as well as added some nits and polishing edits. Please take a look :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Duet AI Prompts - Documentation Lookup Without Links [beam]
github-actions[bot] commented on PR #30873: URL: https://github.com/apache/beam/pull/30873#issuecomment-2043255437 Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`: R: @jrmccluskey added as fallback since no labels match configuration Available commands: - `stop reviewer notifications` - opt out of the automated review tooling - `remind me after tests pass` - tag the comment author after tests pass - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers) The PR bot will only process comments in the main thread (not review comments). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]
github-actions[bot] commented on PR #30877: URL: https://github.com/apache/beam/pull/30877#issuecomment-2043255243 Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`: R: @bvolpato for label java. R: @bvolpato for label io. Available commands: - `stop reviewer notifications` - opt out of the automated review tooling - `remind me after tests pass` - tag the comment author after tests pass - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers) The PR bot will only process comments in the main thread (not review comments). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Add PubSubIO Stress test [beam]
akashorabek commented on PR #30886: URL: https://github.com/apache/beam/pull/30886#issuecomment-2043315705 R: @Abacn -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py39 [beam]
tvalentyn closed pull request #30888: Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py39 URL: https://github.com/apache/beam/pull/30888 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py39 [beam]
dependabot[bot] commented on PR #30888: URL: https://github.com/apache/beam/pull/30888#issuecomment-2043395144 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting `@dependabot ignore this major version` or `@dependabot ignore this minor version`. You can also ignore all major, minor, or patch releases for a dependency by adding an [`ignore` condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore) with the desired `update_types` to your config file. If you change your mind, just re-open this PR and I'll resolve any conflicts on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] add terraform for utility cluster. Add name override to gke [beam]
volatilemolotov commented on code in PR #30847: URL: https://github.com/apache/beam/pull/30847#discussion_r1555438203 ## .test-infra/terraform/google-cloud-platform/google-kubernetes-engine/cluster.tf: ## @@ -34,7 +34,9 @@ resource "google_container_cluster" "default" { enable_private_nodes= true enable_private_endpoint = false } - node_config { -service_account = data.google_service_account.default.email + cluster_autoscaling { Review Comment: Forgive me not being clear but code from my comment is fine now -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] add terraform for utility cluster. Add name override to gke [beam]
volatilemolotov commented on code in PR #30847: URL: https://github.com/apache/beam/pull/30847#discussion_r1555483141 ## .test-infra/terraform/google-cloud-platform/google-kubernetes-engine/cluster.tf: ## @@ -34,7 +34,9 @@ resource "google_container_cluster" "default" { enable_private_nodes= true enable_private_endpoint = false } - node_config { -service_account = data.google_service_account.default.email + cluster_autoscaling { Review Comment: @damondouglas Actually putting it into auto_provisioning_defaults is the way to go. I guess I might have been referencing older documentation. Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Flink] Speed up file write in batch mode by using larger bundle size [beam]
jto commented on PR #30802: URL: https://github.com/apache/beam/pull/30802#issuecomment-2042134252 Sure. I tested it on a job that consumes ~1B records (~150GB). With the Dataset API, runtime is 37min. Passing `--useDataStreamForBatch`, I killed it after 1h+ as it was clearly too slow. I tried different bundle size settings with this path and got the following execution times: | Bundle size | execution time (min) | | --- | --: | | 50,000 | 44 | | 100,000 | 37 | | 1,000,000 | 29 | -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]
xianhualiu commented on code in PR #30877: URL: https://github.com/apache/beam/pull/30877#discussion_r1556357679 ## sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java: ## @@ -518,8 +525,11 @@ private ConsumerRecords poll( return rawRecords; } elapsed = sw.elapsed(); - if (elapsed.toMillis() >= KAFKA_POLL_TIMEOUT.toMillis()) { + if (elapsed.toMillis() >= consumerPollingTimeout.toMillis()) { // timeout is over +LOG.warn( +"No messages retrieved with polling timeout {} seconds. Consider increasing the consumer polling timeout using withConsumerPollingTimeout method.", +consumerPollingTimeout.getSeconds()); Review Comment: hmm, if rawRecords is not empty, it should already returned before at: if (!rawRecords.isEmpty()) { // return as we have found some entries return rawRecords; } -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Failing Test]: Python Coverage is failing because of new version of typing-extension release [beam]
riteshghorse closed issue #30806: [Failing Test]: Python Coverage is failing because of new version of typing-extension release URL: https://github.com/apache/beam/issues/30806 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Revert "Revert #30533: Automatically execute unbounded pipelines in streaming mode." [beam]
damccorm opened a new pull request, #30894: URL: https://github.com/apache/beam/pull/30894 Reverts apache/beam#30706 This change was initially correct -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Windowing Support for the DaskRunner [beam]
cisaacstern commented on PR #27618: URL: https://github.com/apache/beam/pull/27618#issuecomment-2043644883 The necessary upstream fix in Dask was merged! Once we get a new Dask release that includes this change, I will finish this PR! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Initial Iceberg Sink [beam]
chamikaramj commented on code in PR #30797: URL: https://github.com/apache/beam/pull/30797#discussion_r1556356489 ## sdks/java/io/iceberg/src/main/java/org/apache/beam/io/iceberg/AppendFilesToTables.java: ## @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.io.iceberg; + +import org.apache.beam.sdk.coders.KvCoder; +import org.apache.beam.sdk.coders.SerializableCoder; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.GroupByKey; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.transforms.WithKeys; +import org.apache.beam.sdk.transforms.windowing.BoundedWindow; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.iceberg.AppendFiles; +import org.apache.iceberg.Snapshot; +import org.apache.iceberg.Table; +import org.apache.iceberg.catalog.Catalog; +import org.apache.iceberg.catalog.TableIdentifier; +import org.checkerframework.checker.nullness.qual.MonotonicNonNull; + +class AppendFilesToTables +extends PTransform, PCollection>> { + + private final IcebergCatalogConfig catalogConfig; + + AppendFilesToTables(IcebergCatalogConfig catalogConfig) { +this.catalogConfig = catalogConfig; + } + + @Override + public PCollection> expand(PCollection writtenFiles) { + +// Apply any sharded writes and flatten everything for catalog updates +return writtenFiles +.apply( +"Key metadata updates by table", +WithKeys.of( +new SerializableFunction() { + @Override + public String apply(FileWriteResult input) { +return input.getTableIdentifier().toString(); + } +})) +// .setCoder(KvCoder.of(StringUtf8Coder.of(), new MetadataUpdate.MetadataUpdateCoder())) +.apply("Group metadata updates by table", GroupByKey.create()) +.apply( +"Append metadata updates to tables", +ParDo.of(new AppendFilesToTablesDoFn(catalogConfig))) +.setCoder(KvCoder.of(StringUtf8Coder.of(), SerializableCoder.of(Snapshot.class))); + } + + private static class AppendFilesToTablesDoFn + extends DoFn>, KV> { + +private final IcebergCatalogConfig catalogConfig; + +private transient @MonotonicNonNull Catalog catalog; + +private AppendFilesToTablesDoFn(IcebergCatalogConfig catalogConfig) { + this.catalogConfig = catalogConfig; +} + +private Catalog getCatalog() { + if (catalog == null) { +catalog = catalogConfig.catalog(); + } + return catalog; +} + +@ProcessElement +public void processElement( +@Element KV> element, +OutputReceiver> out, +BoundedWindow window) { + Table table = getCatalog().loadTable(TableIdentifier.parse(element.getKey())); + AppendFiles update = table.newAppend(); + for (FileWriteResult writtenFile : element.getValue()) { +update.appendFile(writtenFile.getDataFile()); + } + update.commit(); Review Comment: Yeah, (2) is fine. It's more about making sure that we don't double write if a work item fails. But if writing is idempotent it's simpler. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Failing Test]: Python Coverage is failing because of new version of typing-extension release [beam]
riteshghorse commented on issue #30806: URL: https://github.com/apache/beam/issues/30806#issuecomment-2043532181 https://github.com/apache/beam/pull/30863 fixed it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Correct per-entry HashMap overhead in WindmillStateCache [beam]
dmitryor commented on PR #30672: URL: https://github.com/apache/beam/pull/30672#issuecomment-2043539547 I rebased to recent `main` and tests are passing now. I suspect the previous failure was a flake. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]
liferoad commented on code in PR #30877: URL: https://github.com/apache/beam/pull/30877#discussion_r1556372231 ## CHANGES.md: ## @@ -72,6 +72,7 @@ ## Breaking Changes * X behavior was changed ([#X](https://github.com/apache/beam/issues/X)). +* Default consumer polling timeout for KafkaIO.Read was increased from 1 second to 2 seconds. Use KafkaIO.read().withConsumerPollingTimeout(Duration duration) to configure this timeout value when necessary ([#30870](https://github.com/apache/beam/pull/30877)). Review Comment: Can you link this to the issue? not mixing issue and PR here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Duet AI Prompts - Documentation Lookup Without Links [beam]
damccorm merged PR #30873: URL: https://github.com/apache/beam/pull/30873 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Bug]: Beam Sql is ignoring aliases fields in some situations which causes to huge data loss [beam]
brachipa commented on issue #30498: URL: https://github.com/apache/beam/issues/30498#issuecomment-2042540303 Ok, I think I find what cause it. calcite checks if expression node is equal to row fields https://github.com/apache/calcite/blob/dec167ac18272c0cd8be477d6b162d7a31a62114/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2068C5-L2069C63 using RexUtil.isIdentity method: ``` public static boolean isIdentity(List exps, RelDataType inputRowType) { return inputRowType.getFieldCount() == exps.size() && containIdentity(exps, inputRowType, Litmus.IGNORE); } ``` In the failing example, we have row with wider schema from what we actually select. most of our schemas has much more data from what we select, identifying the row as not identical row causes it to create a project with fields as they appear in the select , meaning with their alias and not with their origin field name. And then it is ignored in the "rename" method later on https://github.com/apache/calcite/blob/dec167ac18272c0cd8be477d6b162d7a31a62114/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2130 and alias is skipped https://github.com/apache/calcite/blob/dec167ac18272c0cd8be477d6b162d7a31a62114/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2142 I believe the `isIdentity` check can cause more issues, and we must understand why this is enforced? isn't it valid to have different size of fields in select from what we have in the schema? In our case we have one big row and we run on it different queries, each with different fields in the select. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Clean doc related to write data in bigquery.py [beam]
github-actions[bot] commented on PR #30887: URL: https://github.com/apache/beam/pull/30887#issuecomment-2042578339 Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`: R: @liferoad for label python. R: @chamikaramj for label io. Available commands: - `stop reviewer notifications` - opt out of the automated review tooling - `remind me after tests pass` - tag the comment author after tests pass - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers) The PR bot will only process comments in the main thread (not review comments). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Implement Web API connector interfaces [beam]
github-actions[bot] commented on PR #30815: URL: https://github.com/apache/beam/pull/30815#issuecomment-2042589105 Reminder, please take a look at this pr: @lostluck @Abacn -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Implement Web API connector interfaces [beam]
codecov-commenter commented on PR #30815: URL: https://github.com/apache/beam/pull/30815#issuecomment-2042591781 ## [Codecov](https://app.codecov.io/gh/apache/beam/pull/30815?dropdown=coverage=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) Report Attention: Patch coverage is `60.97561%` with `32 lines` in your changes are missing coverage. Please review. > Project coverage is 70.72%. Comparing base [(`fc5df6f`)](https://app.codecov.io/gh/apache/beam/commit/fc5df6f261bbe6ea910ffc1de8e6c093c9751c60?dropdown=coverage=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) to head [(`b8e8747`)](https://app.codecov.io/gh/apache/beam/pull/30815?dropdown=coverage=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache). > Report is 88 commits behind head on master. | [Files](https://app.codecov.io/gh/apache/beam/pull/30815?dropdown=coverage=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) | Patch % | Lines | |---|---|---| | [sdks/go/pkg/beam/io/webapi/webapi.go](https://app.codecov.io/gh/apache/beam/pull/30815?src=pr=tree=sdks%2Fgo%2Fpkg%2Fbeam%2Fio%2Fwebapi%2Fwebapi.go_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#diff-c2Rrcy9nby9wa2cvYmVhbS9pby93ZWJhcGkvd2ViYXBpLmdv) | 60.97% | [25 Missing and 7 partials :warning: ](https://app.codecov.io/gh/apache/beam/pull/30815?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) | Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #30815 +/- ## - Coverage 70.73% 70.72% -0.02% Complexity 4468 4468 Files 1256 1257 +1 Lines140774 140856 +82 Branches 4306 4306 + Hits 9958199622 +41 - Misses3771437745 +31 - Partials 3479 3489 +10 ``` | [Flag](https://app.codecov.io/gh/apache/beam/pull/30815/flags?src=pr=flags_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) | Coverage Δ | | |---|---|---| | [go](https://app.codecov.io/gh/apache/beam/pull/30815/flags?src=pr=flag_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache) | `54.34% <60.97%> (-0.01%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more. [:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/beam/pull/30815?dropdown=coverage=pr=continue_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache). :loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Typescript] SDK fixes [beam]
github-actions[bot] closed pull request #29889: [Typescript] SDK fixes URL: https://github.com/apache/beam/pull/29889 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Typescript] SDK fixes [beam]
github-actions[bot] commented on PR #29889: URL: https://github.com/apache/beam/pull/29889#issuecomment-2042643912 This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Bump github.com/aws/smithy-go from 1.20.1 to 1.20.2 in /sdks [beam]
jrmccluskey merged PR #30883: URL: https://github.com/apache/beam/pull/30883 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Initial Iceberg Sink [beam]
chamikaramj commented on code in PR #30797: URL: https://github.com/apache/beam/pull/30797#discussion_r1556446483 ## sdks/java/io/iceberg/src/main/java/org/apache/beam/io/iceberg/RecordWriter.java: ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.io.iceberg; + +import static org.apache.beam.io.iceberg.RowHelper.rowToRecord; + +import java.io.IOException; +import org.apache.beam.sdk.values.Row; +import org.apache.iceberg.DataFile; +import org.apache.iceberg.FileFormat; +import org.apache.iceberg.Table; +import org.apache.iceberg.avro.Avro; +import org.apache.iceberg.catalog.Catalog; +import org.apache.iceberg.data.Record; +import org.apache.iceberg.data.parquet.GenericParquetWriter; +import org.apache.iceberg.io.DataWriter; +import org.apache.iceberg.io.OutputFile; +import org.apache.iceberg.orc.ORC; +import org.apache.iceberg.parquet.Parquet; + +class RecordWriter { + + private final DataWriter icebergDataWriter; + + private final Table table; + + RecordWriter(Catalog catalog, IcebergDestination destination, String filename) + throws IOException { +this( +catalog.loadTable(destination.getTableIdentifier()), destination.getFileFormat(), filename); + } + + RecordWriter(Table table, FileFormat fileFormat, String filename) throws IOException { +this.table = table; + +String absoluteFilename = table.location() + "/" + filename; +OutputFile outputFile = table.io().newOutputFile(absoluteFilename); +switch (fileFormat) { + case AVRO: +icebergDataWriter = +Avro.writeData(outputFile) + .createWriterFunc(org.apache.iceberg.data.avro.DataWriter::create) +.schema(table.schema()) +.withSpec(table.spec()) +.overwrite() +.build(); +break; + case PARQUET: +icebergDataWriter = +Parquet.writeData(outputFile) +.createWriterFunc(GenericParquetWriter::buildWriter) +.schema(table.schema()) +.withSpec(table.spec()) +.overwrite() +.build(); +break; + case ORC: +throw new UnsupportedOperationException("ORC file format not currently supported."); +//icebergDataWriter = Review Comment: Delete ? ## settings.gradle.kts: ## @@ -355,3 +355,7 @@ include("sdks:java:io:kafka:kafka-01103") findProject(":sdks:java:io:kafka:kafka-01103")?.name = "kafka-01103" include("sdks:java:managed") findProject(":sdks:java:managed")?.name = "managed" +include("sdks:java:io:iceberg") +findProject(":sdks:java:io:iceberg")?.name = "iceberg" +include("sdks:java:io:catalog") Review Comment: It doesn't look like we add anything under "sdks:java:io:catalog". -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Adding support for high priority queries to xlang transforms writing … [beam]
pabloem commented on PR #30869: URL: https://github.com/apache/beam/pull/30869#issuecomment-2043691501 is this good to go? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] [yaml] backtick generated aliases on sql mappings [beam]
Polber opened a new pull request, #30895: URL: https://github.com/apache/beam/pull/30895 There are cases where `MapToFields` does not work for sql expressions where a field in the incoming schema is a reserved keyword. For example, with an input schema of `{foo: int, timestamp: str}` The following would fail: ``` - type: MapToFields config: language: sql append: true fields: newField: foo ``` as the generated query is `SELECT (timestamp) AS timestamp, (foo) AS foo, (foo) AS newField FROM PCOLLECTION.` where `(timestamp) AS timestamp` was generated due to `append: true` being set. --- This PR provides a fix by auto-escaping aliases that are generated by the framework such as the example above, or when naming a field, i.e. ``` - type: MapToFields name: Transform config: language: sql fields: timestamp: foo ``` Note: The expression itself will not be auto-backticked, so the following is still expected to fail: ``` - type: MapToFields config: language: sql fields: foo: timestamp ``` a user would be expected to write valid SQL, i.e. ``` - type: MapToFields config: language: sql fields: foo: "`timestamp`" ``` Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #` instead. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier). To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md) GitHub Actions Tests Status (on master branch) [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule) [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule) [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule) [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule) See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [yaml] backtick generated aliases on sql mappings [beam]
github-actions[bot] commented on PR #30895: URL: https://github.com/apache/beam/pull/30895#issuecomment-2043738129 Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [yaml] add use-case example [beam]
liferoad commented on code in PR #30896: URL: https://github.com/apache/beam/pull/30896#discussion_r1556503761 ## sdks/python/apache_beam/yaml/examples/simple_filter_and_combine.yaml: ## @@ -0,0 +1,56 @@ +# coding=utf-8 +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the# Row(word='License'); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an# Row(word='AS IS' BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# The following example reads mock transaction data from resources/products.csv, +# performs a simple filter for "Electronics", then calculates the revenue and +# number of products sold for each product type. +pipeline: + transforms: +- type: ReadFromCsv + name: ReadInputFile + config: +path: resources/products.csv +- type: Filter + name: FilterWithCategory + input: ReadInputFile + config: +language: python +keep: category == "Electronics" +- type: Combine + name: CountNumberSold + input: FilterWithCategory + config: +group_by: product_name +combine: + num_sold: +value: product_name +fn: count + total_revenue: +value: price +fn: sum +- type: WriteToCsv + name: WriteOutputFile + input: CountNumberSold + config: +path: output + +options: + yaml_experimental_features: Combine + +# Expected: Review Comment: [README.md](https://github.com/apache/beam/blob/9add040efc69c516b5462668ac923979cc38a6d0/sdks/python/apache_beam/yaml/examples/README.md) can we update this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [yaml] add use-case example [beam]
liferoad commented on code in PR #30896: URL: https://github.com/apache/beam/pull/30896#discussion_r1556504460 ## sdks/python/apache_beam/yaml/examples/resources/products.csv: ## @@ -0,0 +1,6 @@ +transaction_id,product_name,category,price +T0012,Headphones,Electronics,59.99 +T5034,Leather Jacket,Apparel,109.99 +T0024,Aluminum Mug,Kitchen,29.99 +T0104,Headphones,Electronics,59.99 +T0302,Monitor,Electronics,249.99 Review Comment: do we need to this csv file? can we just create a temp csv for your example? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [yaml] backtick generated aliases on sql mappings [beam]
liferoad commented on code in PR #30895: URL: https://github.com/apache/beam/pull/30895#discussion_r1556512365 ## sdks/python/apache_beam/yaml/yaml_mapping.py: ## @@ -515,7 +515,7 @@ def normalize_fields(pcoll, fields, drop=(), append=False, language='generic'): if append: return input_schema, { -**{name: name +**{name: f'`{name}`' if language == 'sql' else name Review Comment: looks like these issues should be easily caught with some tests. So can we add more tests? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [yaml] backtick generated aliases on sql mappings [beam]
Polber commented on PR #30895: URL: https://github.com/apache/beam/pull/30895#issuecomment-2043737188 R: @damccorm -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [yaml] backtick generated aliases on sql mappings [beam]
Polber commented on PR #30895: URL: https://github.com/apache/beam/pull/30895#issuecomment-2043736968 R: @robertwb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] [yaml] add use-case example [beam]
Polber opened a new pull request, #30896: URL: https://github.com/apache/beam/pull/30896 Adds 2 simple use-case examples to examples catalog. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #` instead. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier). To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md) GitHub Actions Tests Status (on master branch) [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule) [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule) [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule) [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule) See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [yaml] add use-case example [beam]
Polber commented on PR #30896: URL: https://github.com/apache/beam/pull/30896#issuecomment-2043759026 R: @liferoad -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [yaml] add use-case example [beam]
Polber commented on PR #30896: URL: https://github.com/apache/beam/pull/30896#issuecomment-2043759408 @robertwb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [yaml] add use-case example [beam]
github-actions[bot] commented on PR #30896: URL: https://github.com/apache/beam/pull/30896#issuecomment-2043759925 Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [yaml] add use-case example [beam]
liferoad commented on code in PR #30896: URL: https://github.com/apache/beam/pull/30896#discussion_r1556506443 ## sdks/python/apache_beam/yaml/examples/simple_filter.yaml: ## @@ -0,0 +1,41 @@ +# coding=utf-8 Review Comment: are these yaml files packaged when releasing beam? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] [yaml] remove Combine from yaml_experimental_features [beam]
Polber opened a new pull request, #30897: URL: https://github.com/apache/beam/pull/30897 Removes the need to set `--yaml_experimental_features=Combine` to run aggregations in Beam YAML Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #` instead. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier). To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md) GitHub Actions Tests Status (on master branch) [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule) [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule) [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule) [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule) See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Python] Add a couple quality-of-life improvemenets to `testing.util.assert_that` [beam]
tvalentyn commented on PR #30771: URL: https://github.com/apache/beam/pull/30771#issuecomment-2043802447 pydantic errors should be resolved now. ptal at linter errors. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] [yaml] Add Beam YAML Blog [beam]
Polber opened a new pull request, #30898: URL: https://github.com/apache/beam/pull/30898 Adds a Beam YAML announcement blog Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #` instead. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier). To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md) GitHub Actions Tests Status (on master branch) [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule) [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule) [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule) [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule) See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Revert "Revert #30533: Automatically execute unbounded pipelines in streaming mode." [beam]
liferoad commented on code in PR #30894: URL: https://github.com/apache/beam/pull/30894#discussion_r1556620836 ## sdks/python/apache_beam/runners/dataflow/dataflow_runner.py: ## @@ -415,6 +416,12 @@ def run_pipeline(self, pipeline, options, pipeline_proto=None): self.proto_pipeline, self.proto_context = pipeline.to_runner_api( return_context=True, default_environment=self._default_environment) +if any(pcoll.is_bounded == beam_runner_api_pb2.IsBounded.UNBOUNDED + for pcoll in self.proto_pipeline.components.pcollections.values()): + options.view_as(StandardOptions).streaming = True Review Comment: if users specify streaming, can we honor the one specified by the users? This at least can let users keep the current behavior. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [yaml] Add Beam YAML Blog [beam]
github-actions[bot] commented on PR #30898: URL: https://github.com/apache/beam/pull/30898#issuecomment-2043944172 Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment `assign set of reviewers` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [yaml] remove Combine from yaml_experimental_features [beam]
github-actions[bot] commented on PR #30897: URL: https://github.com/apache/beam/pull/30897#issuecomment-2043944210 Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`: R: @jrmccluskey for label python. R: @liferoad for label website. Available commands: - `stop reviewer notifications` - opt out of the automated review tooling - `remind me after tests pass` - tag the comment author after tests pass - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers) The PR bot will only process comments in the main thread (not review comments). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]
jbsabbagh commented on code in PR #30877: URL: https://github.com/apache/beam/pull/30877#discussion_r1555950658 ## sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java: ## @@ -191,6 +191,12 @@ private ReadFromKafkaDoFn( this.checkStopReadingFn = transform.getCheckStopReadingFn(); this.badRecordRouter = transform.getBadRecordRouter(); this.recordTag = recordTag; +if (transform.getConsumerPollingTimeout() != null) { + this.consumerPollingTimeout = + java.time.Duration.ofMillis(transform.getConsumerPollingTimeout().getMillis()); +} else { + this.consumerPollingTimeout = KAFKA_POLL_TIMEOUT; +} Review Comment: Maybe we could add some sort of log here that mentions what the timeout is? This will help signal to users that a timeout outside of the Kafka Consumer configs exists. ## sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java: ## @@ -518,7 +525,7 @@ private ConsumerRecords poll( return rawRecords; } elapsed = sw.elapsed(); - if (elapsed.toMillis() >= KAFKA_POLL_TIMEOUT.toMillis()) { + if (elapsed.toMillis() >= consumerPollingTimeout.toMillis()) { // timeout is over return rawRecords; Review Comment: In the case where the `rawRecords.isEmpty()`, might be worth adding some sort of count that tracks how many times `rawRecords` has been empty. If this count is too high, maybe emit a warning log that it is happening too often and that perhaps they should increase the poll timeout ## sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java: ## @@ -587,6 +587,7 @@ public static Read read() { .setCommitOffsetsInFinalizeEnabled(false) .setDynamicRead(false) .setTimestampPolicyFactory(TimestampPolicyFactory.withProcessingTime()) +.setConsumerPollingTimeout(Duration.standardSeconds(1L)) Review Comment: Should the default be increased slightly? Maybe 2 seconds? Given that this is even causing issues in the US to US cross-regional reads, it does seem to suggest that 1 second might be a bit too aggressive as a general default. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]
liferoad commented on code in PR #30877: URL: https://github.com/apache/beam/pull/30877#discussion_r1555976984 ## sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java: ## @@ -587,6 +587,7 @@ public static Read read() { .setCommitOffsetsInFinalizeEnabled(false) .setDynamicRead(false) .setTimestampPolicyFactory(TimestampPolicyFactory.withProcessingTime()) +.setConsumerPollingTimeout(Duration.standardSeconds(1L)) Review Comment: This makes sense. @xianhualiu please update https://github.com/apache/beam/blob/master/CHANGES.md as well about your fix, since this should be called out in our release notes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]
jbsabbagh commented on code in PR #30877: URL: https://github.com/apache/beam/pull/30877#discussion_r1555959733 ## sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java: ## @@ -518,7 +525,7 @@ private ConsumerRecords poll( return rawRecords; } elapsed = sw.elapsed(); - if (elapsed.toMillis() >= KAFKA_POLL_TIMEOUT.toMillis()) { + if (elapsed.toMillis() >= consumerPollingTimeout.toMillis()) { // timeout is over return rawRecords; Review Comment: In the case where the `rawRecords.isEmpty()`, might be worth adding some sort of count that tracks how many times `rawRecords` has been empty. If this count is too high, maybe emit a warning log that it is happening too often and that perhaps the user should increase the poll timeout -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]
xianhualiu commented on code in PR #30877: URL: https://github.com/apache/beam/pull/30877#discussion_r1556018716 ## sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java: ## @@ -518,7 +525,7 @@ private ConsumerRecords poll( return rawRecords; } elapsed = sw.elapsed(); - if (elapsed.toMillis() >= KAFKA_POLL_TIMEOUT.toMillis()) { + if (elapsed.toMillis() >= consumerPollingTimeout.toMillis()) { // timeout is over return rawRecords; Review Comment: added warning when no messages polled. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]
xianhualiu commented on code in PR #30877: URL: https://github.com/apache/beam/pull/30877#discussion_r1556019131 ## sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java: ## @@ -587,6 +587,7 @@ public static Read read() { .setCommitOffsetsInFinalizeEnabled(false) .setDynamicRead(false) .setTimestampPolicyFactory(TimestampPolicyFactory.withProcessingTime()) +.setConsumerPollingTimeout(Duration.standardSeconds(1L)) Review Comment: increased to 2 seconds -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Patch release website changes [beam]
damccorm merged PR #30839: URL: https://github.com/apache/beam/pull/30839 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Bug]: Cannot read from Kafka due to short poll timeout of consumer in KafkaIO [beam]
jbsabbagh commented on issue #30870: URL: https://github.com/apache/beam/issues/30870#issuecomment-2042957025 This also affects the Python SDK. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]
jbsabbagh commented on code in PR #30877: URL: https://github.com/apache/beam/pull/30877#discussion_r1555959733 ## sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java: ## @@ -518,7 +525,7 @@ private ConsumerRecords poll( return rawRecords; } elapsed = sw.elapsed(); - if (elapsed.toMillis() >= KAFKA_POLL_TIMEOUT.toMillis()) { + if (elapsed.toMillis() >= consumerPollingTimeout.toMillis()) { // timeout is over return rawRecords; Review Comment: In the case where the `rawRecords.isEmpty()` and the timeout is over, might be worth adding some sort of count that tracks how many times `rawRecords` has been empty. If this count is too high, maybe emit a warning log that it is happening too often and that perhaps the user should increase the poll timeout -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]
xianhualiu commented on code in PR #30877: URL: https://github.com/apache/beam/pull/30877#discussion_r1556016519 ## sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java: ## @@ -191,6 +191,12 @@ private ReadFromKafkaDoFn( this.checkStopReadingFn = transform.getCheckStopReadingFn(); this.badRecordRouter = transform.getBadRecordRouter(); this.recordTag = recordTag; +if (transform.getConsumerPollingTimeout() != null) { + this.consumerPollingTimeout = + java.time.Duration.ofMillis(transform.getConsumerPollingTimeout().getMillis()); +} else { + this.consumerPollingTimeout = KAFKA_POLL_TIMEOUT; +} Review Comment: The KafkaIO.Read API doc already contains the java doc for this new configuration. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [Feature Request]: Managed transforms Java API [beam]
ahmedabu98 closed issue #30830: [Feature Request]: Managed transforms Java API URL: https://github.com/apache/beam/issues/30830 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Clean doc related to write data in bigquery.py [beam]
kevinzous opened a new pull request, #30887: URL: https://github.com/apache/beam/pull/30887 * add missing closing parenthesis * add unique names to PTransform operations **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #` instead. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier). To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md) GitHub Actions Tests Status (on master branch) [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule) [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule) [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule) [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule) See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Add PubSubIO Stress test [beam]
akashorabek opened a new pull request, #30886: URL: https://github.com/apache/beam/pull/30886 This pull request introduces stress tests for PubSubIO, designed to assess the performance under various conditions. The stress tests simulate dynamic load increases and evaluate the behavior of PubSubIO. Changes: Added stress tests for PubSubIO. Implemented dynamic load increases over time to simulate varying workloads. Added support for exporting metrics to InfluxDB or BigQuery based on the configuration parameter. [Write job example](https://console.cloud.google.com/dataflow/jobs/us-central1/2024-04-08_02_09_14-10513489467703410835;step=Write%20to%20PubSub;mainTab=JOB_GRAPH;bottomTab=WORKER_LOGS;bottomStepTab=DATA_SAMPLING;logsSeverity=ERROR;graphView=0?project=apache-beam-testing) [Read job example](https://console.cloud.google.com/dataflow/jobs/us-central1/2024-04-08_02_09_57-14737073549262708363;step=Counting%20element;mainTab=JOB_GRAPH;bottomTab=JOB_LOGS;bottomStepTab=DATA_SAMPLING;logsSeverity=INFO;graphView=0?project=apache-beam-testing=(%22dfTime%22:(%22l%22:%22dfJobMaxTime%22))) Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #` instead. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier). To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md) GitHub Actions Tests Status (on master branch) [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule) [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule) [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule) [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule) See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [Java] ManagedIO [beam]
ahmedabu98 merged PR #30808: URL: https://github.com/apache/beam/pull/30808 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Duet AI Prompts - Documentation Lookup Without Links [beam]
andreydevyatkin commented on code in PR #30873: URL: https://github.com/apache/beam/pull/30873#discussion_r1555991358 ## learning/prompts/documentation-lookup-nolinks/09_basic_triggers.md: ## @@ -0,0 +1,30 @@ +Prompt: +What is a trigger in Apache Beam? + +Response: +Apache Beam uses triggers to determine when to emit the aggregated results of each window, referred to as a pane. + +Triggers provide two additional capabilities compared to outputting at the end of a window: +- Allowing early results to be output before the end of the window. +- Allowing late data to be handled after the end of the window. + +This feature lets you control the flow of your data and balance between completeness, latency, and cost. + +You set the triggers for a `PCollection` by setting the `trigger` parameter of the `WindowInto` transform. + +```python + pcollection | WindowInto( +FixedWindows(1 * 60), +trigger=AfterProcessingTime(1 * 60), +accumulation_mode=AccumulationMode.DISCARDING) Review Comment: ```suggestion pcollection | WindowInto( FixedWindows(1 * 60), trigger=AfterProcessingTime(1 * 60), accumulation_mode=AccumulationMode.DISCARDING, ) ``` ## learning/prompts/documentation-lookup-nolinks/25_io_csv.md: ## @@ -0,0 +1,29 @@ +Prompt: +Is CSV format supported in Apache Beam? + +Response: +CSV is a data format for storing and exchanging data. Apache Beam supports reading data from and writing data to CSV files using the `ReadFromText` and `WriteToText` transforms in the `TextIO` module. For more information, see the TextIO connector documentation for your programming language of choice. + +To get started with CSV and Apache Beam, refer to the Dataflow Cookbook GitHub repository. + +Here is an example of Apache Beam pipeline code for reading data from a CSV file: + +```python +class CsvOptions(PipelineOptions): +@classmethod +def _add_argparse_args(cls, parser): +parser.add_argument( +'--file_path', +default="gs://your-bucket/your-file.csv", +help='Csv file path' +) + +options = CsvOptions() + +with beam.Pipeline(options=options) as p: +output = (p | "Read from Csv file" >> ReadFromCsv( +path=options.file_path +) +| "Log Data" >> Map(logging.info)) + Review Comment: ```suggestion class CsvOptions(PipelineOptions): @classmethod def _add_argparse_args(cls, parser): parser.add_argument( "--file_path", default="gs://your-bucket/your-file.csv", help="Csv file path", ) options = CsvOptions() with beam.Pipeline(options=options) as p: output = ( p | "Read from Csv file" >> ReadFromCsv(path=options.file_path) | "Log Data" >> Map(logging.info) ) ``` ## learning/prompts/documentation-lookup-nolinks/20_io_biguery.md: ## @@ -0,0 +1,41 @@ +Prompt: +Is BigQuery supported in Apache Beam? + +Response: +Yes, Apache Beam supports BigQuery. BigQuery is a serverless and cost-effective enterprise data warehouse offered by Google Cloud. Apache Beam provides a BigQueryIO connector to read and write data from and to BigQuery. The BigQueryIO connector supports both batch and streaming pipelines. + +The following Apache Beam SDKs support the BigQueryIO connector: +* Java (natively) +* Python (natively) +* Go (natively and through X Language) +* Typescript (through X Language) + +To read data from BigQuery, use the `ReadFromBigQuery` function. Apache Beam can read data directly from a BigQuery table or using an SQL query. The default mode is to return table rows read from a BigQuery source as dictionaries. Built-in `TableRow` objects can also be returned. + +Here is an example of Apache Beam pipeline code for reading from BigQuery: + +```python +from apache_beam.io.gcp.bigquery import ReadFromBigQuery + +with beam.Pipeline(options=options) as p: + # read from a table +lines_table = p | 'Read' >> ReadFromBigQuery(table=table) + # read from a query +lines_query = p | 'Read' >> ReadFromBigQuery(query="SELECT * FROM table") +``` + +Here is an example of Apache Beam pipeline code for writing to BigQuery: + +```python +from apache_beam.io.gcp.bigquery import WriteToBigQuery + +with beam.Pipeline(options=options) as p: + # write to a table +p | 'Write' >> beam.io.WriteToBigQuery( +table, +schema=TABLE_SCHEMA, +create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED, +write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND) Review Comment: ```suggestion from apache_beam.io.gcp.bigquery import WriteToBigQuery with beam.Pipeline(options=options) as p: # write to a table p | "Write" >> beam.io.WriteToBigQuery( table, schema=TABLE_SCHEMA,