Re: [PR] Bump github.com/testcontainers/testcontainers-go from 0.26.0 to 0.30.0 in /sdks [beam]

2024-04-08 Thread via GitHub


github-actions[bot] commented on PR #30901:
URL: https://github.com/apache/beam/pull/30901#issuecomment-2044180860

   Checks are failing. Will not request review until checks are succeeding. If 
you'd like to override that behavior, comment `assign set of reviewers`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Bump google.golang.org/grpc from 1.62.1 to 1.63.2 in /sdks [beam]

2024-04-08 Thread via GitHub


dependabot[bot] opened a new pull request, #30900:
URL: https://github.com/apache/beam/pull/30900

   Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.62.1 
to 1.63.2.
   
   Release notes
   Sourced from https://github.com/grpc/grpc-go/releases;>google.golang.org/grpc's 
releases.
   
   Release 1.63.2
   Bugs
   
   Fix the user agent string
   
   Release 1.63.1
   Bugs
   
   grpc: fixed subchannel log messages to properly reference the parent 
channel (https://redirect.github.com/grpc/grpc-go/issues/7101;>#7101)
   
   Special thanks: https://github.com/daniel-weisse;>@​daniel-weisse
   
   
   
   API Changes
   
   grpc: remove Deprecated tag from Dial and DialContext; these will be 
deprecated in v1.64 instead (https://redirect.github.com/grpc/grpc-go/issues/7103;>#7103)
   
   Release 1.63.0
   Behavior Changes
   
   grpc: Return canonical target string from 
resolver.Address.String() (experimental) (https://redirect.github.com/grpc/grpc-go/issues/6923;>#6923)
   client  server: when using write buffer pooling, use input value 
for buffer size instead of size*2 (https://redirect.github.com/grpc/grpc-go/issues/6983;>#6983)
   
   Special Thanks: https://github.com/raghav-stripe;>@​raghav-stripe
   
   
   
   New Features
   
   grpc: add ClientConn.CanonicalTarget() to return the 
canonical target string. (https://redirect.github.com/grpc/grpc-go/issues/7006;>#7006)
   xds: implement LRS named metrics support (https://github.com/grpc/proposal/blob/master/A64-lrs-custom-metrics.md;>gRFC
 A64) (https://redirect.github.com/grpc/grpc-go/issues/7027;>#7027)
   
   Special Thanks: https://github.com/danielzhaotongliu;>@​danielzhaotongliu
   
   
   grpc: introduce grpc.NewClient to allow users to create new 
clients in idle mode and with dns as the default resolver (https://redirect.github.com/grpc/grpc-go/issues/7010;>#7010)
   
   Special Thanks: https://github.com/bruuuce;>@​bruuuce
   
   
   
   API Changes
   
   grpc: stabilize experimental method ClientConn.Target() (https://redirect.github.com/grpc/grpc-go/issues/7006;>#7006)
   
   Bug Fixes
   
   xds: fix an issue that would cause the client to send an empty list of 
resources for LDS/CDS upon reconnecting with the management server (https://redirect.github.com/grpc/grpc-go/issues/7026;>#7026)
   server: Fix some errors returned by a server when using a 
grpc.Server as an http.Handler with the Go stdlib 
HTTP server (https://redirect.github.com/grpc/grpc-go/issues/6989;>#6989)
   resolver/dns: add SetResolvingTimeout to allow configuring 
the DNS resolver's global timeout (https://redirect.github.com/grpc/grpc-go/issues/6917;>#6917)
   
   Special Thanks: https://github.com/and1truong;>@​and1truong
   
   
   Set the security level of Windows named pipes to NoSecurity (https://redirect.github.com/grpc/grpc-go/issues/6956;>#6956)
   
   Special Thanks: https://github.com/irsl;>@​irsl
   
   
   
   Release 1.62.2
   Dependencies
   
   Update http2 library to address vulnerability https://www.kb.cert.org/vuls/id/421644;>CVE-2023-45288
   
   
   
   
   Commits
   
   https://github.com/grpc/grpc-go/commit/d32e66ce27447a0a217464a36fdd3935801c0453;>d32e66c
 Change version to 1.63.2 (https://redirect.github.com/grpc/grpc-go/issues/7104;>#7104)
   https://github.com/grpc/grpc-go/commit/92f6dd0c1083430b830e51aaca0db371b06bc99e;>92f6dd0
 channelz: pass parent pointer instead of parent ID to RegisterSubChannel (https://redirect.github.com/grpc/grpc-go/issues/7101;>#7101)
   https://github.com/grpc/grpc-go/commit/0f6ef0fbe51aa33d05a91d0fa87b28113b83f5a9;>0f6ef0f
 grpc: un-deprecate Dial and DialContext
   https://github.com/grpc/grpc-go/commit/58dc74987513afa14f8d2ad1643f4b6f192d6ee8;>58dc749
 Change version to 1.63.1-dev (https://redirect.github.com/grpc/grpc-go/issues/7051;>#7051)
   https://github.com/grpc/grpc-go/commit/c68f4566b9cacdb11c42a6eb14ee66a33d9b7c12;>c68f456
 Change version to 1.63.0 (https://redirect.github.com/grpc/grpc-go/issues/7050;>#7050)
   https://github.com/grpc/grpc-go/commit/6369167ae33538aca051225fe5b5e07c2b022eb5;>6369167
 *: update http2 dependency (https://redirect.github.com/grpc/grpc-go/issues/7082;>#7082)
   https://github.com/grpc/grpc-go/commit/88547614e7427f7ce87a5f5485d897ae09044641;>8854761
 cherry-pick: channelz: fix race accessing channelMap without lock (https://redirect.github.com/grpc/grpc-go/issues/7079;>#7079) (https://redirect.github.com/grpc/grpc-go/issues/7;>#7...
   https://github.com/grpc/grpc-go/commit/e62770d76fc9508a2266d8bb8c6f2967e0dbc6fc;>e62770d
 channelz: add LocalAddr to listen sockets and test (https://redirect.github.com/grpc/grpc-go/issues/7062;>#7062) (https://redirect.github.com/grpc/grpc-go/issues/7063;>#7063)
   https://github.com/grpc/grpc-go/commit/4ffccf1a5f97417eeb000d521f6415b6eb33bb5f;>4ffccf1
 googlec2p: use xdstp style template for client LDS resource name (https://redirect.github.com/grpc/grpc-go/issues/7048;>#7048)
   

Re: [PR] Bump github.com/testcontainers/testcontainers-go from 0.26.0 to 0.29.1 in /sdks [beam]

2024-04-08 Thread via GitHub


dependabot[bot] closed pull request #30557: Bump 
github.com/testcontainers/testcontainers-go from 0.26.0 to 0.29.1 in /sdks
URL: https://github.com/apache/beam/pull/30557


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] The PostCommit XVR JavaUsingPython Dataflow job is flaky [beam]

2024-04-08 Thread via GitHub


github-actions[bot] opened a new issue, #30899:
URL: https://github.com/apache/beam/issues/30899

   The PostCommit XVR JavaUsingPython Dataflow is failing over 50% of the time 
   Please visit 
https://github.com/apache/beam/actions/workflows/beam_PostCommit_XVR_JavaUsingPython_Dataflow.yml?query=is%3Afailure+branch%3Amaster
 to see the logs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump GCP-BOM to 26.36.0 [beam]

2024-04-08 Thread via GitHub


Abacn merged PR #30868:
URL: https://github.com/apache/beam/pull/30868


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Bump github.com/testcontainers/testcontainers-go from 0.26.0 to 0.30.0 in /sdks [beam]

2024-04-08 Thread via GitHub


dependabot[bot] opened a new pull request, #30901:
URL: https://github.com/apache/beam/pull/30901

   Bumps 
[github.com/testcontainers/testcontainers-go](https://github.com/testcontainers/testcontainers-go)
 from 0.26.0 to 0.30.0.
   
   Release notes
   Sourced from https://github.com/testcontainers/testcontainers-go/releases;>github.com/testcontainers/testcontainers-go's
 releases.
   
   v0.30.0
   What's Changed
    Features
   
   feat(k6):Add remote test scripts (https://redirect.github.com/testcontainers/testcontainers-go/issues/2350;>#2350)
 https://github.com/bearrito;>@​bearrito
   feat: optimizes file copies to and from containers (https://redirect.github.com/testcontainers/testcontainers-go/issues/2450;>#2450)
 https://github.com/codefromthecrypt;>@​codefromthecrypt
   feat(exitcode): Add exit code sugar method (https://redirect.github.com/testcontainers/testcontainers-go/issues/2342;>#2342)
 https://github.com/bearrito;>@​bearrito
   feat: add module to support InfluxDB v1.x (https://redirect.github.com/testcontainers/testcontainers-go/issues/1703;>#1703)
 https://github.com/JJCinAZ;>@​JJCinAZ
   feat: authenticate docker on PullImage (https://redirect.github.com/testcontainers/testcontainers-go/issues/2446;>#2446)
 https://github.com/codefromthecrypt;>@​codefromthecrypt
   feat: add distribution-registry module (https://redirect.github.com/testcontainers/testcontainers-go/issues/2341;>#2341)
 https://github.com/mdelapenya;>@​mdelapenya
   feat: support passing io.Reader as ContainerFile (https://redirect.github.com/testcontainers/testcontainers-go/issues/2401;>#2401)
 https://github.com/mdelapenya;>@​mdelapenya
   feat(MustConn): Add MustConnectionString on (some) dbs (https://redirect.github.com/testcontainers/testcontainers-go/issues/2343;>#2343)
 https://github.com/bearrito;>@​bearrito
   feat: support for waiting for response headers (https://redirect.github.com/testcontainers/testcontainers-go/issues/2349;>#2349)
 https://github.com/mdelapenya;>@​mdelapenya
   Add method for getting Weaviate's gRPC port (https://redirect.github.com/testcontainers/testcontainers-go/issues/2339;>#2339)
 https://github.com/antas-marcin;>@​antas-marcin
   feat: add openfga module (https://redirect.github.com/testcontainers/testcontainers-go/issues/2332;>#2332)
 https://github.com/mdelapenya;>@​mdelapenya
   
    Bug Fixes
   
   Fix: HTTP wait strategy does not take query params into account (https://redirect.github.com/testcontainers/testcontainers-go/issues/2466;>#2466)
 https://github.com/benja-M-1;>@​benja-M-1
   fix: logging deadlock (https://redirect.github.com/testcontainers/testcontainers-go/issues/2346;>#2346)
 https://github.com/stevenh;>@​stevenh
   fix(exec): updates the Multiplexed opt to combine stdout 
and stderr (https://redirect.github.com/testcontainers/testcontainers-go/issues/2452;>#2452)
 https://github.com/gustavosbarreto;>@​gustavosbarreto
   bug:Fix AMQPS url (https://redirect.github.com/testcontainers/testcontainers-go/issues/2462;>#2462)
 https://github.com/bearrito;>@​bearrito
   Added error handling for context.Canceled in log reading code (https://redirect.github.com/testcontainers/testcontainers-go/issues/2268;>#2268)
 https://github.com/prateekdwivedi;>@​prateekdwivedi
   fix: consul race on HTTP port (https://redirect.github.com/testcontainers/testcontainers-go/issues/2336;>#2336)
 https://github.com/codefromthecrypt;>@​codefromthecrypt
   
    Documentation
   
   docs: fix wrong copypaste in Weaviate docs (https://redirect.github.com/testcontainers/testcontainers-go/issues/2338;>#2338)
 https://github.com/mdelapenya;>@​mdelapenya
   
   粒 Housekeeping
   
   Upgrade neo4j module to use features from v0.29.1 of testcontainers-go 
(https://redirect.github.com/testcontainers/testcontainers-go/issues/2463;>#2463)
 https://github.com/danielorbach;>@​danielorbach
   chore: use docker compose (v2) instead of 
docker-compose (v1) (https://redirect.github.com/testcontainers/testcontainers-go/issues/2464;>#2464)
 https://github.com/mdelapenya;>@​mdelapenya
   refactor: Add Weaviate modules tests (https://redirect.github.com/testcontainers/testcontainers-go/issues/2447;>#2447)
 https://github.com/antas-marcin;>@​antas-marcin
   docs: Fix typo in ci-test-go.yml (https://redirect.github.com/testcontainers/testcontainers-go/issues/2394;>#2394)
 https://github.com/uh-zz;>@​uh-zz
   redpanda: set entrypoint to the custom entrypoint file (https://redirect.github.com/testcontainers/testcontainers-go/issues/2347;>#2347)
 https://github.com/bojand;>@​bojand
   Move the container and config tests into a test package (https://redirect.github.com/testcontainers/testcontainers-go/issues/2242;>#2242)
 https://github.com/Minivera;>@​Minivera
   chore: use WithEnv option in localstack module (https://redirect.github.com/testcontainers/testcontainers-go/issues/2337;>#2337)
 https://github.com/mdelapenya;>@​mdelapenya
   chore: check that the new version is not empty 

Re: [PR] Bump github.com/testcontainers/testcontainers-go from 0.26.0 to 0.29.1 in /sdks [beam]

2024-04-08 Thread via GitHub


dependabot[bot] commented on PR #30557:
URL: https://github.com/apache/beam/pull/30557#issuecomment-2044149633

   Superseded by #30901.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Patch release website changes [beam]

2024-04-08 Thread via GitHub


damccorm commented on PR #30839:
URL: https://github.com/apache/beam/pull/30839#issuecomment-2042721488

   R: @Abacn


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Patch release website changes [beam]

2024-04-08 Thread via GitHub


github-actions[bot] commented on PR #30839:
URL: https://github.com/apache/beam/pull/30839#issuecomment-2042724168

   Stopping reviewer notifications for this pull request: review requested by 
someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Patch release website changes [beam]

2024-04-08 Thread via GitHub


damccorm commented on PR #30839:
URL: https://github.com/apache/beam/pull/30839#issuecomment-2042741503

   R: @liferoad since I think Yi is out today


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug]: Beam Sql is ignoring aliases fields in some situations which causes to huge data loss [beam]

2024-04-08 Thread via GitHub


brachipa commented on issue #30498:
URL: https://github.com/apache/beam/issues/30498#issuecomment-2042822014

   I also tried run calcite (same version as beam uses) unit test with the same 
query and it works fine, with my fix and without my fix. I think it sounds like 
an issue in beam
   added in org.apache.calcite.test.JdbcTest
   ```
 @Test void testSimple() {
   final String sql = "select \"name\" as n1, count(*) as c\n"
   + "from \"hr\".\"emps\" group by \"name\"";
   CalciteAssert.that()
   .with(CalciteAssert.Config.REGULAR)
   .query(sql)
   .returns("N1=Theodore; C=1\nN1=Eric; C=1\nN1=Bill; 
C=1\nN1=Sebastian; C=1\n");
 }
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]

2024-04-08 Thread via GitHub


liferoad commented on code in PR #30877:
URL: https://github.com/apache/beam/pull/30877#discussion_r1556091869


##
CHANGES.md:
##
@@ -80,6 +80,7 @@
 ## Bugfixes
 
 * Fixed locking issue when shutting down inactive bundle processors. Symptoms 
of this issue include slowness or stuckness in long-running jobs (Python) 
([#30679](https://github.com/apache/beam/pull/30679)).
+* Fixed kafka polling issue due to short consumer polling timeout time. 
Changes to make the consumer polling timeout configurable for KafkaIO.Read with 
new command: KafkaIO.read().withConsumerPollingTimeout(Duration duration). 
Default timeout has been increased from 1 second to 2 seconds( 
[#30870](https://github.com/apache/beam/pull/30877)).

Review Comment:
   Can we move this to Breaking Changes? We can reword these like: Default 
consumer polling timeout for `KafkaIO.Read` was increased from 1 second to 2 
seconds. Use `KafkaIO.read().withConsumerPollingTimeout(Duration duration)` to 
configure this timeout value when necessary.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py311 [beam]

2024-04-08 Thread via GitHub


dependabot[bot] opened a new pull request, #30891:
URL: https://github.com/apache/beam/pull/30891

   Bumps [pymongo](https://github.com/mongodb/mongo-python-driver) from 4.6.2 
to 4.6.3.
   
   Changelog
   Sourced from https://github.com/mongodb/mongo-python-driver/blob/master/doc/changelog.rst;>pymongo's
 changelog.
   
   Changelog
   Changes in Version 4.7
   PyMongo 4.7 brings a number of improvements including:
   
   
   Added the :class:pymongo.hello.Hello.connection_id,
   
:attr:pymongo.monitoring.CommandStartedEvent.server_connection_id,
   
:attr:pymongo.monitoring.CommandSucceededEvent.server_connection_id,
 and
   
:attr:pymongo.monitoring.CommandFailedEvent.server_connection_id 
properties.
   
   
   Fixed a bug where inflating a 
:class:~bson.raw_bson.RawBSONDocument containing a 
:class:~bson.code.Code would cause an error.
   
   
   Significantly improved the performance of encoding BSON documents to 
JSON.
   
   
   Support for named KMS providers for client side field level encryption.
   Previously supported KMS providers were only: aws, azure, gcp, kmip, and 
local.
   The KMS provider is now expanded to support name suffixes (e.g. 
local:myname).
   Named KMS providers enables more than one of each KMS provider type to be 
configured.
   See the docstring for 
:class:~pymongo.encryption_options.AutoEncryptionOpts.
   Note that named KMS providers requires pymongocrypt =1.9 and 
libmongocrypt =1.9.
   
   
   :meth:~pymongo.encryption.ClientEncryption.encrypt and
   :meth:~pymongo.encryption.ClientEncryption.encrypt_expression 
now allow key_id
   to be passed in as a :class:uuid.UUID.
   
   
   Fixed a bug where :class:~bson.int64.Int64 instances could 
not always be encoded by orjson_. The following now
   works::
   
   
   
   import orjson
   from bson import json_util
   orjson.dumps({'a': Int64(1)}, default=json_util.default, 
option=orjson.OPT_PASSTHROUGH_SUBCLASS)
   
   
   
   
   
   .. _orjson: https://github.com/ijl/orjson;>https://github.com/ijl/orjson
   
   Fixed a bug appearing in Python 3.12 where RuntimeError: can't 
create new thread at interpreter shutdown
   could be written to stderr when a MongoClient's thread starts as the python 
interpreter is shutting down.
   Added a warning when connecting to DocumentDB and CosmosDB clusters.
   For more information regarding feature compatibility and support please visit
   mongodb.com/supportability/documentdb 
https://mongodb.com/supportability/documentdb;_ and
   mongodb.com/supportability/cosmosdb 
https://mongodb.com/supportability/cosmosdb;_.
   Added the 
:attr:pymongo.monitoring.ConnectionCheckedOutEvent.duration,
   
:attr:pymongo.monitoring.ConnectionCheckOutFailedEvent.duration, 
and
   :attr:pymongo.monitoring.ConnectionReadyEvent.duration 
properties.
   Added the type and kwargs arguments to 
:class:~pymongo.operations.SearchIndexModel to enable
   creating vector search indexes in MongoDB Atlas.
   Fixed a bug where read_concern and 
write_concern were improperly added to
   :meth:~pymongo.collection.Collection.list_search_indexes 
queries.
   
   Unavoidable breaking changes
   
   
   
   ... (truncated)
   
   
   Commits
   
   https://github.com/mongodb/mongo-python-driver/commit/8da192f9ca2d4f6464897b22b3029c227043f0cb;>8da192f
 BUMP 4.6.3
   https://github.com/mongodb/mongo-python-driver/commit/56b6b6dbc267d365d97c037082369dabf37405d2;>56b6b6d
 PYTHON-4305 Fix bson size check (https://redirect.github.com/mongodb/mongo-python-driver/issues/1564;>#1564)
   https://github.com/mongodb/mongo-python-driver/commit/449d0f316cbcdea59d8b69b5e4fc34ac07949dc6;>449d0f3
 BUMP to 4.6.3.dev0
   See full diff in https://github.com/mongodb/mongo-python-driver/compare/4.6.2...4.6.3;>compare
 view
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pymongo=pip=4.6.2=4.6.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop 

Re: [PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py311 [beam]

2024-04-08 Thread via GitHub


github-actions[bot] commented on PR #30891:
URL: https://github.com/apache/beam/pull/30891#issuecomment-2043302472

   Assigning reviewers. If you would like to opt out of this review, comment 
`assign to next reviewer`:
   
   R: @riteshghorse for label python.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any 
comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review 
comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py38 [beam]

2024-04-08 Thread via GitHub


github-actions[bot] commented on PR #30890:
URL: https://github.com/apache/beam/pull/30890#issuecomment-2043302615

   Assigning reviewers. If you would like to opt out of this review, comment 
`assign to next reviewer`:
   
   R: @damccorm for label python.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any 
comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review 
comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py39 [beam]

2024-04-08 Thread via GitHub


github-actions[bot] commented on PR #30888:
URL: https://github.com/apache/beam/pull/30888#issuecomment-2043302869

   Assigning reviewers. If you would like to opt out of this review, comment 
`assign to next reviewer`:
   
   R: @tvalentyn for label python.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any 
comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review 
comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Support FLOAT32 type in Spanner [beam]

2024-04-08 Thread via GitHub


github-actions[bot] commented on PR #30893:
URL: https://github.com/apache/beam/pull/30893#issuecomment-2043375457

   Checks are failing. Will not request review until checks are succeeding. If 
you'd like to override that behavior, comment `assign set of reviewers`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py38 [beam]

2024-04-08 Thread via GitHub


damccorm closed pull request #30890: Bump pymongo from 4.6.2 to 4.6.3 in 
/sdks/python/container/py38
URL: https://github.com/apache/beam/pull/30890


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py38 [beam]

2024-04-08 Thread via GitHub


dependabot[bot] commented on PR #30890:
URL: https://github.com/apache/beam/pull/30890#issuecomment-2043419704

   OK, I won't notify you again about this release, but will get in touch when 
a new version is available. If you'd rather skip all updates until the next 
major or minor version, let me know by commenting `@dependabot ignore this 
major version` or `@dependabot ignore this minor version`. You can also ignore 
all major, minor, or patch releases for a dependency by adding an [`ignore` 
condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore)
 with the desired `update_types` to your config file.
   
   If you change your mind, just re-open this PR and I'll resolve any conflicts 
on it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]

2024-04-08 Thread via GitHub


jbsabbagh commented on code in PR #30877:
URL: https://github.com/apache/beam/pull/30877#discussion_r1556092498


##
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java:
##
@@ -518,8 +525,11 @@ private ConsumerRecords poll(
 return rawRecords;
   }
   elapsed = sw.elapsed();
-  if (elapsed.toMillis() >= KAFKA_POLL_TIMEOUT.toMillis()) {
+  if (elapsed.toMillis() >= consumerPollingTimeout.toMillis()) {
 // timeout is over
+LOG.warn(
+"No messages retrieved with polling timeout {} seconds. Consider 
increasing the consumer polling timeout using withConsumerPollingTimeout 
method.",
+consumerPollingTimeout.getSeconds());

Review Comment:
   Whoops - sorry I missed this. I think this warning should be emitted only 
when `rawRecords` is empty. There are still cases where the timeout is over and 
`rawRecords` is not empty
   ```suggestion
  if rawRecords.isEmpty() {
   LOG.warn(
   "No messages retrieved with polling timeout {} seconds. Consider 
increasing the consumer polling timeout using withConsumerPollingTimeout 
method.",
   consumerPollingTimeout.getSeconds());
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]

2024-04-08 Thread via GitHub


jbsabbagh commented on code in PR #30877:
URL: https://github.com/apache/beam/pull/30877#discussion_r1556092498


##
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java:
##
@@ -518,8 +525,11 @@ private ConsumerRecords poll(
 return rawRecords;
   }
   elapsed = sw.elapsed();
-  if (elapsed.toMillis() >= KAFKA_POLL_TIMEOUT.toMillis()) {
+  if (elapsed.toMillis() >= consumerPollingTimeout.toMillis()) {
 // timeout is over
+LOG.warn(
+"No messages retrieved with polling timeout {} seconds. Consider 
increasing the consumer polling timeout using withConsumerPollingTimeout 
method.",
+consumerPollingTimeout.getSeconds());

Review Comment:
   Whoops - sorry I missed this. I think this warning should be emitted only 
when rawRecords is empty. There are still cases where the timeout is over and 
`rawRecords` is not empty
   ```suggestion
  if rawRecords.isEmpty() {
   LOG.warn(
   "No messages retrieved with polling timeout {} seconds. Consider 
increasing the consumer polling timeout using withConsumerPollingTimeout 
method.",
   consumerPollingTimeout.getSeconds());
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Support FLOAT32 type in Spanner [beam]

2024-04-08 Thread via GitHub


arawind opened a new pull request, #30893:
URL: https://github.com/apache/beam/pull/30893

   Adds support for the newly added FLOAT32 type in Google Cloud Spanner.
   
   Since FLOAT was already mapped to FLOAT64, this change modifies the existing 
mapping to FLOAT -> FLOAT32 instead.
   
   Addresses #30700.
   
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug]: Beam Sql is ignoring aliases fields in some situations which causes to huge data loss [beam]

2024-04-08 Thread via GitHub


kennknowles commented on issue #30498:
URL: https://github.com/apache/beam/issues/30498#issuecomment-2043220798

   Wow nice detective work. If we still see it in Beam but not Calcite then it 
must be one of our optimization rules. I would look at the debug trace for when 
it goes wrong. Or with a short test like that, maybe just brute force removing 
one rule at a time. Or remove all the optimization rules and add them back one 
at a time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py310 [beam]

2024-04-08 Thread via GitHub


github-actions[bot] commented on PR #30889:
URL: https://github.com/apache/beam/pull/30889#issuecomment-2043302750

   Assigning reviewers. If you would like to opt out of this review, comment 
`assign to next reviewer`:
   
   R: @shunping for label python.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any 
comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review 
comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add PubSubIO Stress test [beam]

2024-04-08 Thread via GitHub


github-actions[bot] commented on PR #30886:
URL: https://github.com/apache/beam/pull/30886#issuecomment-2043317594

   Stopping reviewer notifications for this pull request: review requested by 
someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Initial Iceberg Sink [beam]

2024-04-08 Thread via GitHub


kennknowles commented on code in PR #30797:
URL: https://github.com/apache/beam/pull/30797#discussion_r1552651757


##
sdks/java/io/iceberg/src/main/java/org/apache/beam/io/iceberg/IcebergIO.java:
##
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.io.iceberg;
+
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+
+public class IcebergIO {
+
+  public static WriteRows writeToDynamicDestinations(

Review Comment:
   We could do that. I was thinking that we might make convenience methods 
later like `writeToTable(catalog, table_id)` so I made this name extra long. We 
can always add others. I tend to prefer different method names rather than 
overloading.



##
sdks/java/io/iceberg/src/main/java/org/apache/beam/io/iceberg/AppendFilesToTables.java:
##
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.io.iceberg;
+
+import org.apache.beam.sdk.coders.KvCoder;
+import org.apache.beam.sdk.coders.SerializableCoder;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.GroupByKey;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.SerializableFunction;
+import org.apache.beam.sdk.transforms.WithKeys;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.iceberg.AppendFiles;
+import org.apache.iceberg.Snapshot;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.checkerframework.checker.nullness.qual.MonotonicNonNull;
+
+class AppendFilesToTables
+extends PTransform, PCollection>> {
+
+  private final IcebergCatalogConfig catalogConfig;
+
+  AppendFilesToTables(IcebergCatalogConfig catalogConfig) {
+this.catalogConfig = catalogConfig;
+  }
+
+  @Override
+  public PCollection> expand(PCollection 
writtenFiles) {
+
+// Apply any sharded writes and flatten everything for catalog updates
+return writtenFiles
+.apply(
+"Key metadata updates by table",
+WithKeys.of(
+new SerializableFunction() {
+  @Override
+  public String apply(FileWriteResult input) {
+return input.getTableIdentifier().toString();
+  }
+}))
+// .setCoder(KvCoder.of(StringUtf8Coder.of(), new 
MetadataUpdate.MetadataUpdateCoder()))

Review Comment:
   Done



##
sdks/java/io/iceberg/src/main/java/org/apache/beam/io/iceberg/FileWriteResult.java:
##
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to 

Re: [PR] Optimize histograms and metric names in BigQuerySinkMetrics [beam]

2024-04-08 Thread via GitHub


codecov-commenter commented on PR #30796:
URL: https://github.com/apache/beam/pull/30796#issuecomment-2043422958

   ## 
[Codecov](https://app.codecov.io/gh/apache/beam/pull/30796?dropdown=coverage=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 Report
   Attention: Patch coverage is `87.5%` with `2 lines` in your changes are 
missing coverage. Please review.
   > Project coverage is 70.86%. Comparing base 
[(`e894d8c`)](https://app.codecov.io/gh/apache/beam/commit/e894d8ca7fe10ed4771ce9d999b48361c25220ec?dropdown=coverage=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 to head 
[(`ebf509e`)](https://app.codecov.io/gh/apache/beam/pull/30796?dropdown=coverage=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   > Report is 104 commits behind head on master.
   
   > :exclamation: Current head ebf509e differs from pull request most recent 
head e3f3012. Consider uploading reports for the commit e3f3012 to get more 
accurate results
   
   | 
[Files](https://app.codecov.io/gh/apache/beam/pull/30796?dropdown=coverage=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | Patch % | Lines |
   |---|---|---|
   | 
[.../beam/sdk/io/gcp/bigquery/BigQuerySinkMetrics.java](https://app.codecov.io/gh/apache/beam/pull/30796?src=pr=tree=sdks%2Fjava%2Fio%2Fgoogle-cloud-platform%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fbeam%2Fsdk%2Fio%2Fgcp%2Fbigquery%2FBigQuerySinkMetrics.java_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#diff-c2Rrcy9qYXZhL2lvL2dvb2dsZS1jbG91ZC1wbGF0Zm9ybS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvYmVhbS9zZGsvaW8vZ2NwL2JpZ3F1ZXJ5L0JpZ1F1ZXJ5U2lua01ldHJpY3MuamF2YQ==)
 | 86.66% | [1 Missing and 1 partial :warning: 
](https://app.codecov.io/gh/apache/beam/pull/30796?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 |
   
   Additional details and impacted files
   
   
   ```diff
   @@ Coverage Diff  @@
   ## master   #30796  +/-   ##
   
   - Coverage 71.47%   70.86%   -0.61% 
   - Complexity0 2990+2990 
   
 Files   710 1062 +352 
 Lines104815   132697   +27882 
 Branches  0 3230+3230 
   
   + Hits  7491594034   +19119 
   - Misses2826835570+7302 
   - Partials   1632 3093+1461 
   ```
   
   | 
[Flag](https://app.codecov.io/gh/apache/beam/pull/30796/flags?src=pr=flags_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | Coverage Δ | |
   |---|---|---|
   | 
[java](https://app.codecov.io/gh/apache/beam/pull/30796/flags?src=pr=flag_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | `68.57% <87.50%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   
   
   
   [:umbrella: View full report in Codecov by 
Sentry](https://app.codecov.io/gh/apache/beam/pull/30796?dropdown=coverage=pr=continue_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   
   :loudspeaker: Have feedback on the report? [Share it 
here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]

2024-04-08 Thread via GitHub


jbsabbagh commented on code in PR #30877:
URL: https://github.com/apache/beam/pull/30877#discussion_r1556092498


##
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java:
##
@@ -518,8 +525,11 @@ private ConsumerRecords poll(
 return rawRecords;
   }
   elapsed = sw.elapsed();
-  if (elapsed.toMillis() >= KAFKA_POLL_TIMEOUT.toMillis()) {
+  if (elapsed.toMillis() >= consumerPollingTimeout.toMillis()) {
 // timeout is over
+LOG.warn(
+"No messages retrieved with polling timeout {} seconds. Consider 
increasing the consumer polling timeout using withConsumerPollingTimeout 
method.",
+consumerPollingTimeout.getSeconds());

Review Comment:
   Whoops - sorry I missed this. I think this warning should be emitted only 
when `rawRecords` is empty. There are still cases where the timeout is over and 
`rawRecords` is not empty
   ```suggestion
  if (rawRecords.isEmpty()) {
   LOG.warn(
   "No messages retrieved with polling timeout {} seconds. Consider 
increasing the consumer polling timeout using withConsumerPollingTimeout 
method.",
   consumerPollingTimeout.getSeconds());
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py39 [beam]

2024-04-08 Thread via GitHub


dependabot[bot] opened a new pull request, #30888:
URL: https://github.com/apache/beam/pull/30888

   Bumps [pymongo](https://github.com/mongodb/mongo-python-driver) from 4.6.2 
to 4.6.3.
   
   Changelog
   Sourced from https://github.com/mongodb/mongo-python-driver/blob/master/doc/changelog.rst;>pymongo's
 changelog.
   
   Changelog
   Changes in Version 4.7
   PyMongo 4.7 brings a number of improvements including:
   
   
   Added the :class:pymongo.hello.Hello.connection_id,
   
:attr:pymongo.monitoring.CommandStartedEvent.server_connection_id,
   
:attr:pymongo.monitoring.CommandSucceededEvent.server_connection_id,
 and
   
:attr:pymongo.monitoring.CommandFailedEvent.server_connection_id 
properties.
   
   
   Fixed a bug where inflating a 
:class:~bson.raw_bson.RawBSONDocument containing a 
:class:~bson.code.Code would cause an error.
   
   
   Significantly improved the performance of encoding BSON documents to 
JSON.
   
   
   Support for named KMS providers for client side field level encryption.
   Previously supported KMS providers were only: aws, azure, gcp, kmip, and 
local.
   The KMS provider is now expanded to support name suffixes (e.g. 
local:myname).
   Named KMS providers enables more than one of each KMS provider type to be 
configured.
   See the docstring for 
:class:~pymongo.encryption_options.AutoEncryptionOpts.
   Note that named KMS providers requires pymongocrypt =1.9 and 
libmongocrypt =1.9.
   
   
   :meth:~pymongo.encryption.ClientEncryption.encrypt and
   :meth:~pymongo.encryption.ClientEncryption.encrypt_expression 
now allow key_id
   to be passed in as a :class:uuid.UUID.
   
   
   Fixed a bug where :class:~bson.int64.Int64 instances could 
not always be encoded by orjson_. The following now
   works::
   
   
   
   import orjson
   from bson import json_util
   orjson.dumps({'a': Int64(1)}, default=json_util.default, 
option=orjson.OPT_PASSTHROUGH_SUBCLASS)
   
   
   
   
   
   .. _orjson: https://github.com/ijl/orjson;>https://github.com/ijl/orjson
   
   Fixed a bug appearing in Python 3.12 where RuntimeError: can't 
create new thread at interpreter shutdown
   could be written to stderr when a MongoClient's thread starts as the python 
interpreter is shutting down.
   Added a warning when connecting to DocumentDB and CosmosDB clusters.
   For more information regarding feature compatibility and support please visit
   mongodb.com/supportability/documentdb 
https://mongodb.com/supportability/documentdb;_ and
   mongodb.com/supportability/cosmosdb 
https://mongodb.com/supportability/cosmosdb;_.
   Added the 
:attr:pymongo.monitoring.ConnectionCheckedOutEvent.duration,
   
:attr:pymongo.monitoring.ConnectionCheckOutFailedEvent.duration, 
and
   :attr:pymongo.monitoring.ConnectionReadyEvent.duration 
properties.
   Added the type and kwargs arguments to 
:class:~pymongo.operations.SearchIndexModel to enable
   creating vector search indexes in MongoDB Atlas.
   Fixed a bug where read_concern and 
write_concern were improperly added to
   :meth:~pymongo.collection.Collection.list_search_indexes 
queries.
   
   Unavoidable breaking changes
   
   
   
   ... (truncated)
   
   
   Commits
   
   https://github.com/mongodb/mongo-python-driver/commit/8da192f9ca2d4f6464897b22b3029c227043f0cb;>8da192f
 BUMP 4.6.3
   https://github.com/mongodb/mongo-python-driver/commit/56b6b6dbc267d365d97c037082369dabf37405d2;>56b6b6d
 PYTHON-4305 Fix bson size check (https://redirect.github.com/mongodb/mongo-python-driver/issues/1564;>#1564)
   https://github.com/mongodb/mongo-python-driver/commit/449d0f316cbcdea59d8b69b5e4fc34ac07949dc6;>449d0f3
 BUMP to 4.6.3.dev0
   See full diff in https://github.com/mongodb/mongo-python-driver/compare/4.6.2...4.6.3;>compare
 view
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pymongo=pip=4.6.2=4.6.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop 

[PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py310 [beam]

2024-04-08 Thread via GitHub


dependabot[bot] opened a new pull request, #30889:
URL: https://github.com/apache/beam/pull/30889

   Bumps [pymongo](https://github.com/mongodb/mongo-python-driver) from 4.6.2 
to 4.6.3.
   
   Changelog
   Sourced from https://github.com/mongodb/mongo-python-driver/blob/master/doc/changelog.rst;>pymongo's
 changelog.
   
   Changelog
   Changes in Version 4.7
   PyMongo 4.7 brings a number of improvements including:
   
   
   Added the :class:pymongo.hello.Hello.connection_id,
   
:attr:pymongo.monitoring.CommandStartedEvent.server_connection_id,
   
:attr:pymongo.monitoring.CommandSucceededEvent.server_connection_id,
 and
   
:attr:pymongo.monitoring.CommandFailedEvent.server_connection_id 
properties.
   
   
   Fixed a bug where inflating a 
:class:~bson.raw_bson.RawBSONDocument containing a 
:class:~bson.code.Code would cause an error.
   
   
   Significantly improved the performance of encoding BSON documents to 
JSON.
   
   
   Support for named KMS providers for client side field level encryption.
   Previously supported KMS providers were only: aws, azure, gcp, kmip, and 
local.
   The KMS provider is now expanded to support name suffixes (e.g. 
local:myname).
   Named KMS providers enables more than one of each KMS provider type to be 
configured.
   See the docstring for 
:class:~pymongo.encryption_options.AutoEncryptionOpts.
   Note that named KMS providers requires pymongocrypt =1.9 and 
libmongocrypt =1.9.
   
   
   :meth:~pymongo.encryption.ClientEncryption.encrypt and
   :meth:~pymongo.encryption.ClientEncryption.encrypt_expression 
now allow key_id
   to be passed in as a :class:uuid.UUID.
   
   
   Fixed a bug where :class:~bson.int64.Int64 instances could 
not always be encoded by orjson_. The following now
   works::
   
   
   
   import orjson
   from bson import json_util
   orjson.dumps({'a': Int64(1)}, default=json_util.default, 
option=orjson.OPT_PASSTHROUGH_SUBCLASS)
   
   
   
   
   
   .. _orjson: https://github.com/ijl/orjson;>https://github.com/ijl/orjson
   
   Fixed a bug appearing in Python 3.12 where RuntimeError: can't 
create new thread at interpreter shutdown
   could be written to stderr when a MongoClient's thread starts as the python 
interpreter is shutting down.
   Added a warning when connecting to DocumentDB and CosmosDB clusters.
   For more information regarding feature compatibility and support please visit
   mongodb.com/supportability/documentdb 
https://mongodb.com/supportability/documentdb;_ and
   mongodb.com/supportability/cosmosdb 
https://mongodb.com/supportability/cosmosdb;_.
   Added the 
:attr:pymongo.monitoring.ConnectionCheckedOutEvent.duration,
   
:attr:pymongo.monitoring.ConnectionCheckOutFailedEvent.duration, 
and
   :attr:pymongo.monitoring.ConnectionReadyEvent.duration 
properties.
   Added the type and kwargs arguments to 
:class:~pymongo.operations.SearchIndexModel to enable
   creating vector search indexes in MongoDB Atlas.
   Fixed a bug where read_concern and 
write_concern were improperly added to
   :meth:~pymongo.collection.Collection.list_search_indexes 
queries.
   
   Unavoidable breaking changes
   
   
   
   ... (truncated)
   
   
   Commits
   
   https://github.com/mongodb/mongo-python-driver/commit/8da192f9ca2d4f6464897b22b3029c227043f0cb;>8da192f
 BUMP 4.6.3
   https://github.com/mongodb/mongo-python-driver/commit/56b6b6dbc267d365d97c037082369dabf37405d2;>56b6b6d
 PYTHON-4305 Fix bson size check (https://redirect.github.com/mongodb/mongo-python-driver/issues/1564;>#1564)
   https://github.com/mongodb/mongo-python-driver/commit/449d0f316cbcdea59d8b69b5e4fc34ac07949dc6;>449d0f3
 BUMP to 4.6.3.dev0
   See full diff in https://github.com/mongodb/mongo-python-driver/compare/4.6.2...4.6.3;>compare
 view
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pymongo=pip=4.6.2=4.6.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop 

[PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py38 [beam]

2024-04-08 Thread via GitHub


dependabot[bot] opened a new pull request, #30890:
URL: https://github.com/apache/beam/pull/30890

   Bumps [pymongo](https://github.com/mongodb/mongo-python-driver) from 4.6.2 
to 4.6.3.
   
   Changelog
   Sourced from https://github.com/mongodb/mongo-python-driver/blob/master/doc/changelog.rst;>pymongo's
 changelog.
   
   Changelog
   Changes in Version 4.7
   PyMongo 4.7 brings a number of improvements including:
   
   
   Added the :class:pymongo.hello.Hello.connection_id,
   
:attr:pymongo.monitoring.CommandStartedEvent.server_connection_id,
   
:attr:pymongo.monitoring.CommandSucceededEvent.server_connection_id,
 and
   
:attr:pymongo.monitoring.CommandFailedEvent.server_connection_id 
properties.
   
   
   Fixed a bug where inflating a 
:class:~bson.raw_bson.RawBSONDocument containing a 
:class:~bson.code.Code would cause an error.
   
   
   Significantly improved the performance of encoding BSON documents to 
JSON.
   
   
   Support for named KMS providers for client side field level encryption.
   Previously supported KMS providers were only: aws, azure, gcp, kmip, and 
local.
   The KMS provider is now expanded to support name suffixes (e.g. 
local:myname).
   Named KMS providers enables more than one of each KMS provider type to be 
configured.
   See the docstring for 
:class:~pymongo.encryption_options.AutoEncryptionOpts.
   Note that named KMS providers requires pymongocrypt =1.9 and 
libmongocrypt =1.9.
   
   
   :meth:~pymongo.encryption.ClientEncryption.encrypt and
   :meth:~pymongo.encryption.ClientEncryption.encrypt_expression 
now allow key_id
   to be passed in as a :class:uuid.UUID.
   
   
   Fixed a bug where :class:~bson.int64.Int64 instances could 
not always be encoded by orjson_. The following now
   works::
   
   
   
   import orjson
   from bson import json_util
   orjson.dumps({'a': Int64(1)}, default=json_util.default, 
option=orjson.OPT_PASSTHROUGH_SUBCLASS)
   
   
   
   
   
   .. _orjson: https://github.com/ijl/orjson;>https://github.com/ijl/orjson
   
   Fixed a bug appearing in Python 3.12 where RuntimeError: can't 
create new thread at interpreter shutdown
   could be written to stderr when a MongoClient's thread starts as the python 
interpreter is shutting down.
   Added a warning when connecting to DocumentDB and CosmosDB clusters.
   For more information regarding feature compatibility and support please visit
   mongodb.com/supportability/documentdb 
https://mongodb.com/supportability/documentdb;_ and
   mongodb.com/supportability/cosmosdb 
https://mongodb.com/supportability/cosmosdb;_.
   Added the 
:attr:pymongo.monitoring.ConnectionCheckedOutEvent.duration,
   
:attr:pymongo.monitoring.ConnectionCheckOutFailedEvent.duration, 
and
   :attr:pymongo.monitoring.ConnectionReadyEvent.duration 
properties.
   Added the type and kwargs arguments to 
:class:~pymongo.operations.SearchIndexModel to enable
   creating vector search indexes in MongoDB Atlas.
   Fixed a bug where read_concern and 
write_concern were improperly added to
   :meth:~pymongo.collection.Collection.list_search_indexes 
queries.
   
   Unavoidable breaking changes
   
   
   
   ... (truncated)
   
   
   Commits
   
   https://github.com/mongodb/mongo-python-driver/commit/8da192f9ca2d4f6464897b22b3029c227043f0cb;>8da192f
 BUMP 4.6.3
   https://github.com/mongodb/mongo-python-driver/commit/56b6b6dbc267d365d97c037082369dabf37405d2;>56b6b6d
 PYTHON-4305 Fix bson size check (https://redirect.github.com/mongodb/mongo-python-driver/issues/1564;>#1564)
   https://github.com/mongodb/mongo-python-driver/commit/449d0f316cbcdea59d8b69b5e4fc34ac07949dc6;>449d0f3
 BUMP to 4.6.3.dev0
   See full diff in https://github.com/mongodb/mongo-python-driver/compare/4.6.2...4.6.3;>compare
 view
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pymongo=pip=4.6.2=4.6.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop 

Re: [PR] Create YAML Join Transform [beam]

2024-04-08 Thread via GitHub


Polber commented on code in PR #30734:
URL: https://github.com/apache/beam/pull/30734#discussion_r1556253586


##
sdks/python/apache_beam/yaml/yaml_join_test.py:
##
@@ -0,0 +1,210 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import logging
+import unittest
+
+import apache_beam as beam
+from apache_beam.testing.util import assert_that
+from apache_beam.testing.util import equal_to
+from apache_beam.yaml.yaml_transform import YamlTransform
+
+
+class ToRow(beam.PTransform):
+  def expand(self, pcoll):
+return pcoll | beam.Map(lambda row: beam.Row(**row._asdict()))
+
+
+FRUITS = [
+beam.Row(id=1, name='raspberry'),
+beam.Row(id=2, name='blackberry'),
+]
+
+QUANTITIES = [
+beam.Row(name='raspberry', quantity=1),
+beam.Row(name='blackberry', quantity=2),
+beam.Row(name='blueberry', quantity=3),
+]
+
+CATEGORIES = [
+beam.Row(name='raspberry', category='juicy'),
+beam.Row(name='blackberry', category='dry'),
+beam.Row(name='blueberry', category='dry'),
+beam.Row(name='blueberry', category='juicy'),
+]
+
+
+class YamlJoinTest(unittest.TestCase):

Review Comment:
   Try adding the following annotation to skip this suite in pre-commits. It 
looks like the pre-commits do not have snapshot jars for the gradle providers.
   
   ```suggestion
   @unittest.skipIf(
   TestPipeline().get_pipeline_options().view_as(StandardOptions).runner is
   None,
   'Do not run this test on precommit suites.')
   class YamlJoinTest(unittest.TestCase):
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Duet AI Prompts - Documentation Lookup Without Links [beam]

2024-04-08 Thread via GitHub


dariabezkorovaina commented on PR #30873:
URL: https://github.com/apache/beam/pull/30873#issuecomment-2043233424

   Hi @damccorm, this is the final PR in these Duet AI prompts series: we've 
created the "no links" versions of all the "documentation lookup" prompts, as 
well as added some nits and polishing edits. Please take a look :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Duet AI Prompts - Documentation Lookup Without Links [beam]

2024-04-08 Thread via GitHub


github-actions[bot] commented on PR #30873:
URL: https://github.com/apache/beam/pull/30873#issuecomment-2043255437

   Assigning reviewers. If you would like to opt out of this review, comment 
`assign to next reviewer`:
   
   R: @jrmccluskey added as fallback since no labels match configuration
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any 
comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review 
comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]

2024-04-08 Thread via GitHub


github-actions[bot] commented on PR #30877:
URL: https://github.com/apache/beam/pull/30877#issuecomment-2043255243

   Assigning reviewers. If you would like to opt out of this review, comment 
`assign to next reviewer`:
   
   R: @bvolpato for label java.
   R: @bvolpato for label io.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any 
comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review 
comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add PubSubIO Stress test [beam]

2024-04-08 Thread via GitHub


akashorabek commented on PR #30886:
URL: https://github.com/apache/beam/pull/30886#issuecomment-2043315705

   R: @Abacn


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py39 [beam]

2024-04-08 Thread via GitHub


tvalentyn closed pull request #30888: Bump pymongo from 4.6.2 to 4.6.3 in 
/sdks/python/container/py39
URL: https://github.com/apache/beam/pull/30888


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump pymongo from 4.6.2 to 4.6.3 in /sdks/python/container/py39 [beam]

2024-04-08 Thread via GitHub


dependabot[bot] commented on PR #30888:
URL: https://github.com/apache/beam/pull/30888#issuecomment-2043395144

   OK, I won't notify you again about this release, but will get in touch when 
a new version is available. If you'd rather skip all updates until the next 
major or minor version, let me know by commenting `@dependabot ignore this 
major version` or `@dependabot ignore this minor version`. You can also ignore 
all major, minor, or patch releases for a dependency by adding an [`ignore` 
condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore)
 with the desired `update_types` to your config file.
   
   If you change your mind, just re-open this PR and I'll resolve any conflicts 
on it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] add terraform for utility cluster. Add name override to gke [beam]

2024-04-08 Thread via GitHub


volatilemolotov commented on code in PR #30847:
URL: https://github.com/apache/beam/pull/30847#discussion_r1555438203


##
.test-infra/terraform/google-cloud-platform/google-kubernetes-engine/cluster.tf:
##
@@ -34,7 +34,9 @@ resource "google_container_cluster" "default" {
 enable_private_nodes= true
 enable_private_endpoint = false
   }
-  node_config {
-service_account = data.google_service_account.default.email
+  cluster_autoscaling {

Review Comment:
   Forgive me not being clear but code from my comment is fine now



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] add terraform for utility cluster. Add name override to gke [beam]

2024-04-08 Thread via GitHub


volatilemolotov commented on code in PR #30847:
URL: https://github.com/apache/beam/pull/30847#discussion_r1555483141


##
.test-infra/terraform/google-cloud-platform/google-kubernetes-engine/cluster.tf:
##
@@ -34,7 +34,9 @@ resource "google_container_cluster" "default" {
 enable_private_nodes= true
 enable_private_endpoint = false
   }
-  node_config {
-service_account = data.google_service_account.default.email
+  cluster_autoscaling {

Review Comment:
   @damondouglas Actually putting it into auto_provisioning_defaults is the way 
to go. I guess I might have been referencing older documentation. 
   
   Thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Flink] Speed up file write in batch mode by using larger bundle size [beam]

2024-04-08 Thread via GitHub


jto commented on PR #30802:
URL: https://github.com/apache/beam/pull/30802#issuecomment-2042134252

   Sure. I tested it on a job that consumes ~1B records (~150GB). 
   With the Dataset API, runtime is 37min.
   Passing `--useDataStreamForBatch`, I killed it after 1h+ as it was clearly 
too slow.
   
   I tried different bundle size settings with this path and got the following 
execution times:
   
   | Bundle size | execution time (min) |
   | --- | --: |
   | 50,000  |  44 |
   | 100,000 |  37 |
   | 1,000,000   |  29 |
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]

2024-04-08 Thread via GitHub


xianhualiu commented on code in PR #30877:
URL: https://github.com/apache/beam/pull/30877#discussion_r1556357679


##
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java:
##
@@ -518,8 +525,11 @@ private ConsumerRecords poll(
 return rawRecords;
   }
   elapsed = sw.elapsed();
-  if (elapsed.toMillis() >= KAFKA_POLL_TIMEOUT.toMillis()) {
+  if (elapsed.toMillis() >= consumerPollingTimeout.toMillis()) {
 // timeout is over
+LOG.warn(
+"No messages retrieved with polling timeout {} seconds. Consider 
increasing the consumer polling timeout using withConsumerPollingTimeout 
method.",
+consumerPollingTimeout.getSeconds());

Review Comment:
   hmm, if rawRecords is not empty, it should already returned before at:
   
   if (!rawRecords.isEmpty()) {
   // return as we have found some entries
   return rawRecords;
 }



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Failing Test]: Python Coverage is failing because of new version of typing-extension release [beam]

2024-04-08 Thread via GitHub


riteshghorse closed issue #30806: [Failing Test]: Python Coverage is failing 
because of new version of typing-extension release
URL: https://github.com/apache/beam/issues/30806


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Revert "Revert #30533: Automatically execute unbounded pipelines in streaming mode." [beam]

2024-04-08 Thread via GitHub


damccorm opened a new pull request, #30894:
URL: https://github.com/apache/beam/pull/30894

   Reverts apache/beam#30706
   
   This change was initially correct


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Windowing Support for the DaskRunner [beam]

2024-04-08 Thread via GitHub


cisaacstern commented on PR #27618:
URL: https://github.com/apache/beam/pull/27618#issuecomment-2043644883

   The necessary upstream fix in Dask was merged!  
   
   Once we get a new Dask release that includes this change, I will finish this 
PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Initial Iceberg Sink [beam]

2024-04-08 Thread via GitHub


chamikaramj commented on code in PR #30797:
URL: https://github.com/apache/beam/pull/30797#discussion_r1556356489


##
sdks/java/io/iceberg/src/main/java/org/apache/beam/io/iceberg/AppendFilesToTables.java:
##
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.io.iceberg;
+
+import org.apache.beam.sdk.coders.KvCoder;
+import org.apache.beam.sdk.coders.SerializableCoder;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.GroupByKey;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.SerializableFunction;
+import org.apache.beam.sdk.transforms.WithKeys;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.iceberg.AppendFiles;
+import org.apache.iceberg.Snapshot;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.checkerframework.checker.nullness.qual.MonotonicNonNull;
+
+class AppendFilesToTables
+extends PTransform, PCollection>> {
+
+  private final IcebergCatalogConfig catalogConfig;
+
+  AppendFilesToTables(IcebergCatalogConfig catalogConfig) {
+this.catalogConfig = catalogConfig;
+  }
+
+  @Override
+  public PCollection> expand(PCollection 
writtenFiles) {
+
+// Apply any sharded writes and flatten everything for catalog updates
+return writtenFiles
+.apply(
+"Key metadata updates by table",
+WithKeys.of(
+new SerializableFunction() {
+  @Override
+  public String apply(FileWriteResult input) {
+return input.getTableIdentifier().toString();
+  }
+}))
+// .setCoder(KvCoder.of(StringUtf8Coder.of(), new 
MetadataUpdate.MetadataUpdateCoder()))
+.apply("Group metadata updates by table", GroupByKey.create())
+.apply(
+"Append metadata updates to tables",
+ParDo.of(new AppendFilesToTablesDoFn(catalogConfig)))
+.setCoder(KvCoder.of(StringUtf8Coder.of(), 
SerializableCoder.of(Snapshot.class)));
+  }
+
+  private static class AppendFilesToTablesDoFn
+  extends DoFn>, KV> {
+
+private final IcebergCatalogConfig catalogConfig;
+
+private transient @MonotonicNonNull Catalog catalog;
+
+private AppendFilesToTablesDoFn(IcebergCatalogConfig catalogConfig) {
+  this.catalogConfig = catalogConfig;
+}
+
+private Catalog getCatalog() {
+  if (catalog == null) {
+catalog = catalogConfig.catalog();
+  }
+  return catalog;
+}
+
+@ProcessElement
+public void processElement(
+@Element KV> element,
+OutputReceiver> out,
+BoundedWindow window) {
+  Table table = 
getCatalog().loadTable(TableIdentifier.parse(element.getKey()));
+  AppendFiles update = table.newAppend();
+  for (FileWriteResult writtenFile : element.getValue()) {
+update.appendFile(writtenFile.getDataFile());
+  }
+  update.commit();

Review Comment:
   Yeah, (2) is fine. It's more about making sure that we don't double write if 
a work item fails. But if writing is idempotent it's simpler.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Failing Test]: Python Coverage is failing because of new version of typing-extension release [beam]

2024-04-08 Thread via GitHub


riteshghorse commented on issue #30806:
URL: https://github.com/apache/beam/issues/30806#issuecomment-2043532181

   https://github.com/apache/beam/pull/30863 fixed it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Correct per-entry HashMap overhead in WindmillStateCache [beam]

2024-04-08 Thread via GitHub


dmitryor commented on PR #30672:
URL: https://github.com/apache/beam/pull/30672#issuecomment-2043539547

   I rebased to recent `main` and tests are passing now. I suspect the previous 
failure was a flake.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]

2024-04-08 Thread via GitHub


liferoad commented on code in PR #30877:
URL: https://github.com/apache/beam/pull/30877#discussion_r1556372231


##
CHANGES.md:
##
@@ -72,6 +72,7 @@
 ## Breaking Changes
 
 * X behavior was changed ([#X](https://github.com/apache/beam/issues/X)).
+* Default consumer polling timeout for KafkaIO.Read was increased from 1 
second to 2 seconds. Use KafkaIO.read().withConsumerPollingTimeout(Duration 
duration) to configure this timeout value when necessary 
([#30870](https://github.com/apache/beam/pull/30877)).

Review Comment:
   Can you link this to the issue? not mixing issue and PR here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Duet AI Prompts - Documentation Lookup Without Links [beam]

2024-04-08 Thread via GitHub


damccorm merged PR #30873:
URL: https://github.com/apache/beam/pull/30873


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug]: Beam Sql is ignoring aliases fields in some situations which causes to huge data loss [beam]

2024-04-08 Thread via GitHub


brachipa commented on issue #30498:
URL: https://github.com/apache/beam/issues/30498#issuecomment-2042540303

   Ok, I think I find what cause it. calcite checks if expression node is equal 
to row fields
   
https://github.com/apache/calcite/blob/dec167ac18272c0cd8be477d6b162d7a31a62114/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2068C5-L2069C63
   using RexUtil.isIdentity method: 
   ```
public static boolean isIdentity(List exps,
 RelDataType inputRowType) {
   return inputRowType.getFieldCount() == exps.size()
   && containIdentity(exps, inputRowType, Litmus.IGNORE);
 }
   ```
   
   In the failing example, we have row with wider schema from what we actually 
select. most of our schemas has much more data from what we select, identifying 
the row as not identical row causes it to create a project with fields as they 
appear in the select , meaning with their alias and not with their origin field 
name.
   
   And then it is ignored in the "rename" method later on
   
https://github.com/apache/calcite/blob/dec167ac18272c0cd8be477d6b162d7a31a62114/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2130
   
   and alias is skipped
   
   
https://github.com/apache/calcite/blob/dec167ac18272c0cd8be477d6b162d7a31a62114/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2142
   
   I believe the `isIdentity` check can cause more issues, and we must 
understand why this is enforced? isn't it valid to have different size of 
fields in select from what we have in the schema?
   
   In our case we have one big row and we run on it different queries, each 
with different fields in the select.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Clean doc related to write data in bigquery.py [beam]

2024-04-08 Thread via GitHub


github-actions[bot] commented on PR #30887:
URL: https://github.com/apache/beam/pull/30887#issuecomment-2042578339

   Assigning reviewers. If you would like to opt out of this review, comment 
`assign to next reviewer`:
   
   R: @liferoad for label python.
   R: @chamikaramj for label io.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any 
comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review 
comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Implement Web API connector interfaces [beam]

2024-04-08 Thread via GitHub


github-actions[bot] commented on PR #30815:
URL: https://github.com/apache/beam/pull/30815#issuecomment-2042589105

   Reminder, please take a look at this pr: @lostluck @Abacn 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Implement Web API connector interfaces [beam]

2024-04-08 Thread via GitHub


codecov-commenter commented on PR #30815:
URL: https://github.com/apache/beam/pull/30815#issuecomment-2042591781

   ## 
[Codecov](https://app.codecov.io/gh/apache/beam/pull/30815?dropdown=coverage=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 Report
   Attention: Patch coverage is `60.97561%` with `32 lines` in your changes are 
missing coverage. Please review.
   > Project coverage is 70.72%. Comparing base 
[(`fc5df6f`)](https://app.codecov.io/gh/apache/beam/commit/fc5df6f261bbe6ea910ffc1de8e6c093c9751c60?dropdown=coverage=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 to head 
[(`b8e8747`)](https://app.codecov.io/gh/apache/beam/pull/30815?dropdown=coverage=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   > Report is 88 commits behind head on master.
   
   | 
[Files](https://app.codecov.io/gh/apache/beam/pull/30815?dropdown=coverage=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | Patch % | Lines |
   |---|---|---|
   | 
[sdks/go/pkg/beam/io/webapi/webapi.go](https://app.codecov.io/gh/apache/beam/pull/30815?src=pr=tree=sdks%2Fgo%2Fpkg%2Fbeam%2Fio%2Fwebapi%2Fwebapi.go_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#diff-c2Rrcy9nby9wa2cvYmVhbS9pby93ZWJhcGkvd2ViYXBpLmdv)
 | 60.97% | [25 Missing and 7 partials :warning: 
](https://app.codecov.io/gh/apache/beam/pull/30815?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 |
   
   Additional details and impacted files
   
   
   ```diff
   @@ Coverage Diff  @@
   ## master   #30815  +/-   ##
   
   - Coverage 70.73%   70.72%   -0.02% 
 Complexity 4468 4468  
   
 Files  1256 1257   +1 
 Lines140774   140856  +82 
 Branches   4306 4306  
   
   + Hits  9958199622  +41 
   - Misses3771437745  +31 
   - Partials   3479 3489  +10 
   ```
   
   | 
[Flag](https://app.codecov.io/gh/apache/beam/pull/30815/flags?src=pr=flags_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | Coverage Δ | |
   |---|---|---|
   | 
[go](https://app.codecov.io/gh/apache/beam/pull/30815/flags?src=pr=flag_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache)
 | `54.34% <60.97%> (-0.01%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   
   
   
   [:umbrella: View full report in Codecov by 
Sentry](https://app.codecov.io/gh/apache/beam/pull/30815?dropdown=coverage=pr=continue_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   
   :loudspeaker: Have feedback on the report? [Share it 
here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=apache).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Typescript] SDK fixes [beam]

2024-04-08 Thread via GitHub


github-actions[bot] closed pull request #29889: [Typescript] SDK fixes
URL: https://github.com/apache/beam/pull/29889


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Typescript] SDK fixes [beam]

2024-04-08 Thread via GitHub


github-actions[bot] commented on PR #29889:
URL: https://github.com/apache/beam/pull/29889#issuecomment-2042643912

   This pull request has been closed due to lack of activity. If you think that 
is incorrect, or the pull request requires review, you can revive the PR at any 
time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump github.com/aws/smithy-go from 1.20.1 to 1.20.2 in /sdks [beam]

2024-04-08 Thread via GitHub


jrmccluskey merged PR #30883:
URL: https://github.com/apache/beam/pull/30883


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Initial Iceberg Sink [beam]

2024-04-08 Thread via GitHub


chamikaramj commented on code in PR #30797:
URL: https://github.com/apache/beam/pull/30797#discussion_r1556446483


##
sdks/java/io/iceberg/src/main/java/org/apache/beam/io/iceberg/RecordWriter.java:
##
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.io.iceberg;
+
+import static org.apache.beam.io.iceberg.RowHelper.rowToRecord;
+
+import java.io.IOException;
+import org.apache.beam.sdk.values.Row;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.avro.Avro;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.data.Record;
+import org.apache.iceberg.data.parquet.GenericParquetWriter;
+import org.apache.iceberg.io.DataWriter;
+import org.apache.iceberg.io.OutputFile;
+import org.apache.iceberg.orc.ORC;
+import org.apache.iceberg.parquet.Parquet;
+
+class RecordWriter {
+
+  private final DataWriter icebergDataWriter;
+
+  private final Table table;
+
+  RecordWriter(Catalog catalog, IcebergDestination destination, String 
filename)
+  throws IOException {
+this(
+catalog.loadTable(destination.getTableIdentifier()), 
destination.getFileFormat(), filename);
+  }
+
+  RecordWriter(Table table, FileFormat fileFormat, String filename) throws 
IOException {
+this.table = table;
+
+String absoluteFilename = table.location() + "/" + filename;
+OutputFile outputFile = table.io().newOutputFile(absoluteFilename);
+switch (fileFormat) {
+  case AVRO:
+icebergDataWriter =
+Avro.writeData(outputFile)
+
.createWriterFunc(org.apache.iceberg.data.avro.DataWriter::create)
+.schema(table.schema())
+.withSpec(table.spec())
+.overwrite()
+.build();
+break;
+  case PARQUET:
+icebergDataWriter =
+Parquet.writeData(outputFile)
+.createWriterFunc(GenericParquetWriter::buildWriter)
+.schema(table.schema())
+.withSpec(table.spec())
+.overwrite()
+.build();
+break;
+  case ORC:
+throw new UnsupportedOperationException("ORC file format not currently 
supported.");
+//icebergDataWriter =

Review Comment:
   Delete ?



##
settings.gradle.kts:
##
@@ -355,3 +355,7 @@ include("sdks:java:io:kafka:kafka-01103")
 findProject(":sdks:java:io:kafka:kafka-01103")?.name = "kafka-01103"
 include("sdks:java:managed")
 findProject(":sdks:java:managed")?.name = "managed"
+include("sdks:java:io:iceberg")
+findProject(":sdks:java:io:iceberg")?.name = "iceberg"
+include("sdks:java:io:catalog")

Review Comment:
   It doesn't look like we add anything under "sdks:java:io:catalog".



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Adding support for high priority queries to xlang transforms writing … [beam]

2024-04-08 Thread via GitHub


pabloem commented on PR #30869:
URL: https://github.com/apache/beam/pull/30869#issuecomment-2043691501

   is this good to go?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [yaml] backtick generated aliases on sql mappings [beam]

2024-04-08 Thread via GitHub


Polber opened a new pull request, #30895:
URL: https://github.com/apache/beam/pull/30895

   There are cases where `MapToFields` does not work for sql expressions where 
a field in the incoming schema is a reserved keyword.
   
   For example, with an input schema of 
   `{foo: int, timestamp: str}`
   
   The following would fail:
   ```
   - type: MapToFields
 config:
   language: sql
   append: true
   fields:
 newField: foo
   ```
   as the generated query is 
   `SELECT (timestamp) AS timestamp, (foo) AS foo, (foo) AS newField FROM 
PCOLLECTION.`
   
   where `(timestamp) AS timestamp` was generated due to `append: true` being 
set.
   
   ---
   
   This PR provides a fix by auto-escaping aliases that are generated by the 
framework such as the example above, or when naming a field, i.e.
   ```
   - type: MapToFields
 name: Transform
 config:
   language: sql
   fields:
 timestamp: foo
   ```
   
   
   Note: The expression itself will not be auto-backticked, so the following is 
still expected to fail:
   ```
   - type: MapToFields
 config:
   language: sql
   fields:
 foo: timestamp
   ```
   a user would be expected to write valid SQL, i.e.
   ```
   - type: MapToFields
 config:
   language: sql
   fields:
 foo: "`timestamp`"
   ```
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [yaml] backtick generated aliases on sql mappings [beam]

2024-04-08 Thread via GitHub


github-actions[bot] commented on PR #30895:
URL: https://github.com/apache/beam/pull/30895#issuecomment-2043738129

   Stopping reviewer notifications for this pull request: review requested by 
someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [yaml] add use-case example [beam]

2024-04-08 Thread via GitHub


liferoad commented on code in PR #30896:
URL: https://github.com/apache/beam/pull/30896#discussion_r1556503761


##
sdks/python/apache_beam/yaml/examples/simple_filter_and_combine.yaml:
##
@@ -0,0 +1,56 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the#  Row(word='License'); you may not use this file except in compliance 
with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an#  Row(word='AS IS' BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# The following example reads mock transaction data from 
resources/products.csv,
+# performs a simple filter for "Electronics", then calculates the revenue and
+# number of products sold for each product type.
+pipeline:
+  transforms:
+- type: ReadFromCsv
+  name: ReadInputFile
+  config:
+path: resources/products.csv
+- type: Filter
+  name: FilterWithCategory
+  input: ReadInputFile
+  config:
+language: python
+keep: category == "Electronics"
+- type: Combine
+  name: CountNumberSold
+  input: FilterWithCategory
+  config:
+group_by: product_name
+combine:
+  num_sold:
+value: product_name
+fn: count
+  total_revenue:
+value: price
+fn: sum
+- type: WriteToCsv
+  name: WriteOutputFile
+  input: CountNumberSold
+  config:
+path: output
+
+options:
+  yaml_experimental_features: Combine
+
+# Expected:

Review Comment:
   
[README.md](https://github.com/apache/beam/blob/9add040efc69c516b5462668ac923979cc38a6d0/sdks/python/apache_beam/yaml/examples/README.md)
 can we update this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [yaml] add use-case example [beam]

2024-04-08 Thread via GitHub


liferoad commented on code in PR #30896:
URL: https://github.com/apache/beam/pull/30896#discussion_r1556504460


##
sdks/python/apache_beam/yaml/examples/resources/products.csv:
##
@@ -0,0 +1,6 @@
+transaction_id,product_name,category,price
+T0012,Headphones,Electronics,59.99
+T5034,Leather Jacket,Apparel,109.99
+T0024,Aluminum Mug,Kitchen,29.99
+T0104,Headphones,Electronics,59.99
+T0302,Monitor,Electronics,249.99

Review Comment:
   do we need to this csv file? can we just create a temp csv for your example?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [yaml] backtick generated aliases on sql mappings [beam]

2024-04-08 Thread via GitHub


liferoad commented on code in PR #30895:
URL: https://github.com/apache/beam/pull/30895#discussion_r1556512365


##
sdks/python/apache_beam/yaml/yaml_mapping.py:
##
@@ -515,7 +515,7 @@ def normalize_fields(pcoll, fields, drop=(), append=False, 
language='generic'):
 
   if append:
 return input_schema, {
-**{name: name
+**{name: f'`{name}`' if language == 'sql' else name

Review Comment:
   looks like these issues should be easily caught with some tests. So can we 
add more tests?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [yaml] backtick generated aliases on sql mappings [beam]

2024-04-08 Thread via GitHub


Polber commented on PR #30895:
URL: https://github.com/apache/beam/pull/30895#issuecomment-2043737188

   R: @damccorm 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [yaml] backtick generated aliases on sql mappings [beam]

2024-04-08 Thread via GitHub


Polber commented on PR #30895:
URL: https://github.com/apache/beam/pull/30895#issuecomment-2043736968

   R: @robertwb 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [yaml] add use-case example [beam]

2024-04-08 Thread via GitHub


Polber opened a new pull request, #30896:
URL: https://github.com/apache/beam/pull/30896

   Adds 2 simple use-case examples to examples catalog.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [yaml] add use-case example [beam]

2024-04-08 Thread via GitHub


Polber commented on PR #30896:
URL: https://github.com/apache/beam/pull/30896#issuecomment-2043759026

   R: @liferoad 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [yaml] add use-case example [beam]

2024-04-08 Thread via GitHub


Polber commented on PR #30896:
URL: https://github.com/apache/beam/pull/30896#issuecomment-2043759408

   @robertwb 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [yaml] add use-case example [beam]

2024-04-08 Thread via GitHub


github-actions[bot] commented on PR #30896:
URL: https://github.com/apache/beam/pull/30896#issuecomment-2043759925

   Stopping reviewer notifications for this pull request: review requested by 
someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [yaml] add use-case example [beam]

2024-04-08 Thread via GitHub


liferoad commented on code in PR #30896:
URL: https://github.com/apache/beam/pull/30896#discussion_r1556506443


##
sdks/python/apache_beam/yaml/examples/simple_filter.yaml:
##
@@ -0,0 +1,41 @@
+# coding=utf-8

Review Comment:
   are these yaml files packaged when releasing beam? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [yaml] remove Combine from yaml_experimental_features [beam]

2024-04-08 Thread via GitHub


Polber opened a new pull request, #30897:
URL: https://github.com/apache/beam/pull/30897

   Removes the need to set `--yaml_experimental_features=Combine` to run 
aggregations in Beam YAML
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Python] Add a couple quality-of-life improvemenets to `testing.util.assert_that` [beam]

2024-04-08 Thread via GitHub


tvalentyn commented on PR #30771:
URL: https://github.com/apache/beam/pull/30771#issuecomment-2043802447

   pydantic errors should be resolved now. ptal at linter errors.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [yaml] Add Beam YAML Blog [beam]

2024-04-08 Thread via GitHub


Polber opened a new pull request, #30898:
URL: https://github.com/apache/beam/pull/30898

   Adds a Beam YAML announcement blog
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Revert "Revert #30533: Automatically execute unbounded pipelines in streaming mode." [beam]

2024-04-08 Thread via GitHub


liferoad commented on code in PR #30894:
URL: https://github.com/apache/beam/pull/30894#discussion_r1556620836


##
sdks/python/apache_beam/runners/dataflow/dataflow_runner.py:
##
@@ -415,6 +416,12 @@ def run_pipeline(self, pipeline, options, 
pipeline_proto=None):
   self.proto_pipeline, self.proto_context = pipeline.to_runner_api(
   return_context=True, default_environment=self._default_environment)
 
+if any(pcoll.is_bounded == beam_runner_api_pb2.IsBounded.UNBOUNDED
+   for pcoll in self.proto_pipeline.components.pcollections.values()):
+  options.view_as(StandardOptions).streaming = True

Review Comment:
   if users specify streaming, can we honor the one specified by the users? 
This at least can let users keep the current behavior. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [yaml] Add Beam YAML Blog [beam]

2024-04-08 Thread via GitHub


github-actions[bot] commented on PR #30898:
URL: https://github.com/apache/beam/pull/30898#issuecomment-2043944172

   Checks are failing. Will not request review until checks are succeeding. If 
you'd like to override that behavior, comment `assign set of reviewers`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [yaml] remove Combine from yaml_experimental_features [beam]

2024-04-08 Thread via GitHub


github-actions[bot] commented on PR #30897:
URL: https://github.com/apache/beam/pull/30897#issuecomment-2043944210

   Assigning reviewers. If you would like to opt out of this review, comment 
`assign to next reviewer`:
   
   R: @jrmccluskey for label python.
   R: @liferoad for label website.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any 
comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review 
comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]

2024-04-08 Thread via GitHub


jbsabbagh commented on code in PR #30877:
URL: https://github.com/apache/beam/pull/30877#discussion_r1555950658


##
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java:
##
@@ -191,6 +191,12 @@ private ReadFromKafkaDoFn(
 this.checkStopReadingFn = transform.getCheckStopReadingFn();
 this.badRecordRouter = transform.getBadRecordRouter();
 this.recordTag = recordTag;
+if (transform.getConsumerPollingTimeout() != null) {
+  this.consumerPollingTimeout =
+  
java.time.Duration.ofMillis(transform.getConsumerPollingTimeout().getMillis());
+} else {
+  this.consumerPollingTimeout = KAFKA_POLL_TIMEOUT;
+}

Review Comment:
   Maybe we could add some sort of log here that mentions what the timeout is? 
This will help signal to users that a timeout outside of the Kafka Consumer 
configs exists.



##
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java:
##
@@ -518,7 +525,7 @@ private ConsumerRecords poll(
 return rawRecords;
   }
   elapsed = sw.elapsed();
-  if (elapsed.toMillis() >= KAFKA_POLL_TIMEOUT.toMillis()) {
+  if (elapsed.toMillis() >= consumerPollingTimeout.toMillis()) {
 // timeout is over
 return rawRecords;

Review Comment:
   In the case where the `rawRecords.isEmpty()`, might be worth adding some 
sort of count that tracks how many times `rawRecords` has been empty. If this 
count is too high, maybe emit a warning log that it is happening too often and 
that perhaps they should increase the poll timeout



##
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java:
##
@@ -587,6 +587,7 @@ public static  Read read() {
 .setCommitOffsetsInFinalizeEnabled(false)
 .setDynamicRead(false)
 .setTimestampPolicyFactory(TimestampPolicyFactory.withProcessingTime())
+.setConsumerPollingTimeout(Duration.standardSeconds(1L))

Review Comment:
   Should the default be increased slightly? Maybe 2 seconds? Given that this 
is even causing issues in the US to US cross-regional reads, it does seem to 
suggest that 1 second might be a bit too aggressive as a general default.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]

2024-04-08 Thread via GitHub


liferoad commented on code in PR #30877:
URL: https://github.com/apache/beam/pull/30877#discussion_r1555976984


##
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java:
##
@@ -587,6 +587,7 @@ public static  Read read() {
 .setCommitOffsetsInFinalizeEnabled(false)
 .setDynamicRead(false)
 .setTimestampPolicyFactory(TimestampPolicyFactory.withProcessingTime())
+.setConsumerPollingTimeout(Duration.standardSeconds(1L))

Review Comment:
   This makes sense. @xianhualiu please update 
https://github.com/apache/beam/blob/master/CHANGES.md as well about your fix, 
since this should be called out in our release notes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]

2024-04-08 Thread via GitHub


jbsabbagh commented on code in PR #30877:
URL: https://github.com/apache/beam/pull/30877#discussion_r1555959733


##
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java:
##
@@ -518,7 +525,7 @@ private ConsumerRecords poll(
 return rawRecords;
   }
   elapsed = sw.elapsed();
-  if (elapsed.toMillis() >= KAFKA_POLL_TIMEOUT.toMillis()) {
+  if (elapsed.toMillis() >= consumerPollingTimeout.toMillis()) {
 // timeout is over
 return rawRecords;

Review Comment:
   In the case where the `rawRecords.isEmpty()`, might be worth adding some 
sort of count that tracks how many times `rawRecords` has been empty. If this 
count is too high, maybe emit a warning log that it is happening too often and 
that perhaps the user should increase the poll timeout



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]

2024-04-08 Thread via GitHub


xianhualiu commented on code in PR #30877:
URL: https://github.com/apache/beam/pull/30877#discussion_r1556018716


##
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java:
##
@@ -518,7 +525,7 @@ private ConsumerRecords poll(
 return rawRecords;
   }
   elapsed = sw.elapsed();
-  if (elapsed.toMillis() >= KAFKA_POLL_TIMEOUT.toMillis()) {
+  if (elapsed.toMillis() >= consumerPollingTimeout.toMillis()) {
 // timeout is over
 return rawRecords;

Review Comment:
   added warning when no messages polled.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]

2024-04-08 Thread via GitHub


xianhualiu commented on code in PR #30877:
URL: https://github.com/apache/beam/pull/30877#discussion_r1556019131


##
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java:
##
@@ -587,6 +587,7 @@ public static  Read read() {
 .setCommitOffsetsInFinalizeEnabled(false)
 .setDynamicRead(false)
 .setTimestampPolicyFactory(TimestampPolicyFactory.withProcessingTime())
+.setConsumerPollingTimeout(Duration.standardSeconds(1L))

Review Comment:
   increased to 2 seconds



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Patch release website changes [beam]

2024-04-08 Thread via GitHub


damccorm merged PR #30839:
URL: https://github.com/apache/beam/pull/30839


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Bug]: Cannot read from Kafka due to short poll timeout of consumer in KafkaIO [beam]

2024-04-08 Thread via GitHub


jbsabbagh commented on issue #30870:
URL: https://github.com/apache/beam/issues/30870#issuecomment-2042957025

   This also affects the Python SDK. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]

2024-04-08 Thread via GitHub


jbsabbagh commented on code in PR #30877:
URL: https://github.com/apache/beam/pull/30877#discussion_r1555959733


##
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java:
##
@@ -518,7 +525,7 @@ private ConsumerRecords poll(
 return rawRecords;
   }
   elapsed = sw.elapsed();
-  if (elapsed.toMillis() >= KAFKA_POLL_TIMEOUT.toMillis()) {
+  if (elapsed.toMillis() >= consumerPollingTimeout.toMillis()) {
 // timeout is over
 return rawRecords;

Review Comment:
   In the case where the `rawRecords.isEmpty()` and the timeout is over, might 
be worth adding some sort of count that tracks how many times `rawRecords` has 
been empty. If this count is too high, maybe emit a warning log that it is 
happening too often and that perhaps the user should increase the poll timeout



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [bug30870]: make consumer polling timeout configurable for KafkaIO.Read [beam]

2024-04-08 Thread via GitHub


xianhualiu commented on code in PR #30877:
URL: https://github.com/apache/beam/pull/30877#discussion_r1556016519


##
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java:
##
@@ -191,6 +191,12 @@ private ReadFromKafkaDoFn(
 this.checkStopReadingFn = transform.getCheckStopReadingFn();
 this.badRecordRouter = transform.getBadRecordRouter();
 this.recordTag = recordTag;
+if (transform.getConsumerPollingTimeout() != null) {
+  this.consumerPollingTimeout =
+  
java.time.Duration.ofMillis(transform.getConsumerPollingTimeout().getMillis());
+} else {
+  this.consumerPollingTimeout = KAFKA_POLL_TIMEOUT;
+}

Review Comment:
   The KafkaIO.Read API doc already contains the java doc for this new 
configuration. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [Feature Request]: Managed transforms Java API [beam]

2024-04-08 Thread via GitHub


ahmedabu98 closed issue #30830: [Feature Request]: Managed transforms Java API
URL: https://github.com/apache/beam/issues/30830


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Clean doc related to write data in bigquery.py [beam]

2024-04-08 Thread via GitHub


kevinzous opened a new pull request, #30887:
URL: https://github.com/apache/beam/pull/30887

   * add missing closing parenthesis
   * add unique names to PTransform operations
   
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Add PubSubIO Stress test [beam]

2024-04-08 Thread via GitHub


akashorabek opened a new pull request, #30886:
URL: https://github.com/apache/beam/pull/30886

   This pull request introduces stress tests for PubSubIO, designed to assess 
the performance under various conditions. The stress tests simulate dynamic 
load increases and evaluate the behavior of PubSubIO.
   
   Changes:
   
   Added stress tests for PubSubIO.
   Implemented dynamic load increases over time to simulate varying workloads.
   Added support for exporting metrics to InfluxDB or BigQuery based on the 
configuration parameter.
   
   [Write job 
example](https://console.cloud.google.com/dataflow/jobs/us-central1/2024-04-08_02_09_14-10513489467703410835;step=Write%20to%20PubSub;mainTab=JOB_GRAPH;bottomTab=WORKER_LOGS;bottomStepTab=DATA_SAMPLING;logsSeverity=ERROR;graphView=0?project=apache-beam-testing)
   [Read job 
example](https://console.cloud.google.com/dataflow/jobs/us-central1/2024-04-08_02_09_57-14737073549262708363;step=Counting%20element;mainTab=JOB_GRAPH;bottomTab=JOB_LOGS;bottomStepTab=DATA_SAMPLING;logsSeverity=INFO;graphView=0?project=apache-beam-testing=(%22dfTime%22:(%22l%22:%22dfJobMaxTime%22)))
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #` instead.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   

   [![Build python source distribution and 
wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Java] ManagedIO [beam]

2024-04-08 Thread via GitHub


ahmedabu98 merged PR #30808:
URL: https://github.com/apache/beam/pull/30808


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Duet AI Prompts - Documentation Lookup Without Links [beam]

2024-04-08 Thread via GitHub


andreydevyatkin commented on code in PR #30873:
URL: https://github.com/apache/beam/pull/30873#discussion_r1555991358


##
learning/prompts/documentation-lookup-nolinks/09_basic_triggers.md:
##
@@ -0,0 +1,30 @@
+Prompt:
+What is a trigger in Apache Beam?
+
+Response:
+Apache Beam uses triggers to determine when to emit the aggregated results of 
each window, referred to as a pane.
+
+Triggers provide two additional capabilities compared to outputting at the end 
of a window:
+- Allowing early results to be output before the end of the window.
+- Allowing late data to be handled after the end of the window.
+
+This feature lets you control the flow of your data and balance between 
completeness, latency, and cost.
+
+You set the triggers for a `PCollection` by setting the `trigger` parameter of 
the `WindowInto` transform.
+
+```python
+  pcollection | WindowInto(
+FixedWindows(1 * 60),
+trigger=AfterProcessingTime(1 * 60),
+accumulation_mode=AccumulationMode.DISCARDING)

Review Comment:
   ```suggestion
   pcollection | WindowInto(
   FixedWindows(1 * 60),
   trigger=AfterProcessingTime(1 * 60),
   accumulation_mode=AccumulationMode.DISCARDING,
   )
   ```



##
learning/prompts/documentation-lookup-nolinks/25_io_csv.md:
##
@@ -0,0 +1,29 @@
+Prompt:
+Is CSV format supported in Apache Beam?
+
+Response:
+CSV is a data format for storing and exchanging data. Apache Beam supports 
reading data from and writing data to CSV files using the `ReadFromText` and 
`WriteToText` transforms in the `TextIO` module. For more information, see the 
TextIO connector documentation for your programming language of choice.
+
+To get started with CSV and Apache Beam, refer to the Dataflow Cookbook GitHub 
repository.
+
+Here is an example of Apache Beam pipeline code for reading data from a CSV 
file:
+
+```python
+class CsvOptions(PipelineOptions):
+@classmethod
+def _add_argparse_args(cls, parser):
+parser.add_argument(
+'--file_path',
+default="gs://your-bucket/your-file.csv",
+help='Csv file path'
+)
+
+options = CsvOptions()
+
+with beam.Pipeline(options=options) as p:
+output = (p | "Read from Csv file" >> ReadFromCsv(
+path=options.file_path
+)
+| "Log Data" >> Map(logging.info))
+

Review Comment:
   ```suggestion
   class CsvOptions(PipelineOptions):
   @classmethod
   def _add_argparse_args(cls, parser):
   parser.add_argument(
   "--file_path",
   default="gs://your-bucket/your-file.csv",
   help="Csv file path",
   )
   
   
   options = CsvOptions()
   
   with beam.Pipeline(options=options) as p:
   output = (
   p
   | "Read from Csv file" >> ReadFromCsv(path=options.file_path)
   | "Log Data" >> Map(logging.info)
   )
   ```



##
learning/prompts/documentation-lookup-nolinks/20_io_biguery.md:
##
@@ -0,0 +1,41 @@
+Prompt:
+Is BigQuery supported in Apache Beam?
+
+Response:
+Yes, Apache Beam supports BigQuery. BigQuery is a serverless and 
cost-effective enterprise data warehouse offered by Google Cloud. Apache Beam 
provides a BigQueryIO connector to read and write data from and to BigQuery. 
The BigQueryIO connector supports both batch and streaming pipelines.
+
+The following Apache Beam SDKs support the BigQueryIO connector:
+* Java (natively)
+* Python (natively)
+* Go (natively and through X Language)
+* Typescript (through X Language)
+
+To read data from BigQuery, use the `ReadFromBigQuery` function. Apache Beam 
can read data directly from a BigQuery table or using an SQL query. The default 
mode is to return table rows read from a BigQuery source as dictionaries. 
Built-in `TableRow` objects can also be returned.
+
+Here is an example of Apache Beam pipeline code for reading from BigQuery:
+
+```python
+from apache_beam.io.gcp.bigquery import ReadFromBigQuery
+
+with beam.Pipeline(options=options) as p:
+  # read from a table
+lines_table = p | 'Read' >> ReadFromBigQuery(table=table)
+  # read from a query
+lines_query = p | 'Read' >> ReadFromBigQuery(query="SELECT * FROM table")
+```
+
+Here is an example of Apache Beam pipeline code for writing to BigQuery:
+
+```python
+from apache_beam.io.gcp.bigquery import WriteToBigQuery
+
+with beam.Pipeline(options=options) as p:
+  # write to a table
+p | 'Write' >> beam.io.WriteToBigQuery(
+table,
+schema=TABLE_SCHEMA,
+create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
+write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)

Review Comment:
   ```suggestion
   from apache_beam.io.gcp.bigquery import WriteToBigQuery
   
   with beam.Pipeline(options=options) as p:
   # write to a table
   p | "Write" >> beam.io.WriteToBigQuery(
   table,
   schema=TABLE_SCHEMA,