Re: [PR] Allow PartitionField's field_id is missing in Iceberg v1 [iceberg-cpp]

2025-06-12 Thread via GitHub
wgtmac commented on PR #121: URL: https://github.com/apache/iceberg-cpp/pull/121#issuecomment-2969245961 I believe @Fokko's idea has already been implemented as in https://github.com/apache/iceberg-cpp/blob/main/src/iceberg/json_internal.cc#L1058-L1061. Actually my intention is that `json_i

Re: [I] [DISCUSSION] How to implement manifest list or manifest read? [iceberg-cpp]

2025-06-12 Thread via GitHub
wgtmac commented on issue #122: URL: https://github.com/apache/iceberg-cpp/issues/122#issuecomment-2969226797 I plan to implement this after the Avro reader implementation is finished. I'm currently traveling in the US and will continue to work on it. -- This is an automated message from

Re: [PR] Spark: RewriteTablePath: filter content files by snapshotId [iceberg]

2025-06-12 Thread via GitHub
dramaticlly commented on PR #12885: URL: https://github.com/apache/iceberg/pull/12885#issuecomment-2969160513 Thanks @szehon-ho for the detailed review and merge!

Re: [I] rest catalog - fromProps places scope under Additional Properties [iceberg-go]

2025-06-12 Thread via GitHub
kris-gaudel commented on issue #451: URL: https://github.com/apache/iceberg-go/issues/451#issuecomment-2969028972 Hi, is this issue still available? I'm looking to contribute to this project for the first time.

Re: [I] Remove usage of deprecated functions from the codebase [iceberg-python]

2025-06-12 Thread via GitHub
kris-gaudel commented on issue #1327: URL: https://github.com/apache/iceberg-python/issues/1327#issuecomment-2969002135 Hi there, I'm interested in contributing for the first time. Is this issue still relevant? If so, can someone assign this to me?

[PR] feat: support listing known catalogs [iceberg-python]

2025-06-12 Thread via GitHub
Anton-Tarazi opened a new pull request, #2088: URL: https://github.com/apache/iceberg-python/pull/2088 Adds a new function `pyiceberg.catalog.list_catalogs() -> List[str]` to list all known catalogs. # Rationale for this change In creating a pyiceberg-backed API

Re: [PR] Spark: RewriteTablePath: filter content files by snapshotId [iceberg]

2025-06-12 Thread via GitHub
szehon-ho commented on PR #12885: URL: https://github.com/apache/iceberg/pull/12885#issuecomment-2968939810 Merged, thanks @dramaticlly !

Re: [PR] Spark: RewriteTablePath: filter content files by snapshotId [iceberg]

2025-06-12 Thread via GitHub
szehon-ho merged PR #12885: URL: https://github.com/apache/iceberg/pull/12885

Re: [I] Deleting with to_date() on partitioned column performs full-table deletion instead of partition-only metadata delete [iceberg]

2025-06-12 Thread via GitHub
madeirak commented on issue #13287: URL: https://github.com/apache/iceberg/issues/13287#issuecomment-2968906909 > I'm unable to repro this. I wrote a quick test here > > @TestTemplate > public void testDeleteFromDatePartitionedTable() throws NoSuchTableException { > sql(

Re: [PR] Support for TIME, TIMESTAMPNTZ_NANO, UUID types in Inclusive Metrics Evaluator [iceberg]

2025-06-12 Thread via GitHub
aihuaxu commented on code in PR #13195: URL: https://github.com/apache/iceberg/pull/13195#discussion_r2144098880 ## core/src/test/java/org/apache/iceberg/expressions/TestInclusiveMetricsEvaluatorWithExtract.java: ## @@ -683,4 +784,1638 @@ public void testIntegerNotIn() {

Re: [PR] Core: Use time-travel schema when resolving partition spec in scan [iceberg]

2025-06-12 Thread via GitHub
chenjian2664 commented on code in PR #13301: URL: https://github.com/apache/iceberg/pull/13301#discussion_r2144080808 ## core/src/main/java/org/apache/iceberg/SnapshotScan.java: ## @@ -79,6 +79,22 @@ protected ScanMetrics scanMetrics() { return scanMetrics; } + protec

Re: [PR] spark 4.0: SPJ: add bucket reducer using gcd [iceberg]

2025-06-12 Thread via GitHub
szehon-ho commented on code in PR #13167: URL: https://github.com/apache/iceberg/pull/13167#discussion_r2144040537 ## spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestStoragePartitionedJoins.java: ## @@ -549,6 +555,88 @@ public void testJoinsWithMismatchingPartiti

Re: [PR] spark 4.0: SPJ: add bucket reducer using gcd [iceberg]

2025-06-12 Thread via GitHub
himadripal commented on code in PR #13167: URL: https://github.com/apache/iceberg/pull/13167#discussion_r2144036757 ## spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/functions/BucketFunction.java: ## @@ -128,6 +133,23 @@ public String name() { public DataType resul

Re: [PR] Docs: Fix broken links in Flink Configuration documentation [iceberg]

2025-06-12 Thread via GitHub
KyleLin0927 commented on PR #13288: URL: https://github.com/apache/iceberg/pull/13288#issuecomment-2968651914 > confirmed that this works correctly thanks for your check!

Re: [I] Understand why SnapshotTableSparkAction disables gc.enabled by default [iceberg]

2025-06-12 Thread via GitHub
slfan1989 commented on issue #13300: URL: https://github.com/apache/iceberg/issues/13300#issuecomment-2968650617 @jkolash @RussellSpitzer Thank you for the explanation! I’ve understood your approach, and the feedback makes a lot of sense — it’s been very helpful.

Re: [I] Understand why SnapshotTableSparkAction disables gc.enabled by default [iceberg]

2025-06-12 Thread via GitHub
slfan1989 closed issue #13300: Understand why SnapshotTableSparkAction disables gc.enabled by default URL: https://github.com/apache/iceberg/issues/13300

Re: [I] Spark: Doing a Coalesce and foreachpartitions in spark directly on an iceberg table is leaking memory heavy iterators [iceberg]

2025-06-12 Thread via GitHub
jkolash commented on issue #13297: URL: https://github.com/apache/iceberg/issues/13297#issuecomment-2968643905 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TaskContextImpl.scala#L71 is currently a stack so that would need to change to something more like

Re: [I] Spark: Doing a Coalesce and foreachpartitions in spark directly on an iceberg table is leaking memory heavy iterators [iceberg]

2025-06-12 Thread via GitHub
jkolash commented on issue #13297: URL: https://github.com/apache/iceberg/issues/13297#issuecomment-2968633573 Reading more deeply in DataSourceRDD.scala it seems the callback is there due to a partially consumed iterator, but spark wraps the raw iterator using the MetricsBatchIterator that

Re: [I] Spark: Doing a Coalesce and foreachpartitions in spark directly on an iceberg table is leaking memory heavy iterators [iceberg]

2025-06-12 Thread via GitHub
jkolash commented on issue #13297: URL: https://github.com/apache/iceberg/issues/13297#issuecomment-2968588702 The issue is that coalesce is combining N tasks into 1 task but really there are N tasks, so coalesce should be more subtask aware?

Re: [I] Spark: Doing a Coalesce and foreachpartitions in spark directly on an iceberg table is leaking memory heavy iterators [iceberg]

2025-06-12 Thread via GitHub
jkolash commented on issue #13297: URL: https://github.com/apache/iceberg/issues/13297#issuecomment-2968557949 I wanted to bring up another point that @RussellSpitzer helped me identify: this issue reproduces in a similar way with the v2 Datasources when I ran https://github

Re: [PR] Docs: fix create nessie catalog in Flink [iceberg]

2025-06-12 Thread via GitHub
github-actions[bot] closed pull request #12978: Docs: fix create nessie catalog in Flink URL: https://github.com/apache/iceberg/pull/12978

Re: [PR] Docs: fix create nessie catalog in Flink [iceberg]

2025-06-12 Thread via GitHub
github-actions[bot] commented on PR #12978: URL: https://github.com/apache/iceberg/pull/12978#issuecomment-2968557184 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If

Re: [I] [Arrow]Incorrect argument passed to setInitialCapacity in allocateVectorBasedOnTypeName method (should be count of values, not bytes) [iceberg]

2025-06-12 Thread via GitHub
github-actions[bot] commented on issue #11672: URL: https://github.com/apache/iceberg/issues/11672#issuecomment-2968556338 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [PR] Status: Split read/write and add deletion vectors [iceberg]

2025-06-12 Thread via GitHub
github-actions[bot] commented on PR #12958: URL: https://github.com/apache/iceberg/pull/12958#issuecomment-2968556998 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If

Re: [PR] Status: Split read/write and add deletion vectors [iceberg]

2025-06-12 Thread via GitHub
github-actions[bot] closed pull request #12958: Status: Split read/write and add deletion vectors URL: https://github.com/apache/iceberg/pull/12958

Re: [I] [Arrow]Incorrect argument passed to setInitialCapacity in allocateVectorBasedOnTypeName method (should be count of values, not bytes) [iceberg]

2025-06-12 Thread via GitHub
github-actions[bot] closed issue #11672: [Arrow]Incorrect argument passed to setInitialCapacity in allocateVectorBasedOnTypeName method (should be count of values, not bytes) URL: https://github.com/apache/iceberg/issues/11672

Re: [PR] Spark: RewriteTablePath: filter content files by snapshotId [iceberg]

2025-06-12 Thread via GitHub
dramaticlly commented on code in PR #12885: URL: https://github.com/apache/iceberg/pull/12885#discussion_r2143828267 ## core/src/main/java/org/apache/iceberg/RewriteTablePathUtil.java: ## @@ -312,7 +314,87 @@ public static RewriteResult rewriteDataManifest( ManifestRead

Re: [PR] spark 4.0: SPJ: add bucket reducer using gcd [iceberg]

2025-06-12 Thread via GitHub
himadripal commented on PR #13167: URL: https://github.com/apache/iceberg/pull/13167#issuecomment-2968433309 @huaxingao and @szehon-ho please take another look.

Re: [I] Spark: Doing a Coalesce and foreachpartitions in spark directly on an iceberg table is leaking memory heavy iterators [iceberg]

2025-06-12 Thread via GitHub
szehon-ho commented on issue #13297: URL: https://github.com/apache/iceberg/issues/13297#issuecomment-2968429600 I don't immediately see a Spark API to easily remove the callback once an iterator is exhausted. Maybe the Iceberg change is easier, but if you have some idea on the Spark side, I

Re: [PR] spark 4.0: SPJ: add bucket reducer using gcd [iceberg]

2025-06-12 Thread via GitHub
himadripal commented on code in PR #13167: URL: https://github.com/apache/iceberg/pull/13167#discussion_r2143811428 ## spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestStoragePartitionedJoins.java: ## @@ -549,6 +555,88 @@ public void testJoinsWithMismatchingPartit

Re: [PR] spark 4.0: SPJ: add bucket reducer using gcd [iceberg]

2025-06-12 Thread via GitHub
himadripal commented on code in PR #13167: URL: https://github.com/apache/iceberg/pull/13167#discussion_r2143811428 ## spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestStoragePartitionedJoins.java: ## @@ -549,6 +555,88 @@ public void testJoinsWithMismatchingPartit

Re: [PR] spark 4.0: SPJ: add bucket reducer using gcd [iceberg]

2025-06-12 Thread via GitHub
himadripal commented on code in PR #13167: URL: https://github.com/apache/iceberg/pull/13167#discussion_r2143812151 ## spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/functions/BucketFunction.java: ## @@ -128,6 +133,23 @@ public String name() { public DataType resul

Re: [PR] spark 4.0: SPJ: add bucket reducer using gcd [iceberg]

2025-06-12 Thread via GitHub
himadripal commented on code in PR #13167: URL: https://github.com/apache/iceberg/pull/13167#discussion_r2143811428 ## spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestStoragePartitionedJoins.java: ## @@ -549,6 +555,88 @@ public void testJoinsWithMismatchingPartit

Re: [PR] Flink: If IcebergSink writeParallelism is not specified, defaults to the input source parallelism [iceberg]

2025-06-12 Thread via GitHub
rodmeneses commented on code in PR #13260: URL: https://github.com/apache/iceberg/pull/13260#discussion_r2143804676 ## flink/v2.0/flink/src/test/java/org/apache/iceberg/flink/sink/TestIcebergSink.java: ## @@ -388,6 +388,43 @@ void testErrorOnNullForRequiredField() throws Excepti

Re: [PR] Flink: If IcebergSink writeParallelism is not specified, defaults to the input source parallelism [iceberg]

2025-06-12 Thread via GitHub
rodmeneses commented on code in PR #13260: URL: https://github.com/apache/iceberg/pull/13260#discussion_r2143801248 ## flink/v2.0/flink/src/test/java/org/apache/iceberg/flink/sink/TestIcebergSink.java: ## @@ -388,6 +388,43 @@ void testErrorOnNullForRequiredField() throws Excepti

Re: [PR] Flink: If IcebergSink writeParallelism is not specified, defaults to the input source parallelism [iceberg]

2025-06-12 Thread via GitHub
stevenzwu commented on code in PR #13260: URL: https://github.com/apache/iceberg/pull/13260#discussion_r2143792082 ## flink/v2.0/flink/src/test/java/org/apache/iceberg/flink/sink/TestIcebergSink.java: ## @@ -388,6 +388,43 @@ void testErrorOnNullForRequiredField() throws Exceptio

Re: [PR] Docs: Fix broken links in Flink Configuration documentation [iceberg]

2025-06-12 Thread via GitHub
stevenzwu commented on PR #13288: URL: https://github.com/apache/iceberg/pull/13288#issuecomment-2968396022 confirmed that this works correctly

Re: [PR] Flink: If IcebergSink writeParallelism is not specified, defaults to the input source parallelism [iceberg]

2025-06-12 Thread via GitHub
stevenzwu commented on code in PR #13260: URL: https://github.com/apache/iceberg/pull/13260#discussion_r2143792082 ## flink/v2.0/flink/src/test/java/org/apache/iceberg/flink/sink/TestIcebergSink.java: ## @@ -388,6 +388,43 @@ void testErrorOnNullForRequiredField() throws Exceptio

[PR] fix(cli/rest): Add custom scope for rest cli when using with Oauth [iceberg-go]

2025-06-12 Thread via GitHub
dttung2905 opened a new pull request, #461: URL: https://github.com/apache/iceberg-go/pull/461 @zeroshade I discovered this while playing around with the CLI. It is useful when interacting with a REST endpoint that has OAuth capabilities (Polaris, for example).

Re: [PR] fix: glue drop_namespace to check non-iceberg tables [iceberg-python]

2025-06-12 Thread via GitHub
kevinjqliu merged PR #2083: URL: https://github.com/apache/iceberg-python/pull/2083

[PR] fix(table): handle missing or nil stats + metadata field nil comparison [iceberg-go]

2025-06-12 Thread via GitHub
James-Gilbert- opened a new pull request, #460: URL: https://github.com/apache/iceberg-go/pull/460 Handles a few potential nil dereferences

Re: [I] Benchmarking suite [iceberg-rust]

2025-06-12 Thread via GitHub
kyteware commented on issue #1432: URL: https://github.com/apache/iceberg-rust/issues/1432#issuecomment-2968203727 @liurenjie1024 Sounds like a good approach, do you know what approach to doing that would be legally OK? I'm getting a 404 on their EULA, and I assume you know more about them

Re: [I] Move metadata into the catalog, like DuckLake [iceberg]

2025-06-12 Thread via GitHub
RussellSpitzer commented on issue #13196: URL: https://github.com/apache/iceberg/issues/13196#issuecomment-2968162803 This is probably generically possible for any Rest Catalog implementation using the Scan API for reads on the client side. We can more easily support on the write side once

Re: [I] Spark: Doing a Coalesce and foreachpartitions in spark directly on an iceberg table is leaking memory heavy iterators [iceberg]

2025-06-12 Thread via GitHub
RussellSpitzer commented on issue #13297: URL: https://github.com/apache/iceberg/issues/13297#issuecomment-2968134018 Ok - for my own recap here 1. Spark is holding onto task contexts in order to invoke callbacks to get metric information till the end of the job 2. Task Context through

Re: [PR] feat: implement remote signing transport for S3 requests and add tests [iceberg-go]

2025-06-12 Thread via GitHub
flarco commented on code in PR #458: URL: https://github.com/apache/iceberg-go/pull/458#discussion_r2143644068 ## io/s3.go: ## @@ -149,3 +201,175 @@ func createS3Bucket(ctx context.Context, parsed *url.URL, props map[string]strin return bucket, nil } + +// RemoteSign

Re: [PR] feat: implement remote signing transport for S3 requests and add tests [iceberg-go]

2025-06-12 Thread via GitHub
flarco commented on code in PR #458: URL: https://github.com/apache/iceberg-go/pull/458#discussion_r2143639347 ## io/s3.go: ## @@ -149,3 +201,175 @@ func createS3Bucket(ctx context.Context, parsed *url.URL, props map[string]strin return bucket, nil } + +// RemoteSign

Re: [I] Spark: Doing a Coalesce and foreachpartitions in spark directly on an iceberg table is leaking memory heavy iterators [iceberg]

2025-06-12 Thread via GitHub
jkolash commented on issue #13297: URL: https://github.com/apache/iceberg/issues/13297#issuecomment-2967949805 > Or is the issue here that we are saving this task context for the UI so we don't actually ever drop the iterator reference and close is never called? The close() will not l

Re: [I] Spark: Doing a Coalesce and foreachpartitions in spark directly on an iceberg table is leaking memory heavy iterators [iceberg]

2025-06-12 Thread via GitHub
RussellSpitzer commented on issue #13297: URL: https://github.com/apache/iceberg/issues/13297#issuecomment-2967979722 > I'm not sure what you mean by force V2 sources and try the Parquet version again. I was comparing both the v1 and v2 memory usage and the v2 path using iceberg consumes si

Re: [I] Spark: Doing a Coalesce and foreachpartitions in spark directly on an iceberg table is leaking memory heavy iterators [iceberg]

2025-06-12 Thread via GitHub
jkolash commented on issue #13297: URL: https://github.com/apache/iceberg/issues/13297#issuecomment-2967929643 I'm not sure what you mean by force V2 sources and try the Parquet version again. I was comparing both the v1 and v2 memory usage and the v2 path using iceberg consumes significant

Re: [I] Deleting with to_date() on partitioned column performs full-table deletion instead of partition-only metadata delete [iceberg]

2025-06-12 Thread via GitHub
RussellSpitzer commented on issue #13287: URL: https://github.com/apache/iceberg/issues/13287#issuecomment-2967942012 I'm unable to repro this. I wrote a quick test here ```java @TestTemplate public void testDeleteFromDatePartitionedTable() throws NoSuchTableException {

Re: [I] Spark: Doing a Coalesce and foreachpartitions in spark directly on an iceberg table is leaking memory heavy iterators [iceberg]

2025-06-12 Thread via GitHub
jkolash commented on issue #13297: URL: https://github.com/apache/iceberg/issues/13297#issuecomment-2967959316 Let me know if you want me to write something to produce synthetic data for this so it can be reproduced without using real data. That way you can reproduce it on your environment.

Re: [PR] docs: add `Transaction` example [iceberg-rust]

2025-06-12 Thread via GitHub
CTTY commented on PR #1436: URL: https://github.com/apache/iceberg-rust/pull/1436#issuecomment-2967907296 Hi @jdockerty, thanks for this PR! I do think adding more documentation is necessary since iceberg-rs's tx semantics are slightly different from iceberg-java's. There is an ongoing effort

Re: [PR] feat(transaction): Implement TransactionAction for updata_loc, update_props, and upgrade_format [iceberg-rust]

2025-06-12 Thread via GitHub
CTTY commented on code in PR #1433: URL: https://github.com/apache/iceberg-rust/pull/1433#discussion_r2143427969 ## crates/iceberg/src/transaction/action.rs: ## @@ -35,6 +37,9 @@ pub type BoxedTransactionAction = Arc; /// to modify the table metadata. #[async_trait] pub(crate

Re: [I] OpenAPI spec for `TableRequirement` is missing `oneOf` list [iceberg]

2025-06-12 Thread via GitHub
Tishj closed issue #13305: OpenAPI spec for `TableRequirement` is missing `oneOf` list URL: https://github.com/apache/iceberg/issues/13305

Re: [I] OpenAPI spec for `TableRequirement` is missing `oneOf` list [iceberg]

2025-06-12 Thread via GitHub
Tishj commented on issue #13305: URL: https://github.com/apache/iceberg/issues/13305#issuecomment-2967878272 I'll close this, since even the OpenAPI spec has a similar structure in one of its examples. It's under: > This example shows the allOf usage, which avoids needing t

Re: [PR] Materialized View Spec [iceberg]

2025-06-12 Thread via GitHub
stevenzwu commented on code in PR #11041: URL: https://github.com/apache/iceberg/pull/11041#discussion_r2143429478 ## format/view-spec.md: ## @@ -42,12 +42,28 @@ An atomic swap of one view metadata file for another provides the basis for maki Writers create view metadata fil

Re: [I] [OpenAPI spec] `ReportMetricsRequest`, `report-type` provides no constants for the discriminator [iceberg]

2025-06-12 Thread via GitHub
Tishj closed issue #12696: [OpenAPI spec] `ReportMetricsRequest`, `report-type` provides no constants for the discriminator URL: https://github.com/apache/iceberg/issues/12696

Re: [PR] feat(transaction): Implement TransactionAction for updata_loc, update_props, and upgrade_format [iceberg-rust]

2025-06-12 Thread via GitHub
CTTY commented on code in PR #1433: URL: https://github.com/apache/iceberg-rust/pull/1433#discussion_r2143455195 ## crates/iceberg/src/transaction/upgrade_format_version.rs: ## @@ -0,0 +1,133 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribut

Re: [I] Spark: Doing a Coalesce and foreachpartitions in spark directly on an iceberg table is leaking memory heavy iterators [iceberg]

2025-06-12 Thread via GitHub
RussellSpitzer commented on issue #13297: URL: https://github.com/apache/iceberg/issues/13297#issuecomment-2967884665 Can you force V2 sources and try the Parquet version again (FileScanRDD is a different code path)? It feels odd to me that the iterator we are making should null itself out

Re: [I] OpenAPI spec for `TableRequirement` is missing `oneOf` list [iceberg]

2025-06-12 Thread via GitHub
Tishj commented on issue #13305: URL: https://github.com/apache/iceberg/issues/13305#issuecomment-2967869733 Hmm, `ContentFile` has the same "issue" I wonder if I'm just misunderstanding something here ? The spec has an example that looks similar to what's done here ```yml com

Re: [I] OpenAPI spec for `TableRequirement` is missing `oneOf` list [iceberg]

2025-06-12 Thread via GitHub
Tishj commented on issue #13305: URL: https://github.com/apache/iceberg/issues/13305#issuecomment-2967846478 @Fokko I've made a diff, is that sufficient? ```patch diff --git a/open-api/rest-catalog-open-api.yaml b/open-api/rest-catalog-open-api.yaml index e9d5ab9a1..3c9ff747d 100644

Re: [PR] feat(transaction): Implement TransactionAction for updata_loc, update_props, and upgrade_format [iceberg-rust]

2025-06-12 Thread via GitHub
CTTY commented on code in PR #1433: URL: https://github.com/apache/iceberg-rust/pull/1433#discussion_r2143397457 ## crates/iceberg/src/transaction/update_properties.rs: ## @@ -0,0 +1,145 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [I] OpenAPI spec for `TableRequirement` is missing `oneOf` list [iceberg]

2025-06-12 Thread via GitHub
Fokko commented on issue #13305: URL: https://github.com/apache/iceberg/issues/13305#issuecomment-2967826453 @Tishj Thanks for raising this. I find open-api always a bit confusing on this front. Could you open a PR to get a full overview of the changes?

Re: [PR] Use short string in Variant when possible [iceberg]

2025-06-12 Thread via GitHub
RussellSpitzer commented on code in PR #13284: URL: https://github.com/apache/iceberg/pull/13284#discussion_r2143382437 ## api/src/test/java/org/apache/iceberg/variants/VariantTestUtil.java: ## @@ -107,6 +121,15 @@ static SerializedPrimitive createString(String string) { re

Re: [I] OpenAPI spec for `TableRequirement` is missing `oneOf` list [iceberg]

2025-06-12 Thread via GitHub
Tishj commented on issue #13305: URL: https://github.com/apache/iceberg/issues/13305#issuecomment-2967806028 Ah.. looks like this isn't done because, for whatever reason, the candidates have `allOf` referencing the `TableRequirement`. I think this should be solved the same way that it's solv

Re: [PR] feat(transaction): Implement TransactionAction for updata_loc, update_props, and upgrade_format [iceberg-rust]

2025-06-12 Thread via GitHub
CTTY commented on code in PR #1433: URL: https://github.com/apache/iceberg-rust/pull/1433#discussion_r2143373042 ## crates/iceberg/src/transaction/mod.rs: ## @@ -178,27 +164,48 @@ impl Transaction { } } -/// Remove properties in table. -pub fn remove_prop

[I] Avoid Vec allocation in Transaction::do_commit [iceberg-rust]

2025-06-12 Thread via GitHub
CTTY opened a new issue, #1437: URL: https://github.com/apache/iceberg-rust/issues/1437 As discussed here: https://github.com/apache/iceberg-rust/pull/1433#discussion_r2141327842 We should borrow actions in the vector instead of cloning the vector along with the removal of `updates`

Re: [PR] feat(transaction): Implement TransactionAction for updata_loc, update_props, and upgrade_format [iceberg-rust]

2025-06-12 Thread via GitHub
CTTY commented on code in PR #1433: URL: https://github.com/apache/iceberg-rust/pull/1433#discussion_r2143366893 ## crates/iceberg/src/transaction/mod.rs: ## @@ -17,26 +17,31 @@ //! This module contains transaction api. -mod action; +/// The `ApplyTransactionAction` trait p

Re: [PR] feat(transaction): Implement TransactionAction for updata_loc, update_props, and upgrade_format [iceberg-rust]

2025-06-12 Thread via GitHub
CTTY commented on code in PR #1433: URL: https://github.com/apache/iceberg-rust/pull/1433#discussion_r2143366463 ## crates/iceberg/src/transaction/action.rs: ## @@ -16,6 +16,8 @@ // under the License. #![allow(dead_code)] + Review Comment: Will remove, thanks for the rem

Re: [PR] Use short string in Variant when possible [iceberg]

2025-06-12 Thread via GitHub
RussellSpitzer commented on code in PR #13284: URL: https://github.com/apache/iceberg/pull/13284#discussion_r2143358436 ## api/src/test/java/org/apache/iceberg/variants/VariantTestUtil.java: ## @@ -107,6 +121,15 @@ static SerializedPrimitive createString(String string) { re

Re: [PR] Use short string in Variant when possible [iceberg]

2025-06-12 Thread via GitHub
RussellSpitzer commented on code in PR #13284: URL: https://github.com/apache/iceberg/pull/13284#discussion_r2143358436 ## api/src/test/java/org/apache/iceberg/variants/VariantTestUtil.java: ## @@ -107,6 +121,15 @@ static SerializedPrimitive createString(String string) { re

[I] OpenAPI spec for `TableRequirement` is missing `oneOf` list [iceberg]

2025-06-12 Thread via GitHub
Tishj opened a new issue, #13305: URL: https://github.com/apache/iceberg/issues/13305 ### Apache Iceberg version 1.9.1 (latest release) ### Query engine None ### Please describe the bug 🐞 There is a `discriminator`, but no accompanying `oneOf` or `anyOf` dir

Re: [PR] Use short string in Variant when possible [iceberg]

2025-06-12 Thread via GitHub
RussellSpitzer commented on code in PR #13284: URL: https://github.com/apache/iceberg/pull/13284#discussion_r2143358436 ## api/src/test/java/org/apache/iceberg/variants/VariantTestUtil.java: ## @@ -107,6 +121,15 @@ static SerializedPrimitive createString(String string) { re

Re: [PR] Use short string in Variant when possible [iceberg]

2025-06-12 Thread via GitHub
RussellSpitzer commented on code in PR #13284: URL: https://github.com/apache/iceberg/pull/13284#discussion_r2143357715 ## api/src/test/java/org/apache/iceberg/variants/VariantTestUtil.java: ## @@ -107,6 +121,15 @@ static SerializedPrimitive createString(String string) { re

Re: [PR] Use short string in Variant when possible [iceberg]

2025-06-12 Thread via GitHub
RussellSpitzer commented on code in PR #13284: URL: https://github.com/apache/iceberg/pull/13284#discussion_r2143355126 ## api/src/test/java/org/apache/iceberg/variants/VariantTestUtil.java: ## @@ -107,6 +121,15 @@ static SerializedPrimitive createString(String string) { re

Re: [PR] Use short string in Variant when possible [iceberg]

2025-06-12 Thread via GitHub
RussellSpitzer commented on code in PR #13284: URL: https://github.com/apache/iceberg/pull/13284#discussion_r2143353476 ## api/src/test/java/org/apache/iceberg/variants/TestSerializedPrimitives.java: ## @@ -442,6 +442,7 @@ public void testString() { assertThat(value.type(

Re: [PR] Use short string in Variant when possible [iceberg]

2025-06-12 Thread via GitHub
RussellSpitzer commented on code in PR #13284: URL: https://github.com/apache/iceberg/pull/13284#discussion_r2143352474 ## api/src/test/java/org/apache/iceberg/variants/TestSerializedObject.java: ## @@ -182,70 +182,59 @@ public void testMixedValueTypes() { assertThat(actual

Re: [PR] Use short string in Variant when possible [iceberg]

2025-06-12 Thread via GitHub
RussellSpitzer commented on code in PR #13284: URL: https://github.com/apache/iceberg/pull/13284#discussion_r2143351294 ## api/src/test/java/org/apache/iceberg/variants/TestSerializedObject.java: ## @@ -182,70 +182,59 @@ public void testMixedValueTypes() { assertThat(actual

Re: [PR] Use short string in Variant when possible [iceberg]

2025-06-12 Thread via GitHub
RussellSpitzer commented on code in PR #13284: URL: https://github.com/apache/iceberg/pull/13284#discussion_r2143350478 ## api/src/test/java/org/apache/iceberg/variants/TestSerializedObject.java: ## @@ -182,70 +182,59 @@ public void testMixedValueTypes() { assertThat(actual

Re: [PR] Flink: If IcebergSink writeParallelism is not specified, defaults to the input source parallelism [iceberg]

2025-06-12 Thread via GitHub
stevenzwu commented on code in PR #13260: URL: https://github.com/apache/iceberg/pull/13260#discussion_r2143344460 ## flink/v2.0/flink/src/test/java/org/apache/iceberg/flink/sink/TestIcebergSink.java: ## @@ -388,6 +388,25 @@ void testErrorOnNullForRequiredField() throws Exceptio

Re: [PR] Flink: If IcebergSink writeParallelism is not specified, defaults to the input source parallelism [iceberg]

2025-06-12 Thread via GitHub
stevenzwu commented on code in PR #13260: URL: https://github.com/apache/iceberg/pull/13260#discussion_r2143331582 ## flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergSink.java: ## @@ -831,16 +830,19 @@ private DataStream distributeDataStreamByHashDistributio

Re: [I] Understand why SnapshotTableSparkAction disables gc.enabled by default [iceberg]

2025-06-12 Thread via GitHub
RussellSpitzer commented on issue #13300: URL: https://github.com/apache/iceberg/issues/13300#issuecomment-2967734133 As @jkolash noted, `gc.enabled` is set to false because the "snapshot" essentially does not own the data files it references. If it deletes them, it may corrupt the table it wa

Re: [I] Fix links to API in Flink Configuration doc [iceberg]

2025-06-12 Thread via GitHub
stevenzwu closed issue #13285: Fix links to API in Flink Configuration doc URL: https://github.com/apache/iceberg/issues/13285 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

Re: [PR] Docs: Fix broken links in Flink Configuration documentation [iceberg]

2025-06-12 Thread via GitHub
stevenzwu merged PR #13288: URL: https://github.com/apache/iceberg/pull/13288 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg

Re: [PR] Docs: Fix broken links in Flink Configuration documentation [iceberg]

2025-06-12 Thread via GitHub
stevenzwu commented on PR #13288: URL: https://github.com/apache/iceberg/pull/13288#issuecomment-2967726544 Thanks @KyleLin0927 for the fix and @manuzhang for the review. I will merge it to give this a try. If it doesn't work, we can follow up again. -- This is an automated messag

Re: [PR] Support for TIME, TIMESTAMPNTZ_NANO, UUID types in Inclusive Metrics Evaluator [iceberg]

2025-06-12 Thread via GitHub
manirajv06 commented on code in PR #13195: URL: https://github.com/apache/iceberg/pull/13195#discussion_r2143237523 ## api/src/main/java/org/apache/iceberg/expressions/VariantExpressionUtil.java: ## @@ -40,29 +38,32 @@ class VariantExpressionUtil { .put(Types.DateType

Re: [PR] Support for TIME, TIMESTAMPNTZ_NANO, UUID types in Inclusive Metrics Evaluator [iceberg]

2025-06-12 Thread via GitHub
manirajv06 commented on code in PR #13195: URL: https://github.com/apache/iceberg/pull/13195#discussion_r2143223534 ## core/src/test/java/org/apache/iceberg/expressions/TestInclusiveMetricsEvaluatorWithExtract.java: ## @@ -683,4 +784,1649 @@ public void testIntegerNotIn() {

Re: [PR] chore: remove non-test asserts [iceberg-python]

2025-06-12 Thread via GitHub
kevinjqliu merged PR #2082: URL: https://github.com/apache/iceberg-python/pull/2082 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@i

Re: [I] Partition file filtering logic is incorrect for logical `not()` function [iceberg-rust]

2025-06-12 Thread via GitHub
ZENOTME commented on issue #1355: URL: https://github.com/apache/iceberg-rust/issues/1355#issuecomment-2967494138 > I haven't verified, but it does sound plausible to me that `rewrite_not` is always called before running `ManifestEvaluator::New`. > > If that were the case, I would be

Re: [PR] chore: remove non-test asserts [iceberg-python]

2025-06-12 Thread via GitHub
jayceslesar commented on PR #2082: URL: https://github.com/apache/iceberg-python/pull/2082#issuecomment-2967516083 Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] fix: glue drop_namespace to check non-iceberg tables [iceberg-python]

2025-06-12 Thread via GitHub
kevinjqliu commented on code in PR #2083: URL: https://github.com/apache/iceberg-python/pull/2083#discussion_r2143166743 ## pyiceberg/catalog/glue.py: ## @@ -680,13 +680,19 @@ def drop_namespace(self, namespace: Union[str, Identifier]) -> None: """ database_na

Re: [PR] fix: glue drop_namespace to check non-iceberg tables [iceberg-python]

2025-06-12 Thread via GitHub
kevinjqliu commented on code in PR #2083: URL: https://github.com/apache/iceberg-python/pull/2083#discussion_r2143166053 ## pyiceberg/catalog/glue.py: ## @@ -680,13 +680,19 @@ def drop_namespace(self, namespace: Union[str, Identifier]) -> None: """ database_na

Re: [I] 0 9.1 pypi cp311 Mac 11 arm package doesn't include merge to 0.9.1 version [iceberg-python]

2025-06-12 Thread via GitHub
kevinjqliu commented on issue #2087: URL: https://github.com/apache/iceberg-python/issues/2087#issuecomment-2967444094 Hey, it looks like #1923 was wrongly tagged with the 0.9.1 milestone; I've removed it. Based on the commit, https://github.com/apache/iceberg-python/commit/831170d28

Re: [PR] Spark: RewriteTablePath: filter content files by snapshotId [iceberg]

2025-06-12 Thread via GitHub
szehon-ho commented on code in PR #12885: URL: https://github.com/apache/iceberg/pull/12885#discussion_r2143121846 ## core/src/main/java/org/apache/iceberg/RewriteTablePathUtil.java: ## @@ -333,6 +416,7 @@ public static RewriteResult rewriteDataManifest( */ public static

Re: [PR] feat: Declarative TableMetadata Builder [iceberg-rust]

2025-06-12 Thread via GitHub
c-thiel commented on code in PR #1362: URL: https://github.com/apache/iceberg-rust/pull/1362#discussion_r2143121852 ## crates/iceberg/src/spec/table_metadata.rs: ## @@ -101,7 +101,10 @@ pub const RESERVED_PROPERTIES: [&str; 9] = [ /// Reference to [`TableMetadata`]. pub type T

Re: [PR] Spark: RewriteTablePath: filter content files by snapshotId [iceberg]

2025-06-12 Thread via GitHub
szehon-ho commented on code in PR #12885: URL: https://github.com/apache/iceberg/pull/12885#discussion_r2143115914 ## core/src/main/java/org/apache/iceberg/RewriteTablePathUtil.java: ## @@ -312,7 +314,87 @@ public static RewriteResult rewriteDataManifest( ManifestReader

Re: [PR] feat: Declarative TableMetadata Builder [iceberg-rust]

2025-06-12 Thread via GitHub
c-thiel commented on code in PR #1362: URL: https://github.com/apache/iceberg-rust/pull/1362#discussion_r2143118998 ## crates/iceberg/src/spec/table_metadata.rs: ## @@ -101,7 +101,10 @@ pub const RESERVED_PROPERTIES: [&str; 9] = [ /// Reference to [`TableMetadata`]. pub type T
