Re: [I] Cannot cast java.util.UUID to java.lang.CharSequence [iceberg]

2025-08-07 Thread via GitHub
ebyhr commented on issue #13077: URL: https://github.com/apache/iceberg/issues/13077#issuecomment-3166631670 Can we reopen this issue because we reverted https://github.com/apache/iceberg/pull/13087 in https://github.com/apache/iceberg/pull/13754? -- This is an automated message from th

Re: [PR] feat(table): add fanout partition writer and rolling data writer [iceberg-go]

2025-08-07 Thread via GitHub
badalprasadsingh commented on code in PR #524: URL: https://github.com/apache/iceberg-go/pull/524#discussion_r2261968740 ## schema_conversions.go: ## @@ -30,33 +30,35 @@ func partitionTypeToAvroSchema(t *StructType) (avro.Schema, error) { var sc avro.Schema

Re: [PR] Core: Batch load new files when validating replaced partitions [iceberg]

2025-08-07 Thread via GitHub
gabeiglio commented on code in PR #13556: URL: https://github.com/apache/iceberg/pull/13556#discussion_r2261721256 ## core/src/main/java/org/apache/iceberg/CherryPickOperation.java: ## @@ -214,13 +220,17 @@ private static void validateReplacedPartitions( parentId == n

[PR] feat(datafusion): Add IcebergCommitExec to commit the written data files [iceberg-rust]

2025-08-07 Thread via GitHub
CTTY opened a new pull request, #1588: URL: https://github.com/apache/iceberg-rust/pull/1588 ## Which issue does this PR close? - Closes #1546 - Draft: #1511 ## What changes are included in this PR? - Added `IcebergCommitExec` to help commit the data files writte

Re: [PR] feat(catalog): Add catalog loader and builder implementation for rest catalog [iceberg-rust]

2025-08-07 Thread via GitHub
lliangyu-lin commented on code in PR #1580: URL: https://github.com/apache/iceberg-rust/pull/1580#discussion_r2261710250 ## crates/catalog/rest/src/catalog.rs: ## @@ -45,13 +46,18 @@ use crate::types::{ RegisterTableRequest, RenameTableRequest, }; +const REST_CATALOG_PRO

Re: [PR] feat: add bulk insertion to deletion vector [iceberg-rust]

2025-08-07 Thread via GitHub
liurenjie1024 merged PR #1578: URL: https://github.com/apache/iceberg-rust/pull/1578

Re: [PR] AWS: Support RangeReadable in Analytics Accelerator Stream [iceberg]

2025-08-07 Thread via GitHub
github-actions[bot] commented on PR #13361: URL: https://github.com/apache/iceberg/pull/13361#issuecomment-3166201054 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] Encryption integration and test [iceberg]

2025-08-07 Thread via GitHub
github-actions[bot] commented on PR #13066: URL: https://github.com/apache/iceberg/pull/13066#issuecomment-3166200872 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [I] How to Pass two default catalog in spark for iceberg usecase [iceberg]

2025-08-07 Thread via GitHub
github-actions[bot] commented on issue #12203: URL: https://github.com/apache/iceberg/issues/12203#issuecomment-3166200759 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [I] CDC optimization [iceberg]

2025-08-07 Thread via GitHub
1raghavmahajan commented on issue #13094: URL: https://github.com/apache/iceberg/issues/13094#issuecomment-3166172762 If I understand you correctly, you only want to know which partitions have changed since a certain snapshot ID and you don't need the rows?

Re: [I] [Spark, PyIceberg] Different behavior in downcasting ns [iceberg]

2025-08-07 Thread via GitHub
JeonDaehong commented on issue #13679: URL: https://github.com/apache/iceberg/issues/13679#issuecomment-3166132355 Hi, I'm seeing issues when reading Parquet files with timestamp[ns]. PyIceberg downcasts to us, but Spark misinterprets the values and shows timestamps far in the future.
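
A minimal sketch of the arithmetic behind this kind of symptom, assuming the raw nanosecond integer ends up being read with a microsecond unit. This is not PyIceberg's or Spark's actual code; the helper names `ns_to_us`, `from_micros`, and `years_if_misread_as_us` are hypothetical and only illustrate the unit mismatch.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ns_to_us(ns: int) -> int:
    """Downcast nanoseconds since the epoch to microseconds.

    Floor division drops the sub-microsecond digits (hypothetical helper).
    """
    return ns // 1_000


def from_micros(us: int) -> datetime:
    """Interpret an integer as microseconds since the epoch."""
    return EPOCH + timedelta(microseconds=us)


def years_if_misread_as_us(raw: int) -> float:
    """Approximate years after 1970 if `raw` is treated as microseconds."""
    seconds = raw / 1_000_000
    return seconds / (365.25 * 24 * 3600)


# 2025-08-07T00:00:00Z expressed in nanoseconds since the epoch.
ts_ns = 1_754_524_800_000_000_000

# Correct path: downcast first, then interpret as microseconds.
print(from_micros(ns_to_us(ts_ns)))

# Mismatched path: the nanosecond value read as microseconds is 1000x too
# large, which lands tens of thousands of years after 1970.
print(round(years_if_misread_as_us(ts_ns)))
```

The point of the sketch is only that a reader and a writer must agree on the unit: the same 64-bit integer is a plausible date under one unit and a date far in the future under the other.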

Re: [I] When writing data from a PyArrow DataFrame, how should we handle 'null' Fields? [iceberg-python]

2025-08-07 Thread via GitHub
kris-gaudel commented on issue #2119: URL: https://github.com/apache/iceberg-python/issues/2119#issuecomment-3166123762 Can I be assigned this?

Re: [PR] refactor: consolidate snapshot expiration into MaintenanceTable [iceberg-python]

2025-08-07 Thread via GitHub
Fokko commented on code in PR #2143: URL: https://github.com/apache/iceberg-python/pull/2143#discussion_r2261329437 ## mkdocs/docs/api.md: ## @@ -995,513 +995,114 @@ readable_metrics: [ [6.0989]] ``` -!!! info -Content refers to type of content stored by the data file: `

Re: [PR] Spark 4.0: Add configuration to disable executor cache for delete files [iceberg]

2025-08-07 Thread via GitHub
anuragmantri commented on code in PR #12893: URL: https://github.com/apache/iceberg/pull/12893#discussion_r2261573161 ## spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/TestSparkExecutorCache.java: ## @@ -349,6 +373,141 @@ private void checkMerge(RowLevelOperationMode mo

Re: [PR] Core: Support timestamp nanos as default values [iceberg]

2025-08-07 Thread via GitHub
ebyhr commented on PR #13487: URL: https://github.com/apache/iceberg/pull/13487#issuecomment-3166015066 @stevenzwu @nastra Gentle reminder.

Re: [PR] Spark 4.0: Add configuration to disable executor cache for delete files [iceberg]

2025-08-07 Thread via GitHub
ebyhr commented on code in PR #12893: URL: https://github.com/apache/iceberg/pull/12893#discussion_r2261475572 ## spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/TestSparkExecutorCache.java: ## @@ -349,6 +373,141 @@ private void checkMerge(RowLevelOperationMode mode) th

Re: [PR] Core: Use ResourcePaths instead of hard-coded resource paths [iceberg]

2025-08-07 Thread via GitHub
amogh-jahagirdar merged PR #13759: URL: https://github.com/apache/iceberg/pull/13759

Re: [PR] Core: Use ResourcePaths instead of hard-coded resource paths [iceberg]

2025-08-07 Thread via GitHub
amogh-jahagirdar commented on PR #13759: URL: https://github.com/apache/iceberg/pull/13759#issuecomment-3165724583 Thanks @nastra and thanks @singhpk234 for reviewing

Re: [PR] Support reading ns from pyarrow [iceberg-python]

2025-08-07 Thread via GitHub
Fokko merged PR #2294: URL: https://github.com/apache/iceberg-python/pull/2294

Re: [I] Improve timestamp support [iceberg-python]

2025-08-07 Thread via GitHub
Fokko closed issue #2270: Improve timestamp support URL: https://github.com/apache/iceberg-python/issues/2270

Re: [I] Include `initial-default` and `write-default` in repr [iceberg-python]

2025-08-07 Thread via GitHub
Fokko closed issue #1853: Include `initial-default` and `write-default` in repr URL: https://github.com/apache/iceberg-python/issues/1853

Re: [PR] Change __repr__ method to only print if non-None [iceberg-python]

2025-08-07 Thread via GitHub
Fokko merged PR #2287: URL: https://github.com/apache/iceberg-python/pull/2287

Re: [I] Feature request: add nightly for docs [iceberg-python]

2025-08-07 Thread via GitHub
rambleraptor commented on issue #2242: URL: https://github.com/apache/iceberg-python/issues/2242#issuecomment-3165676167 It seems like we should consider hosting docs for past versions, including the nightly build. There's a couple different ways we could handle this: 1. [mik

Re: [I] Expose a way for users to use a custom AuthManager when they define their REST Catalog [iceberg-python]

2025-08-07 Thread via GitHub
nvartolomei commented on issue #1960: URL: https://github.com/apache/iceberg-python/issues/1960#issuecomment-3165655273 @rambleraptor exciting. Thanks! 🙏

Re: [I] Expose a way for users to use a custom AuthManager when they define their REST Catalog [iceberg-python]

2025-08-07 Thread via GitHub
rambleraptor commented on issue #1960: URL: https://github.com/apache/iceberg-python/issues/1960#issuecomment-3165594246 Hello! We just merged in a Google AuthManager in #2072.

Re: [PR] Support reading ns from pyarrow [iceberg-python]

2025-08-07 Thread via GitHub
rambleraptor commented on code in PR #2294: URL: https://github.com/apache/iceberg-python/pull/2294#discussion_r2261318756 ## pyiceberg/io/pyarrow.py: ## @@ -1288,6 +1306,11 @@ def primitive(self, primitive: pa.DataType) -> PrimitiveType: elif primitive.unit == "ns

Re: [PR] Support reading ns from pyarrow [iceberg-python]

2025-08-07 Thread via GitHub
rambleraptor commented on PR #2294: URL: https://github.com/apache/iceberg-python/pull/2294#issuecomment-3165591686 @Fokko responded. thanks a lot!

Re: [I] Cannot query Iceberg table in an Azure Storage Account [iceberg]

2025-08-07 Thread via GitHub
KalyanKadiyala commented on issue #13760: URL: https://github.com/apache/iceberg/issues/13760#issuecomment-3165574340 Hi @murphycrosby - could you please upload the Spark driver logs, with any sensitive data stripped out? Thanks, Kalyan

Re: [PR] fix: sanitize invalid Avro field names in manifest file [iceberg-python]

2025-08-07 Thread via GitHub
kris-gaudel commented on code in PR #2245: URL: https://github.com/apache/iceberg-python/pull/2245#discussion_r2261291351 ## pyiceberg/utils/schema_conversion.py: ## @@ -524,12 +524,19 @@ def field(self, field: NestedField, field_result: AvroType) -> AvroType: if isins

Re: [I] Duplicate file path found in Iceberg metadata snapshot [iceberg]

2025-08-07 Thread via GitHub
yogevyuval commented on issue #13763: URL: https://github.com/apache/iceberg/issues/13763#issuecomment-3165547838 @hguercan This is potentially an issue with the sink connector. One check is to figure out in which snapshot this file originated (was it an append or a replace), to id

[I] Duplicate file path found in Iceberg metadata snapshot [iceberg]

2025-08-07 Thread via GitHub
hguercan opened a new issue, #13763: URL: https://github.com/apache/iceberg/issues/13763 ### Apache Iceberg version None ### Query engine None ### Please describe the bug 🐞 Hello everyone, Our Architecture is to stream data via kafka and ingest the d

Re: [PR] API: Define RepairManifests action interface [iceberg]

2025-08-07 Thread via GitHub
amogh-jahagirdar commented on PR #10784: URL: https://github.com/apache/iceberg/pull/10784#issuecomment-3165516289 Ok this has been open for quite some time, and I haven't been able to follow through with this due to bandwidth but now I'm going to make time to follow through on this since I

Re: [PR] Introduce MetricsMaxInferredColumnDefaultsStrategy [iceberg]

2025-08-07 Thread via GitHub
rdblue commented on PR #13039: URL: https://github.com/apache/iceberg/pull/13039#issuecomment-3165470011 Merged. Thanks, @jkolash!

Re: [PR] Introduce MetricsMaxInferredColumnDefaultsStrategy [iceberg]

2025-08-07 Thread via GitHub
rdblue merged PR #13039: URL: https://github.com/apache/iceberg/pull/13039

Re: [PR] Minor cleanup [iceberg-python]

2025-08-07 Thread via GitHub
kevinjqliu merged PR #2298: URL: https://github.com/apache/iceberg-python/pull/2298

Re: [PR] Minor cleanup [iceberg-python]

2025-08-07 Thread via GitHub
kevinjqliu commented on PR #2298: URL: https://github.com/apache/iceberg-python/pull/2298#issuecomment-3165418961 nice!

[PR] Minor cleanup [iceberg-python]

2025-08-07 Thread via GitHub
Fokko opened a new pull request, #2298: URL: https://github.com/apache/iceberg-python/pull/2298 # Rationale for this change Missed this in another PR # Are these changes tested? # Are there any user-facing changes?

Re: [PR] Basic read/write support for ORC [iceberg-python]

2025-08-07 Thread via GitHub
mccormickt12 commented on PR #2236: URL: https://github.com/apache/iceberg-python/pull/2236#issuecomment-3165384583 > I'm trying to finish up some of the remaining items so we can release 0.10, https://github.com/apache/iceberg-python/milestone/10 > > will take a look at this PR right

Re: [I] Corrupted table history due to CREATE OR REPLACE TABLE & Snapshot Expiration at the same time [iceberg]

2025-08-07 Thread via GitHub
RussellSpitzer commented on issue #13651: URL: https://github.com/apache/iceberg/issues/13651#issuecomment-3165361320 I think this was mentioned on the mailing list as well, but yes we definitely need to fix this. Even if we do change the behavior in the future that won't be for a long time

Re: [PR] Enable add tests migrated Hive tables [iceberg-python]

2025-08-07 Thread via GitHub
kevinjqliu merged PR #2295: URL: https://github.com/apache/iceberg-python/pull/2295

Re: [PR] feat(table): Support Dynamic Partition Overwrite [iceberg-go]

2025-08-07 Thread via GitHub
lliangyu-lin commented on code in PR #482: URL: https://github.com/apache/iceberg-go/pull/482#discussion_r2261145024 ## table/transaction.go: ## @@ -343,6 +345,250 @@ func (t *Transaction) AddFiles(ctx context.Context, files []string, snapshotProp return t.apply(updates

Re: [PR] Bump Poetry to 2.1.4 [iceberg-python]

2025-08-07 Thread via GitHub
kevinjqliu merged PR #2297: URL: https://github.com/apache/iceberg-python/pull/2297

Re: [I] Implement partition writer [iceberg-rust]

2025-08-07 Thread via GitHub
Kyle-Cooley commented on issue #342: URL: https://github.com/apache/iceberg-rust/issues/342#issuecomment-3165278271 This feature seems to be stuck on a review for https://github.com/apache/iceberg-rust/pull/1040. Is there anyone we can reach out to for that?

Re: [PR] Spark 4.0: Add configuration to disable executor cache for delete files [iceberg]

2025-08-07 Thread via GitHub
anuragmantri commented on code in PR #12893: URL: https://github.com/apache/iceberg/pull/12893#discussion_r2261117031 ## spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/TestSparkExecutorCache.java: ## @@ -181,6 +181,30 @@ public void testMaxTotalSizeConfig() { }

[I] Avro: Allow reading ManifestList V1 using a V2 reader [iceberg-rust]

2025-08-07 Thread via GitHub
Fokko opened a new issue, #1587: URL: https://github.com/apache/iceberg-rust/issues/1587 ### Is your feature request related to a problem or challenge? For reading Manifest/ManifestList using PyIceberg we want to have the interface as simple as possible. Therefore we want to enable re

Re: [PR] Fix filesystem [iceberg-python]

2025-08-07 Thread via GitHub
mccormickt12 commented on code in PR #2291: URL: https://github.com/apache/iceberg-python/pull/2291#discussion_r2261093449 ## pyiceberg/io/pyarrow.py: ## @@ -381,21 +381,38 @@ def to_input_file(self) -> PyArrowFile: class PyArrowFileIO(FileIO): fs_by_scheme: Callable[[st

[PR] Bump Poetry to 2.1.4 [iceberg-python]

2025-08-07 Thread via GitHub
Fokko opened a new pull request, #2297: URL: https://github.com/apache/iceberg-python/pull/2297 # Rationale for this change # Are these changes tested? # Are there any user-facing changes?

Re: [PR] Fix filesystem [iceberg-python]

2025-08-07 Thread via GitHub
mccormickt12 commented on PR #2291: URL: https://github.com/apache/iceberg-python/pull/2291#issuecomment-3165111733 this shows that setting the netloc on filesystem creation and having it in the path (as is done for the other fs types) doesn't work for hdfs ``` >>> hdfs = fs.Hado

Re: [PR] fix: sanitize invalid Avro field names in manifest file [iceberg-python]

2025-08-07 Thread via GitHub
Fokko commented on code in PR #2245: URL: https://github.com/apache/iceberg-python/pull/2245#discussion_r2261041619 ## pyiceberg/utils/schema_conversion.py: ## @@ -524,12 +524,19 @@ def field(self, field: NestedField, field_result: AvroType) -> AvroType: if isinstance(

Re: [PR] Pass in type explicitly for `initial-default` [iceberg-python]

2025-08-07 Thread via GitHub
Fokko merged PR #2296: URL: https://github.com/apache/iceberg-python/pull/2296

Re: [PR] Support reading ns from pyarrow [iceberg-python]

2025-08-07 Thread via GitHub
Fokko commented on code in PR #2294: URL: https://github.com/apache/iceberg-python/pull/2294#discussion_r2261029419 ## pyiceberg/io/pyarrow.py: ## @@ -1288,6 +1306,11 @@ def primitive(self, primitive: pa.DataType) -> PrimitiveType: elif primitive.unit == "ns":

Re: [PR] Left-Hand Nav tweaks [iceberg]

2025-08-07 Thread via GitHub
rmoff commented on code in PR #13753: URL: https://github.com/apache/iceberg/pull/13753#discussion_r2260908349 ## site/nav.yml: ## @@ -21,6 +21,20 @@ nav: - Spark: spark-quickstart.md - Hive: hive-quickstart.md - Docs: +- introduction.md +- Concepts: +

Re: [PR] feat(name_mapping): Add UpdateNameMapping function [iceberg-go]

2025-08-07 Thread via GitHub
zeroshade merged PR #521: URL: https://github.com/apache/iceberg-go/pull/521

[PR] REST spec: Correct type annotation in TableMetadat#encryption-keys field [iceberg]

2025-08-07 Thread via GitHub
blakesmith opened a new pull request, #13762: URL: https://github.com/apache/iceberg/pull/13762 The `TableMetadata#encryption-keys` field from the REST spec seems to be accidentally setting the type to `list`, rather than an `array`. As far as I can tell, swagger only supports an [`array`

Re: [I] [feature request] disallow creating partition field with name that conflicts with schema field when its not identity transform [iceberg-python]

2025-08-07 Thread via GitHub
rutb327 commented on issue #2272: URL: https://github.com/apache/iceberg-python/issues/2272#issuecomment-3165049377 I would like to work on this!

Re: [PR] Support reading ns from pyarrow [iceberg-python]

2025-08-07 Thread via GitHub
rambleraptor commented on code in PR #2294: URL: https://github.com/apache/iceberg-python/pull/2294#discussion_r2260897715 ## pyiceberg/io/pyarrow.py: ## @@ -,6 +1126,8 @@ def _(obj: pa.Field, visitor: PyArrowSchemaVisitor[T]) -> T: visitor.before_field(obj) try

Re: [PR] Left-Hand Nav tweaks [iceberg]

2025-08-07 Thread via GitHub
rmoff commented on PR #13753: URL: https://github.com/apache/iceberg/pull/13753#issuecomment-3165023232 > @rmoff can you please elaborate why files are being moved around? For example, `view-configuration.md` is moved from `docs/docs/view-configuration.md` to `site/docs/concepts/view-confi

Re: [PR] Left-Hand Nav tweaks [iceberg]

2025-08-07 Thread via GitHub
nastra commented on PR #13753: URL: https://github.com/apache/iceberg/pull/13753#issuecomment-3164964268 @rmoff can you please elaborate why files are being moved around? For example, `view-configuration.md` is moved from `docs/docs/view-configuration.md` to `site/docs/concepts/view-confi

Re: [PR] Left-Hand Nav tweaks [iceberg]

2025-08-07 Thread via GitHub
nastra commented on code in PR #13753: URL: https://github.com/apache/iceberg/pull/13753#discussion_r2260852581 ## site/nav.yml: ## @@ -21,6 +21,20 @@ nav: - Spark: spark-quickstart.md - Hive: hive-quickstart.md - Docs: +- introduction.md +- Concepts: +

Re: [PR] Core: Remove redundant V2TableTestBase [iceberg]

2025-08-07 Thread via GitHub
nastra merged PR #13757: URL: https://github.com/apache/iceberg/pull/13757

Re: [PR] Compulsorily close coordinator if task is stopped by the connect framework. [iceberg]

2025-08-07 Thread via GitHub
kumarpritam863 commented on PR #13756: URL: https://github.com/apache/iceberg/pull/13756#issuecomment-3164910528 @bryanck can you please review.

Re: [PR] Spark: Avoid closing deserialized copies of shared resources like FileIO [iceberg]

2025-08-07 Thread via GitHub
danielcweeks commented on PR #12868: URL: https://github.com/apache/iceberg/pull/12868#issuecomment-3164904270 I don't think this is necessarily the right way to tackle this problem. As @nastra pointed out, this is somewhat specific to the S3FileIO implementation (and other implementations

Re: [PR] Spark 4.0: Make `SparkBatch.createReaderFactory` customizable [iceberg]

2025-08-07 Thread via GitHub
zhztheplayer closed pull request #13433: Spark 4.0: Make `SparkBatch.createReaderFactory` customizable URL: https://github.com/apache/iceberg/pull/13433

Re: [PR] Basic read/write support for ORC [iceberg-python]

2025-08-07 Thread via GitHub
kevinjqliu commented on PR #2236: URL: https://github.com/apache/iceberg-python/pull/2236#issuecomment-3164754566 I'm trying to finish up some of the remaining items so we can release 0.10, https://github.com/apache/iceberg-python/milestone/10 will take a look at this PR right after :)

Re: [PR] Fix filesystem [iceberg-python]

2025-08-07 Thread via GitHub
kevinjqliu commented on code in PR #2291: URL: https://github.com/apache/iceberg-python/pull/2291#discussion_r2260706438 ## pyiceberg/io/pyarrow.py: ## @@ -381,21 +381,38 @@ def to_input_file(self) -> PyArrowFile: class PyArrowFileIO(FileIO): fs_by_scheme: Callable[[str,

Re: [PR] Fix filesystem [iceberg-python]

2025-08-07 Thread via GitHub
kevinjqliu commented on code in PR #2291: URL: https://github.com/apache/iceberg-python/pull/2291#discussion_r2260704048 ## pyiceberg/io/pyarrow.py: ## @@ -381,21 +381,38 @@ def to_input_file(self) -> PyArrowFile: class PyArrowFileIO(FileIO): fs_by_scheme: Callable[[str,

Re: [PR] Delete unused ParquetAvroReader [iceberg]

2025-08-07 Thread via GitHub
anoopj commented on code in PR #13755: URL: https://github.com/apache/iceberg/pull/13755#discussion_r2260667560 ## .palantir/revapi.yml: ## @@ -1333,6 +1333,9 @@ acceptedBreaks: \ java.util.List)" justification: "Removing deprecations for 1.10.0" org.apache.

Re: [PR] Delete unused ParquetAvroReader [iceberg]

2025-08-07 Thread via GitHub
anoopj closed pull request #13755: Delete unused ParquetAvroReader URL: https://github.com/apache/iceberg/pull/13755

Re: [I] Expose a way for users to use a custom AuthManager when they define their REST Catalog [iceberg-python]

2025-08-07 Thread via GitHub
nvartolomei commented on issue #1960: URL: https://github.com/apache/iceberg-python/issues/1960#issuecomment-3164658244 Hi! Any progress on this? I want to use iceberg-python with BigLake Metastore (https://cloud.google.com/bigquery/docs/blms-rest-catalog) and looks like it needs custom au

[I] [docs] Orphaned content - Migration docs [iceberg]

2025-08-07 Thread via GitHub
rmoff opened a new issue, #13761: URL: https://github.com/apache/iceberg/issues/13761 It looks like the Migration nav entry got dropped from the nav between 1.4.3 and 1.5.0. Looks like it maybe just got missed in the migration from Hugo? [https://github.com/apache/iceberg/commit/ed288987

Re: [PR] Enable add tests migrated Hive tables [iceberg-python]

2025-08-07 Thread via GitHub
kevinjqliu commented on code in PR #2295: URL: https://github.com/apache/iceberg-python/pull/2295#discussion_r2260614541 ## pyiceberg/expressions/visitors.py: ## @@ -940,7 +936,7 @@ def visit_bound_predicate(self, predicate: BoundPredicate[L]) -> BooleanExpressi def transl

Re: [I] [bug] Schema validation should reject field names that are invalid Avro identifiers [iceberg-python]

2025-08-07 Thread via GitHub
kevinjqliu commented on issue #2123: URL: https://github.com/apache/iceberg-python/issues/2123#issuecomment-3164573459 thanks for the interest @kris-gaudel. There are a lot of opportunities to help. Here's one I just found https://github.com/apache/iceberg-python/issues/2119. Please feel fr

Re: [I] Decimal unscale fails with empty column [iceberg-python]

2025-08-07 Thread via GitHub
kevinjqliu commented on issue #2263: URL: https://github.com/apache/iceberg-python/issues/2263#issuecomment-3164563606 thank you @berg2043, here's a recent PR that is similar to the change above https://github.com/apache/iceberg-python/commit/1a5e32ab234ed180b4a2dadb4e8399de4a39ab2f

[I] Cannot query Iceberg table in an Azure Storage Account [iceberg]

2025-08-07 Thread via GitHub
murphycrosby opened a new issue, #13760: URL: https://github.com/apache/iceberg/issues/13760 ### Apache Iceberg version 1.9.2 (latest release) ### Query engine Other ### Please describe the bug 🐞 In Databricks, running the below gives the error [[TA

Re: [PR] fix: sanitize invalid Avro field names in manifest file [iceberg-python]

2025-08-07 Thread via GitHub
kevinjqliu commented on code in PR #2245: URL: https://github.com/apache/iceberg-python/pull/2245#discussion_r2260594293 ## pyiceberg/utils/schema_conversion.py: ## @@ -524,12 +524,19 @@ def field(self, field: NestedField, field_result: AvroType) -> AvroType: if isinst

Re: [I] Duplicate file name in Iceberg's metadata [iceberg]

2025-08-07 Thread via GitHub
hguercan commented on issue #8953: URL: https://github.com/apache/iceberg/issues/8953#issuecomment-3164468898 Hello we are seeing the same issue now with newer versions also. We are using iceberg 1.9.2 and stream-writing via kafka connect and doing maintenance (data-compaction, manif

Re: [PR] Left-Hand Nav tweaks [iceberg]

2025-08-07 Thread via GitHub
rmoff commented on code in PR #13753: URL: https://github.com/apache/iceberg/pull/13753#discussion_r2260515392 ## site/nav.yml: ## @@ -21,6 +21,20 @@ nav: - Spark: spark-quickstart.md - Hive: hive-quickstart.md - Docs: +- introduction.md +- Concepts: +

Re: [PR] Support reading ns from pyarrow [iceberg-python]

2025-08-07 Thread via GitHub
Fokko commented on code in PR #2294: URL: https://github.com/apache/iceberg-python/pull/2294#discussion_r2260500394 ## pyiceberg/io/pyarrow.py: ## @@ -1288,6 +1308,11 @@ def primitive(self, primitive: pa.DataType) -> PrimitiveType: elif primitive.unit == "ns":

Re: [PR] Support reading ns from pyarrow [iceberg-python]

2025-08-07 Thread via GitHub
Fokko commented on code in PR #2294: URL: https://github.com/apache/iceberg-python/pull/2294#discussion_r2260499266 ## pyiceberg/io/pyarrow.py: ## @@ -,6 +1126,8 @@ def _(obj: pa.Field, visitor: PyArrowSchemaVisitor[T]) -> T: visitor.before_field(obj) try: +

[PR] Core: Use ResourcePaths instead of hard-coded resource paths [iceberg]

2025-08-07 Thread via GitHub
nastra opened a new pull request, #13759: URL: https://github.com/apache/iceberg/pull/13759 (no comment)

[PR] Pass in type explicitly for `initial-default` [iceberg-python]

2025-08-07 Thread via GitHub
Fokko opened a new pull request, #2296: URL: https://github.com/apache/iceberg-python/pull/2296 # Rationale for this change I noticed we just passed in the value, without setting the type explicitly. By default, PyArrow will for example upscale 1 to an int64 field, while the column i

Re: [PR] feat: implement Literal Transform [iceberg-cpp]

2025-08-07 Thread via GitHub
zhjwpku commented on code in PR #156: URL: https://github.com/apache/iceberg-cpp/pull/156#discussion_r2260487542 ## src/iceberg/transform_function.h: ## @@ -97,11 +97,11 @@ class YearTransform : public TransformFunction { /// \param source_type Must be a timestamp type. ex

Re: [PR] Convert `_get_column_projection_values` to use Field-IDs [iceberg-python]

2025-08-07 Thread via GitHub
Fokko commented on PR #2293: URL: https://github.com/apache/iceberg-python/pull/2293#issuecomment-3164393862 I've created the follow-up PR here: https://github.com/apache/iceberg-python/pull/2295

[PR] Enable add tests migrated Hive tables [iceberg-python]

2025-08-07 Thread via GitHub
Fokko opened a new pull request, #2295: URL: https://github.com/apache/iceberg-python/pull/2295 # Rationale for this change # Are these changes tested? # Are there any user-facing changes?

Re: [I] TaskMemoryManager: Failed to allocate a page while performing expireSnapshots SparkAction [iceberg]

2025-08-07 Thread via GitHub
BharadwajaD closed issue #13737: TaskMemoryManager: Failed to allocate a page while performing expireSnapshots SparkAction URL: https://github.com/apache/iceberg/issues/13737

Re: [I] TaskMemoryManager: Failed to allocate a page while performing expireSnapshots SparkAction [iceberg]

2025-08-07 Thread via GitHub
BharadwajaD commented on issue #13737: URL: https://github.com/apache/iceberg/issues/13737#issuecomment-3164379347 Thank you @RussellSpitzer and @amogh-jahagirdar !! Closing this thread :)

Re: [PR] feat: add bulk insertion to deletion vector [iceberg-rust]

2025-08-07 Thread via GitHub
dentiny commented on PR #1578: URL: https://github.com/apache/iceberg-rust/pull/1578#issuecomment-3164373889 > Just one nit about error handling. Thank you so much for the careful review! I learnt a lot.

Re: [I] removing orphan files, when using S3, could potentially be done via S3 Lifecycles? [iceberg]

2025-08-07 Thread via GitHub
RussellSpitzer commented on issue #13693: URL: https://github.com/apache/iceberg/issues/13693#issuecomment-3164360223 I don't think the "delete" portion is generally that slow since with the bulk apis we now have a single thread can usually issue all the deletes relatively quickly. I do kno

Re: [I] TaskMemoryManager: Failed to allocate a page while performing expireSnapshots SparkAction [iceberg]

2025-08-07 Thread via GitHub
RussellSpitzer commented on issue #13737: URL: https://github.com/apache/iceberg/issues/13737#issuecomment-3164330236 The Spark Sql Call just calls the Java Api :). It was most likely just some OOM sort of issue. The Broadcast requires keeping a copy of the shipped data on the driver and ex

Re: [PR] fix: correct return type of ArrowFileSystemFileIO functions( MakeMockFileIO, MakeLocalFileIO ) [iceberg-cpp]

2025-08-07 Thread via GitHub
Fokko merged PR #161: URL: https://github.com/apache/iceberg-cpp/pull/161

Re: [I] Kafka Connect drop messages when Glue Exception is happening [iceberg]

2025-08-07 Thread via GitHub
michalantkowicz commented on issue #13752: URL: https://github.com/apache/iceberg/issues/13752#issuecomment-3164311671 Sure thing! Thank you very much @nferrario ! 🙌

Re: [PR] Spark: RewriteTablePath: Update sizes of rewritten manifests in manifest lists [iceberg]

2025-08-07 Thread via GitHub
vaultah commented on PR #13720: URL: https://github.com/apache/iceberg/pull/13720#issuecomment-3164284391 Also it seems problematic that IO exceptions are skipped [here](https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/RewriteTablePathUtil.java#L284-L286). I

Re: [PR] fix: sanitize invalid Avro field names in manifest file [iceberg-python]

2025-08-07 Thread via GitHub
nvartolomei commented on code in PR #2245: URL: https://github.com/apache/iceberg-python/pull/2245#discussion_r2260340907 ## pyiceberg/utils/schema_conversion.py: ## @@ -524,12 +524,19 @@ def field(self, field: NestedField, field_result: AvroType) -> AvroType: if isins

Re: [I] removing orphan files, when using S3, could potentially be done via S3 Lifecycles? [iceberg]

2025-08-07 Thread via GitHub
jkolash commented on issue #13693: URL: https://github.com/apache/iceberg/issues/13693#issuecomment-3164218944 You can implement your own delete function in the existing apis. ```java /** * Passes an alternative delete implementation that will be used for orphan files. *
