[GitHub] [iceberg] jackye1995 commented on issue #6632: Bug with Branch Transactions
jackye1995 commented on issue #6632: URL: https://github.com/apache/iceberg/issues/6632#issuecomment-1398734413 Sure, assigned! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] stevenzwu commented on a diff in pull request #6631: Flink: backport PR #6584 to 1.14 and 1.15 for Avro GenericRecord in FLIP-27 source
stevenzwu commented on code in PR #6631: URL: https://github.com/apache/iceberg/pull/6631#discussion_r1082878466 ## flink/v1.16/flink/src/test/java/org/apache/iceberg/flink/source/TestRowDataToAvroGenericRecordConverter.java: ## @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.flink.source; Review Comment: this is the package name fix in 1.16 module. it should be under `flink/source` (not flink/`) folder -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] amogh-jahagirdar commented on issue #6632: Bug with Branch Transactions
amogh-jahagirdar commented on issue #6632: URL: https://github.com/apache/iceberg/issues/6632#issuecomment-1398725654 I'm working on a fix for this @jackye1995 could you assign this to me? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] amogh-jahagirdar opened a new issue, #6632: Bug with Branch Transactions
amogh-jahagirdar opened a new issue, #6632: URL: https://github.com/apache/iceberg/issues/6632 ### Apache Iceberg version 1.1.0 (latest release) ### Query engine None ### Please describe the bug Creating this issue for awareness, was discussing with @rdblue seems like there's a bug in https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/BaseTransaction.java#L415 when committing the transaction, currentSnapshotId is set to current's main snapshot id but this is incorrect if a transaction is being committed to a branch -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] stevenzwu commented on pull request #6631: Flink: backport PR #6584 to 1.14 and 1.15 for Avro GenericRecord in FLIP-27 source
stevenzwu commented on PR #6631: URL: https://github.com/apache/iceberg/pull/6631#issuecomment-1398722055 I checked the following diff and found nothing related to the classes touched by PR #6584 ``` git diff --no-index flink/v1.14/flink/src/ flink/v1.16/flink/src git diff --no-index flink/v1.15/flink/src/ flink/v1.16/flink/src ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] stevenzwu opened a new pull request, #6631: Flink: backport PR #6584 to 1.14 and 1.15 for Avro GenericRecord in FLIP-27 source
stevenzwu opened a new pull request, #6631: URL: https://github.com/apache/iceberg/pull/6631 I also piggybacked the fix of package name (a mishap from PR #6584). some classes should be in the `flink/source/reader` packages. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] stevenzwu commented on a diff in pull request #6584: Flink: support reading as Avro GenericRecord for FLIP-27 IcebergSource
stevenzwu commented on code in PR #6584: URL: https://github.com/apache/iceberg/pull/6584#discussion_r1082782121 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/source/reader/AvroGenericRecordReaderFunction.java: ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.flink.source.reader; + +import org.apache.avro.generic.GenericRecord; +import org.apache.flink.configuration.Configuration; +import org.apache.flink.configuration.ReadableConfig; +import org.apache.iceberg.Schema; +import org.apache.iceberg.Table; +import org.apache.iceberg.encryption.EncryptionManager; +import org.apache.iceberg.flink.source.AvroGenericRecordFileScanTaskReader; +import org.apache.iceberg.flink.source.DataIterator; +import org.apache.iceberg.flink.source.RowDataFileScanTaskReader; +import org.apache.iceberg.flink.source.RowDataToAvroGenericRecordConverter; +import org.apache.iceberg.flink.source.split.IcebergSourceSplit; +import org.apache.iceberg.io.FileIO; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; + +/** Read Iceberg rows as {@link GenericRecord}. */ +public class AvroGenericRecordReaderFunction extends DataIteratorReaderFunction { + private final String tableName; + private final Schema readSchema; + private final FileIO io; + private final EncryptionManager encryption; + private final RowDataFileScanTaskReader rowDataReader; + + private transient RowDataToAvroGenericRecordConverter converter; + + /** + * Create a reader function without projection and name mapping. Column name is case-insensitive. + */ + public static AvroGenericRecordReaderFunction fromTable(Table table) { +return new AvroGenericRecordReaderFunction( +table.name(), +new Configuration(), +table.schema(), +null, +null, +false, Review Comment: @hililiwei I am merging this PR. please continue to comment on this thread on reader function. I will follow up with a separate PR on the `withOutputType` idea if we agree that is the right direction. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] stevenzwu merged pull request #6584: Flink: support reading as Avro GenericRecord for FLIP-27 IcebergSource
stevenzwu merged PR #6584: URL: https://github.com/apache/iceberg/pull/6584 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on issue #6625: Improve nullability check in Iceberg codebase
jackye1995 commented on issue #6625: URL: https://github.com/apache/iceberg/issues/6625#issuecomment-1398630489 > should we maybe raise this discussion topic on the mailing list in order to increase visibility for people? Yes agree, let's do that so we can reach a consensus and proceed to implementation. If there is no response we can raise it during the next sync meeting. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] nastra commented on issue #6625: Improve nullability check in Iceberg codebase
nastra commented on issue #6625: URL: https://github.com/apache/iceberg/issues/6625#issuecomment-1398623955 I also like annotations like `@Nullable` to indicate that certain things in the API can be nullable as this makes it easier to consume that particular API and reason about it. Maybe we don't need something like `@NonNull` (given that we'd have to agree from which library we'd want to use that annotation) if we could imply that whatever doesn't have `@Nullable` is automatically non-null. An alternative would be the usage of `Optional`, but `@Nullable` would be my preferred option. However, I'd like to hear opinions & thoughts from other people as well on that topic. @jackye1995 should we maybe raise this discussion topic on the mailing list in order to increase visibility for people? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on issue #6625: Improve nullability check in Iceberg codebase
jackye1995 commented on issue #6625: URL: https://github.com/apache/iceberg/issues/6625#issuecomment-1398577729 @nastra any thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on issue #6420: Iceberg Materialized View Spec
jackye1995 commented on issue #6420: URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1398572156 I would +1 on storing in snapshot summary, because: 1. snapshot corresponds very well to MV refresh, there is a 1:1 relationship between them. 2. table properties is not versioned as well as snapshot, you cannot access previous able properties of a storage table easily I also agree with the suggestion of Ryan about the information to store, although I think storing the source tables referenced might be a bit hard, it requires statement analysis of the view. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] Fokko merged pull request #6628: Nessie: Bump to 0.47.0
Fokko merged PR #6628: URL: https://github.com/apache/iceberg/pull/6628 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] gaborkaszab commented on issue #6257: Partitions metadata table shows old partitions
gaborkaszab commented on issue #6257: URL: https://github.com/apache/iceberg/issues/6257#issuecomment-1398396353 > What would the algorithm be? If the partition has delete files, try to do a full MOR, and check if records are null? Personally, sounds a bit extreme, I would think a good first step is just add a column for delete_files (It may be easier after my new change in #6365). After all, we do have a partition existing, just its of invalid delete files. Interested to hear others thoughts as well. Well, giving this a second (and a third) thought I have to admit that applying delete file on the data files to get the partitions is too heavyweight. I'm wondering if we should document this behaviour somewhere as I remember on Slack there was someone confused about the 'record_count' column of the metadata table not adding up to the same value what count(*) gives. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] mriveraFacephi commented on issue #2040: Partial data ingestion to Iceberg in failing with Spark 3.0.x
mriveraFacephi commented on issue #2040: URL: https://github.com/apache/iceberg/issues/2040#issuecomment-1398240262 Same problem here with Spark 3.1.1 and Iceberg 0.13.1. I'm trying to write dataframe by using the Spark v2 API command writeTo. Every column in my schema is nullable. In my case, I'm using AWS Glue, and the table metadata looks like: `"Columns": [ { "Name": "col", "Type": "string", "Parameters": { "iceberg.field.current": "true", "iceberg.field.id": "1", **"iceberg.field.optional": "true"** } },...` Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] ajantha-bhat commented on pull request #6628: Nessie: Bump to 0.47.0
ajantha-bhat commented on PR #6628: URL: https://github.com/apache/iceberg/pull/6628#issuecomment-1398211777 I think we can bump it to `0.47.1` now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] cgpoh opened a new issue, #6630: Purpose of MAX_CONTINUOUS_EMPTY_COMMITS in IcebergFilesCommitter
cgpoh opened a new issue, #6630: URL: https://github.com/apache/iceberg/issues/6630 ### Query engine Flink ### Question I have a Flink job that uses side output to write to Iceberg table when there are errors in the main processing function. If there are no errors in the processing function, no data files will be added to be committed. I noticed that the Flink job is restarting and throwing the following exception: ``` IcebergFilesCommitter -> Sink: iceberg-error-sink-FPL (1/1)#152 (2b5a4587aa3a50c531671a32a2f1538c) switched from RUNNING to FAILED with failure cause: java.lang.IllegalStateException: Cannot determine partition spec: no data files have been added at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkState(Preconditions.java:502) at org.apache.iceberg.MergingSnapshotProducer.dataSpec(MergingSnapshotProducer.java:150) at org.apache.iceberg.BaseReplacePartitions.apply(BaseReplacePartitions.java:133) at org.apache.iceberg.SnapshotProducer.apply(SnapshotProducer.java:223) at org.apache.iceberg.BaseReplacePartitions.apply(BaseReplacePartitions.java:26) at org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:369) at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:402) at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:212) at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196) at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:189) at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:367) at org.apache.iceberg.BaseReplacePartitions.commit(BaseReplacePartitions.java:26) at org.apache.iceberg.flink.sink.IcebergFilesCommitter.commitOperation(IcebergFilesCommitter.java:372) at org.apache.iceberg.flink.sink.IcebergFilesCommitter.replacePartitions(IcebergFilesCommitter.java:314) at org.apache.iceberg.flink.sink.IcebergFilesCommitter.commitPendingResult(IcebergFilesCommitter.java:270) at org.apache.iceberg.flink.sink.IcebergFilesCommitter.commitUpToCheckpoint(IcebergFilesCommitter.java:255) at org.apache.iceberg.flink.sink.IcebergFilesCommitter.notifyCheckpointComplete(IcebergFilesCommitter.java:229) at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.notifyCheckpointComplete(StreamOperatorWrapper.java:104) at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.notifyCheckpointComplete(RegularOperatorChain.java:145) at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpoint(SubtaskCheckpointCoordinatorImpl.java:409) at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpointComplete(SubtaskCheckpointCoordinatorImpl.java:343) at org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:1384) at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointCompleteAsync$14(StreamTask.java:1325) at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointOperation$17(StreamTask.java:1364) at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201) at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753) at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948) at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) at java.base/java.lang.Thread.run(Unknown Source) ``` I saw that in the `commitPendingResult` function of IcebergFilesCommitter.java, there's a condition to check whether to skip empty commit but if the MAX_CONTINUOUS_EMPTY_COMMITS is met, it will proceed to commit even there are no data files to commit and thus, throwing the above exception. ```java long totalFiles = summary.dataFilesCount() + summary.deleteFilesCount(); continuousEmptyCheckpoints = totalFiles == 0 ? continuousEmptyCheckpoints + 1 : 0; if (totalFiles != 0 || continuousEmptyCheckpoints % maxContinuousEmptyCommits == 0) { if
[GitHub] [iceberg] kingeasternsun commented on pull request #6624: Add "parallelism" parameter to "add_files" syscall and MigrateTable, SnapshotTable.
kingeasternsun commented on PR #6624: URL: https://github.com/apache/iceberg/pull/6624#issuecomment-1398154903 > Left a review, thanks for the contribution @kingeasternsun ! Also looks like spotless checks are failing which you can fix by running `./gradlew :iceberg-api:spotlessJavaCheck` Thanks for your advice, I have fixed them. Will you merge this please. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] JanKaul commented on issue #6420: Iceberg Materialized View Spec
JanKaul commented on issue #6420: URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1398113386 Yes, I agree with the proposed design 1. I'm not entirely sure what @rdblue prefers. I will update the Google doc accordingly. The next question for me is where and how to store the refresh information. There are currently two proposed solutions: 1. Store refresh information in snapshot properties of the storage table 2. Store refresh information in storage table properties I will create a section in the Google doc that summarizes these options. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] findepi commented on pull request #6474: Make it explicit that metrics reporter is required
findepi commented on PR #6474: URL: https://github.com/apache/iceberg/pull/6474#issuecomment-1398101458 thanks for the merge! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] findepi commented on pull request #6474: Make it explicit that metrics reporter is required
findepi commented on PR #6474: URL: https://github.com/apache/iceberg/pull/6474#issuecomment-1398101212 > Yes this is the consequence of different styles of the projects. That's a good point. I accept the inherent friction being result of that, but I do hope some of that friction is avoidable. > I like Trino's approach of checking null at every constructor and also use Optional as much as possible. I like it too. That's why I fully supported @kokosing's push for to leverage Optional whenever possible. There are still some rare places that would benefit from improvement, but we're generally in quite a good shape with null-unfriendliness. > For Iceberg, it would be ideal if nullability is at least indicated in documentation. 100% agreed, especially if you mean API-level documentation (javadoc, annotations), as that's what library users will see in their IDEs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] findepi commented on issue #6625: Improve nullability check in Iceberg codebase
findepi commented on issue #6625: URL: https://github.com/apache/iceberg/issues/6625#issuecomment-1398096721 > Also there is little indication in the codebase of which field could potentially be null. This causes a lot of confusions for external engine integrations like Trino. I am happy to contribute annotations for nullable or non-null elements, if that's what we agree to do. This could look e.g. like this https://github.com/apache/iceberg/pull/4978, or maybe someone has better suggestions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] kingeasternsun commented on a diff in pull request #6624: Add "parallelism" parameter to "add_files" syscall and MigrateTable, SnapshotTable.
kingeasternsun commented on code in PR #6624: URL: https://github.com/apache/iceberg/pull/6624#discussion_r1082057002 ## spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/procedures/MigrateTableProcedure.java: ## @@ -39,7 +39,8 @@ class MigrateTableProcedure extends BaseProcedure { new ProcedureParameter[] { ProcedureParameter.required("table", DataTypes.StringType), ProcedureParameter.optional("properties", STRING_MAP), -ProcedureParameter.optional("drop_backup", DataTypes.BooleanType) +ProcedureParameter.optional("drop_backup", DataTypes.BooleanType), +ProcedureParameter.optional("max_concurrent_read_datafiles", DataTypes.IntegerType) Review Comment: Thanks for your reviews, It because this https://github.com/apache/iceberg/pull/3973#discussion_r794150006 @RussellSpitzer -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6629: Build: Fix minor error-prone warnings
ajantha-bhat commented on code in PR #6629: URL: https://github.com/apache/iceberg/pull/6629#discussion_r1082180504 ## core/src/main/java/org/apache/iceberg/rest/auth/OAuth2Util.java: ## @@ -329,14 +329,14 @@ static Long expiresAtMillis(String token) { return null; } -String[] parts = token.split("\\."); -if (parts.length != 3) { +List parts = Splitter.on('.').splitToList(token); Review Comment: As it is not used in multiple places (in the same file), I thought no need for an extra variable. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] nastra commented on a diff in pull request #6629: Build: Fix minor error-prone warnings
nastra commented on code in PR #6629: URL: https://github.com/apache/iceberg/pull/6629#discussion_r1082177976 ## core/src/main/java/org/apache/iceberg/rest/auth/OAuth2Util.java: ## @@ -329,14 +329,14 @@ static Long expiresAtMillis(String token) { return null; } -String[] parts = token.split("\\."); -if (parts.length != 3) { +List parts = Splitter.on('.').splitToList(token); Review Comment: nit: should we extract this to a `private static final Splitter DOT_SPLITTER`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] nastra commented on a diff in pull request #6586: AWS: make warehouse path optional for read only catalog use cases
nastra commented on code in PR #6586: URL: https://github.com/apache/iceberg/pull/6586#discussion_r1082174852 ## aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogTable.java: ## @@ -132,6 +133,29 @@ public void testCreateTableBadName() { TableIdentifier.of(namespace, "table-1"), schema, partitionSpec)); } + @Test + public void testCreateAndLoadTableWithoutWarehouseLocation() { +GlueCatalog glueCatalogWithoutWarehouse = new GlueCatalog(); +glueCatalogWithoutWarehouse.initialize( +catalogName, +null, +new AwsProperties(), +glue, +LockManagers.defaultLockManager(), +new S3FileIO(clientFactory::s3), +ImmutableMap.of()); +String namespace = createNamespace(); +String tableName = getRandomName(); +TableIdentifier identifier = TableIdentifier.of(namespace, tableName); +try { Review Comment: thanks, that makes sense in this case -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] nastra commented on pull request #6626: Core: Update error msg
nastra commented on PR #6626: URL: https://github.com/apache/iceberg/pull/6626#issuecomment-1398007171 > > not sure, usually it's been called out on my own PRs to adjust the error msg to that particular format (hence the reason I mentioned it on the other PR), which is being used across other places in Iceberg > > In that case, would it be better for us to change all the related null checks to checkArgument, and update the checkstyle rule for it to avoid use of checkNotNull? I think we could take this opportunity to set some standards. I think it's definitely worth discussing this so that people know what the accepted approach is and enforce it via checkstyle. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] hililiwei commented on a diff in pull request #6617: Spark: Spark SQL Extensions for create branch
hililiwei commented on code in PR #6617: URL: https://github.com/apache/iceberg/pull/6617#discussion_r1082173399 ## spark/v3.3/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSqlExtensionsAstBuilder.scala: ## @@ -267,6 +286,12 @@ class IcebergSqlExtensionsAstBuilder(delegate: ParserInterface) extends IcebergS private def typedVisit[T](ctx: ParseTree): T = { ctx.accept(this).asInstanceOf[T] } + + private def timeRetain(unit: String, duration: Long): Long = unit.toUpperCase(Locale.ENGLISH) match { +case "MONTHS" => 30 * TimeUnit.DAYS.toMillis(duration) Review Comment: Yes, it does. Updated. ## spark/v3.3/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSqlExtensionsAstBuilder.scala: ## @@ -267,6 +286,12 @@ class IcebergSqlExtensionsAstBuilder(delegate: ParserInterface) extends IcebergS private def typedVisit[T](ctx: ParseTree): T = { ctx.accept(this).asInstanceOf[T] } + + private def timeRetain(unit: String, duration: Long): Long = unit.toUpperCase(Locale.ENGLISH) match { +case "MONTHS" => 30 * TimeUnit.DAYS.toMillis(duration) +case "DAYS" | "HOURS" | "MINUTES" => TimeUnit.valueOf(unit.toUpperCase(Locale.ENGLISH)).toMillis(duration) Review Comment: Done. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] nastra commented on a diff in pull request #6598: Core: View representation core implementation
nastra commented on code in PR #6598: URL: https://github.com/apache/iceberg/pull/6598#discussion_r1082170758 ## core/src/main/java/org/apache/iceberg/view/SQLViewRepresentationParser.java: ## @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.view; + +import com.fasterxml.jackson.core.JsonGenerator; +import com.fasterxml.jackson.databind.JsonNode; +import java.io.IOException; +import java.util.Arrays; +import java.util.List; +import org.apache.iceberg.catalog.Namespace; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.relocated.com.google.common.collect.Iterables; +import org.apache.iceberg.util.JsonUtil; + +class SQLViewRepresentationParser { + private static final String SQL = "sql"; + private static final String DIALECT = "dialect"; + private static final String SCHEMA_ID = "schema-id"; + private static final String DEFAULT_CATALOG = "default-catalog"; + private static final String DEFAULT_NAMESPACE = "default-namespace"; + private static final String FIELD_ALIASES = "field-aliases"; + private static final String FIELD_COMMENTS = "field-comments"; + + private SQLViewRepresentationParser() {} + + static String toJson(SQLViewRepresentation sqlViewRepresentation) { +return JsonUtil.generate(gen -> toJson(sqlViewRepresentation, gen), false); + } + + static void toJson(SQLViewRepresentation view, JsonGenerator generator) throws IOException { +Preconditions.checkArgument(view != null, "Invalid SQL view representation: null"); +generator.writeStartObject(); +generator.writeStringField(ViewRepresentationParser.TYPE, view.type()); +generator.writeStringField(SQL, view.sql()); +generator.writeStringField(DIALECT, view.dialect()); + +if (view.schemaId() != null) { + generator.writeNumberField(SCHEMA_ID, view.schemaId()); +} + +if (view.defaultCatalog() != null) { + generator.writeStringField(DEFAULT_CATALOG, view.defaultCatalog()); +} + +if (view.defaultNamespace() != null) { + JsonUtil.writeStringArray( + DEFAULT_NAMESPACE, Arrays.asList(view.defaultNamespace().levels()), generator); +} + +if (view.fieldAliases() != null && !view.fieldAliases().isEmpty()) { + JsonUtil.writeStringArray( + SQLViewRepresentationParser.FIELD_ALIASES, view.fieldAliases(), generator); +} + +if (view.fieldComments() != null && !view.fieldComments().isEmpty()) { + JsonUtil.writeStringArray( + SQLViewRepresentationParser.FIELD_COMMENTS, view.fieldComments(), generator); +} + +generator.writeEndObject(); + } + + static SQLViewRepresentation fromJson(String json) { +Preconditions.checkArgument( +json != null, "Cannot parse SQL view representation from null JSON"); +return JsonUtil.parse(json, SQLViewRepresentationParser::fromJson); + } + + static SQLViewRepresentation fromJson(JsonNode node) { +Preconditions.checkArgument( Review Comment: nit: it would probably be good to also add Preconditions.checkArgument(node.isObject(), "Cannot parse SQL view representation from non-object: %s", node);`, similary to how you have it in the other parser ## core/src/main/java/org/apache/iceberg/view/ViewRepresentationParser.java: ## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.view; +
[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6586: AWS: make warehouse path optional for read only catalog use cases
jackye1995 commented on code in PR #6586: URL: https://github.com/apache/iceberg/pull/6586#discussion_r1082172027 ## aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogTable.java: ## @@ -132,6 +133,29 @@ public void testCreateTableBadName() { TableIdentifier.of(namespace, "table-1"), schema, partitionSpec)); } + @Test + public void testCreateAndLoadTableWithoutWarehouseLocation() { +GlueCatalog glueCatalogWithoutWarehouse = new GlueCatalog(); +glueCatalogWithoutWarehouse.initialize( +catalogName, +null, +new AwsProperties(), +glue, +LockManagers.defaultLockManager(), +new S3FileIO(clientFactory::s3), +ImmutableMap.of()); +String namespace = createNamespace(); +String tableName = getRandomName(); +TableIdentifier identifier = TableIdentifier.of(namespace, tableName); +try { Review Comment: yes, just try to make the failure reason more explicit -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] nastra commented on a diff in pull request #6586: AWS: make warehouse path optional for read only catalog use cases
nastra commented on code in PR #6586: URL: https://github.com/apache/iceberg/pull/6586#discussion_r1082168946 ## aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogTable.java: ## @@ -132,6 +133,29 @@ public void testCreateTableBadName() { TableIdentifier.of(namespace, "table-1"), schema, partitionSpec)); } + @Test + public void testCreateAndLoadTableWithoutWarehouseLocation() { +GlueCatalog glueCatalogWithoutWarehouse = new GlueCatalog(); +glueCatalogWithoutWarehouse.initialize( +catalogName, +null, +new AwsProperties(), +glue, +LockManagers.defaultLockManager(), +new S3FileIO(clientFactory::s3), +ImmutableMap.of()); +String namespace = createNamespace(); +String tableName = getRandomName(); +TableIdentifier identifier = TableIdentifier.of(namespace, tableName); +try { Review Comment: nit: is the try-catch really needed when we expect the test to succeed? Or is it more of a "give an additional hint when it fails"? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on issue #6420: Iceberg Materialized View Spec
jackye1995 commented on issue #6420: URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1397987523 @JanKaul if you agree with the summarized consensus we have mostly reached there, for the sake of moving the progress of the discussion forward, could you update the Google doc with the design, and describe the spec changes based on the discussions and suggestions here and combine with your original idea? So that we can start to review the spec contents -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6586: AWS: make warehouse path optional for read only catalog use cases
jackye1995 commented on code in PR #6586: URL: https://github.com/apache/iceberg/pull/6586#discussion_r1082152137 ## aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogTable.java: ## @@ -132,6 +134,28 @@ public void testCreateTableBadName() { TableIdentifier.of(namespace, "table-1"), schema, partitionSpec)); } + @Test + public void testCreateAndLoadTableWithoutWarehouseLocation() { +GlueCatalog glueCatalogWithoutWarehouse = new GlueCatalog(); +glueCatalogWithoutWarehouse.initialize( +catalogName, +null, +new AwsProperties(), +glue, +LockManagers.defaultLockManager(), +new S3FileIO(clientFactory::s3), +ImmutableMap.of()); +String namespace = createNamespace(); +String tableName = getRandomName(); +TableIdentifier identifier = TableIdentifier.of(namespace, tableName); +try { + glueCatalog.createTable(identifier, schema, partitionSpec, tableLocationProperties); + glueCatalog.loadTable(identifier); +} catch (RuntimeException e) { + fail("Create and load table without warehouse location should succeed"); Review Comment: yes, I was not able to find a `fail(...)` method that allows taking exception, thus this. Throwing IO exception seems too much, I changed the code to rethrow the runtime exception. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6586: AWS: make warehouse path optional for read only catalog use cases
jackye1995 commented on code in PR #6586: URL: https://github.com/apache/iceberg/pull/6586#discussion_r1082151825 ## aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java: ## @@ -177,13 +178,9 @@ void initialize( GlueClient client, LockManager lock, FileIO io) { -Preconditions.checkArgument( -path != null && path.length() > 0, -"Cannot initialize GlueCatalog because warehousePath must not be null or empty"); - this.catalogName = name; this.awsProperties = properties; -this.warehousePath = LocationUtil.stripTrailingSlash(path); +this.warehousePath = path != null ? LocationUtil.stripTrailingSlash(path) : null; Review Comment: good point, updated -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] aajisaka commented on a diff in pull request #6358: AWS: Print logs whether Glue optimistic locking is used or not
aajisaka commented on code in PR #6358: URL: https://github.com/apache/iceberg/pull/6358#discussion_r1082137636 ## aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java: ## @@ -151,7 +151,12 @@ private LockManager initializeLockManager(Map properties) { if (properties.containsKey(CatalogProperties.LOCK_IMPL)) { return LockManagers.from(properties); } else if (SET_VERSION_ID.isNoop()) { + LOG.warn( + "Optimistic locking is not available in the environment. Using in-memory lock manager." + + " To ensure atomic transaction, you need to setup a DynamoDB lock manager."); Review Comment: Fixed the warn message as you suggested. Thank you @amogh-jahagirdar -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] JanKaul commented on issue #6420: Iceberg Materialized View Spec
JanKaul commented on issue #6420: URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1397956873 > I don't know if it would work or too crazy, just to throw the idea out that I just came up with: > > We could potentially make MV a representation in view spec, in parallel to the SQL representation. So for a materialized view, it could have 2 representations: > > ``` > [ > { > "type" : "sql", > "sql" : "SELECT \"count\"(*) my_cnt\nFROM\n base_tab\n", > "dialect" : "spark", > "schema-id" : 2, > "default-catalog" : "iceberg", > "default-namespace" : [ "anorwood" ] > }, > { > "type" : "materialized", > "namespace": "some_namespace" > "table" : "some_storage_table" > } > ] > ``` > > By doing so, you could support a case where 1 view does not really have a SQL representation, but just a few different table layouts: > > ``` > [ > { > "type" : "materialized", > "namespace": "some_namespace" > "table" : "some_storage_table_partitioned_by_col1" >}, > { > "type" : "materialized", > "namespace": "some_namespace" > "table" : "some_storage_table_partitioned_by_col2" > } > ] > ``` > > which satisfies the use case described by Walaa. > > > > > Great idea! I think this is a very clean solution to store the metadata. However I would rename the type to `sql_materialized` in case there are other representations in the future. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] aajisaka commented on a diff in pull request #6586: AWS: make warehouse path optional for read only catalog use cases
aajisaka commented on code in PR #6586: URL: https://github.com/apache/iceberg/pull/6586#discussion_r1082123777 ## aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogTable.java: ## @@ -132,6 +134,28 @@ public void testCreateTableBadName() { TableIdentifier.of(namespace, "table-1"), schema, partitionSpec)); } + @Test + public void testCreateAndLoadTableWithoutWarehouseLocation() { +GlueCatalog glueCatalogWithoutWarehouse = new GlueCatalog(); +glueCatalogWithoutWarehouse.initialize( +catalogName, +null, +new AwsProperties(), +glue, +LockManagers.defaultLockManager(), +new S3FileIO(clientFactory::s3), +ImmutableMap.of()); +String namespace = createNamespace(); +String tableName = getRandomName(); +TableIdentifier identifier = TableIdentifier.of(namespace, tableName); +try { + glueCatalog.createTable(identifier, schema, partitionSpec, tableLocationProperties); + glueCatalog.loadTable(identifier); +} catch (RuntimeException e) { + fail("Create and load table without warehouse location should succeed"); Review Comment: ```suggestion throw new IOException("Create and load table without warehouse location should succeed", e); ``` We shouldn't hide the original exception. Without the exception message and the stacktrace, it's hard to find the root cause when an exception occur. ## aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java: ## @@ -177,13 +178,9 @@ void initialize( GlueClient client, LockManager lock, FileIO io) { -Preconditions.checkArgument( -path != null && path.length() > 0, -"Cannot initialize GlueCatalog because warehousePath must not be null or empty"); - this.catalogName = name; this.awsProperties = properties; -this.warehousePath = LocationUtil.stripTrailingSlash(path); +this.warehousePath = path != null ? LocationUtil.stripTrailingSlash(path) : null; Review Comment: If the `path` is empty, it fails in `LocationUtil.stripTrailiingSlash(path)` ```suggestion this.warehousePath = (path != null && path.length() > 0) ? LocationUtil.stripTrailingSlash(path) : null; ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6617: Spark: Spark SQL Extensions for create branch
jackye1995 commented on code in PR #6617: URL: https://github.com/apache/iceberg/pull/6617#discussion_r1082118919 ## spark/v3.3/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSqlExtensionsAstBuilder.scala: ## @@ -267,6 +286,12 @@ class IcebergSqlExtensionsAstBuilder(delegate: ParserInterface) extends IcebergS private def typedVisit[T](ctx: ParseTree): T = { ctx.accept(this).asInstanceOf[T] } + + private def timeRetain(unit: String, duration: Long): Long = unit.toUpperCase(Locale.ENGLISH) match { +case "MONTHS" => 30 * TimeUnit.DAYS.toMillis(duration) +case "DAYS" | "HOURS" | "MINUTES" => TimeUnit.valueOf(unit.toUpperCase(Locale.ENGLISH)).toMillis(duration) Review Comment: do we still need to use match statement? The parser will only allow these values, so we should be able to directly parse it to TimeUnit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6617: Spark: Spark SQL Extensions for create branch
jackye1995 commented on code in PR #6617: URL: https://github.com/apache/iceberg/pull/6617#discussion_r1082118486 ## spark/v3.3/spark-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4: ## @@ -168,34 +175,77 @@ fieldList ; nonReserved -: ADD | ALTER | AS | ASC | BY | CALL | DESC | DROP | FIELD | FIRST | LAST | NULLS | ORDERED | PARTITION | TABLE | WRITE -| DISTRIBUTED | LOCALLY | UNORDERED | REPLACE | WITH | IDENTIFIER_KW | FIELDS | SET +: ADD | ALTER | AS | ASC | BRANCH | BY | CALL | CREATE | DAYS | DESC | DROP | FIELD | FIRST | HOURS | LAST | NULLS | OF | ORDERED | PARTITION | TABLE | WRITE +| DISTRIBUTED | LOCALLY | MINUTES | MONTHS | UNORDERED | REPLACE | RETAIN | VERSION | WITH | IDENTIFIER_KW | FIELDS | SET | SNAPSHOT | SNAPSHOTS | TRUE | FALSE | MAP ; +snapshotId +: number +; + +numSnapshots +: number +; + +snapshotRetain +: number +; + +snapshotRefRetain +: number +; + +snapshotRefRetainTimeUnit +: timeUnit +; + +snapshotRetainTimeUnit +: timeUnit +; + +timeUnit +: MONTHS +| DAYS +| HOURS +| MINUTES Review Comment: sure, I am fine with minute level. We can always add more if needed. ## spark/v3.3/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSqlExtensionsAstBuilder.scala: ## @@ -267,6 +286,12 @@ class IcebergSqlExtensionsAstBuilder(delegate: ParserInterface) extends IcebergS private def typedVisit[T](ctx: ParseTree): T = { ctx.accept(this).asInstanceOf[T] } + + private def timeRetain(unit: String, duration: Long): Long = unit.toUpperCase(Locale.ENGLISH) match { +case "MONTHS" => 30 * TimeUnit.DAYS.toMillis(duration) Review Comment: hmm, good point, there is not exactly always 30 days in a month. That's probably why it is not supported by `TimeUnit`. So I would say in that case let's remove MONTHS from the time unit in parser and just support DAYS, HOURS and MINUTES. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6617: Spark: Spark SQL Extensions for create branch
jackye1995 commented on code in PR #6617: URL: https://github.com/apache/iceberg/pull/6617#discussion_r1082118109 ## spark/v3.3/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSqlExtensionsAstBuilder.scala: ## @@ -267,6 +286,12 @@ class IcebergSqlExtensionsAstBuilder(delegate: ParserInterface) extends IcebergS private def typedVisit[T](ctx: ParseTree): T = { ctx.accept(this).asInstanceOf[T] } + + private def timeRetain(unit: String, duration: Long): Long = unit.toUpperCase(Locale.ENGLISH) match { +case "MONTHS" => 30 * TimeUnit.DAYS.toMillis(duration) Review Comment: hmm, good point, there is not exactly always 30 days in a month. That's probably why it is not supported by `TimeUnit`. So I would say in that case let's remove MONTHS from the time unit in parser and just support DAYS, HOURS, MINUTES and SECONDS -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: Delta: Support Snapshot Delta Lake Table to Iceberg Table
jackye1995 commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1082116187 ## delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java: ## @@ -0,0 +1,370 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.delta; + +import io.delta.standalone.DeltaLog; +import io.delta.standalone.VersionLog; +import io.delta.standalone.actions.Action; +import io.delta.standalone.actions.AddFile; +import io.delta.standalone.actions.RemoveFile; +import java.io.File; +import java.net.URI; +import java.util.Iterator; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; +import org.apache.hadoop.conf.Configuration; +import org.apache.iceberg.AppendFiles; +import org.apache.iceberg.DataFile; +import org.apache.iceberg.DataFiles; +import org.apache.iceberg.DeleteFiles; +import org.apache.iceberg.FileFormat; +import org.apache.iceberg.Metrics; +import org.apache.iceberg.MetricsConfig; +import org.apache.iceberg.OverwriteFiles; +import org.apache.iceberg.PartitionField; +import org.apache.iceberg.PartitionSpec; +import org.apache.iceberg.Schema; +import org.apache.iceberg.SnapshotSummary; +import org.apache.iceberg.Table; +import org.apache.iceberg.TableProperties; +import org.apache.iceberg.Transaction; +import org.apache.iceberg.avro.Avro; +import org.apache.iceberg.catalog.Catalog; +import org.apache.iceberg.catalog.TableIdentifier; +import org.apache.iceberg.exceptions.ValidationException; +import org.apache.iceberg.hadoop.HadoopFileIO; +import org.apache.iceberg.io.InputFile; +import org.apache.iceberg.mapping.MappingUtil; +import org.apache.iceberg.mapping.NameMapping; +import org.apache.iceberg.mapping.NameMappingParser; +import org.apache.iceberg.orc.OrcMetrics; +import org.apache.iceberg.parquet.ParquetUtil; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.relocated.com.google.common.collect.Iterables; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.types.Type; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * Takes a Delta Lake table's location and attempts to create an Iceberg table snapshot in an + * optional user-specified location (default to the Delta Lake table's location) with a different + * identifier. + */ +class BaseSnapshotDeltaLakeTableAction implements SnapshotDeltaLakeTable { + + private static final Logger LOG = LoggerFactory.getLogger(BaseSnapshotDeltaLakeTableAction.class); + + private static final String SNAPSHOT_SOURCE_PROP = "snapshot_source"; + private static final String DELTA_SOURCE_VALUE = "delta"; + private static final String ORIGINAL_LOCATION_PROP = "original_location"; + private static final String PARQUET_SUFFIX = ".parquet"; + private static final String AVRO_SUFFIX = ".avro"; + private static final String ORC_SUFFIX = ".orc"; + private final ImmutableMap.Builder additionalPropertiesBuilder = + ImmutableMap.builder(); + private DeltaLog deltaLog; + private Catalog icebergCatalog; + private final String deltaTableLocation; + private TableIdentifier newTableIdentifier; + private String newTableLocation; + private HadoopFileIO deltaLakeFileIO; + + /** + * Snapshot a delta lake table to be an iceberg table. The action will read the delta lake table's + * log through the table's path, create a new iceberg table using the given icebergCatalog and + * newTableIdentifier, and commit all changes in one iceberg transaction. + * + * The new table will only be created if the snapshot is successful. + * + * @param deltaTableLocation the delta lake table's path + */ + BaseSnapshotDeltaLakeTableAction(String deltaTableLocation) { +this.deltaTableLocation = deltaTableLocation; +this.newTableLocation = deltaTableLocation; + } + + @Override + public SnapshotDeltaLakeTable tableProperties(Map properties) { +additionalPropertiesBuilder.putAll(properties); +return this; + } + + @Override + public SnapshotDeltaLakeTable tableProperty(String
[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6449: Delta: Support Snapshot Delta Lake Table to Iceberg Table
JonasJ-ap commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1082097202 ## delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java: ## @@ -0,0 +1,370 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.delta; + +import io.delta.standalone.DeltaLog; +import io.delta.standalone.VersionLog; +import io.delta.standalone.actions.Action; +import io.delta.standalone.actions.AddFile; +import io.delta.standalone.actions.RemoveFile; +import java.io.File; +import java.net.URI; +import java.util.Iterator; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; +import org.apache.hadoop.conf.Configuration; +import org.apache.iceberg.AppendFiles; +import org.apache.iceberg.DataFile; +import org.apache.iceberg.DataFiles; +import org.apache.iceberg.DeleteFiles; +import org.apache.iceberg.FileFormat; +import org.apache.iceberg.Metrics; +import org.apache.iceberg.MetricsConfig; +import org.apache.iceberg.OverwriteFiles; +import org.apache.iceberg.PartitionField; +import org.apache.iceberg.PartitionSpec; +import org.apache.iceberg.Schema; +import org.apache.iceberg.SnapshotSummary; +import org.apache.iceberg.Table; +import org.apache.iceberg.TableProperties; +import org.apache.iceberg.Transaction; +import org.apache.iceberg.avro.Avro; +import org.apache.iceberg.catalog.Catalog; +import org.apache.iceberg.catalog.TableIdentifier; +import org.apache.iceberg.exceptions.ValidationException; +import org.apache.iceberg.hadoop.HadoopFileIO; +import org.apache.iceberg.io.InputFile; +import org.apache.iceberg.mapping.MappingUtil; +import org.apache.iceberg.mapping.NameMapping; +import org.apache.iceberg.mapping.NameMappingParser; +import org.apache.iceberg.orc.OrcMetrics; +import org.apache.iceberg.parquet.ParquetUtil; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.relocated.com.google.common.collect.Iterables; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.types.Type; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * Takes a Delta Lake table's location and attempts to create an Iceberg table snapshot in an + * optional user-specified location (default to the Delta Lake table's location) with a different + * identifier. + */ +class BaseSnapshotDeltaLakeTableAction implements SnapshotDeltaLakeTable { + + private static final Logger LOG = LoggerFactory.getLogger(BaseSnapshotDeltaLakeTableAction.class); + + private static final String SNAPSHOT_SOURCE_PROP = "snapshot_source"; + private static final String DELTA_SOURCE_VALUE = "delta"; + private static final String ORIGINAL_LOCATION_PROP = "original_location"; + private static final String PARQUET_SUFFIX = ".parquet"; + private static final String AVRO_SUFFIX = ".avro"; + private static final String ORC_SUFFIX = ".orc"; + private final ImmutableMap.Builder additionalPropertiesBuilder = + ImmutableMap.builder(); + private DeltaLog deltaLog; + private Catalog icebergCatalog; + private final String deltaTableLocation; + private TableIdentifier newTableIdentifier; + private String newTableLocation; + private HadoopFileIO deltaLakeFileIO; + + /** + * Snapshot a delta lake table to be an iceberg table. The action will read the delta lake table's + * log through the table's path, create a new iceberg table using the given icebergCatalog and + * newTableIdentifier, and commit all changes in one iceberg transaction. + * + * The new table will only be created if the snapshot is successful. + * + * @param deltaTableLocation the delta lake table's path + */ + BaseSnapshotDeltaLakeTableAction(String deltaTableLocation) { +this.deltaTableLocation = deltaTableLocation; +this.newTableLocation = deltaTableLocation; + } + + @Override + public SnapshotDeltaLakeTable tableProperties(Map properties) { +additionalPropertiesBuilder.putAll(properties); +return this; + } + + @Override + public SnapshotDeltaLakeTable tableProperty(String
[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6449: Delta: Support Snapshot Delta Lake Table to Iceberg Table
JonasJ-ap commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1082097202 ## delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java: ## @@ -0,0 +1,370 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.delta; + +import io.delta.standalone.DeltaLog; +import io.delta.standalone.VersionLog; +import io.delta.standalone.actions.Action; +import io.delta.standalone.actions.AddFile; +import io.delta.standalone.actions.RemoveFile; +import java.io.File; +import java.net.URI; +import java.util.Iterator; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; +import org.apache.hadoop.conf.Configuration; +import org.apache.iceberg.AppendFiles; +import org.apache.iceberg.DataFile; +import org.apache.iceberg.DataFiles; +import org.apache.iceberg.DeleteFiles; +import org.apache.iceberg.FileFormat; +import org.apache.iceberg.Metrics; +import org.apache.iceberg.MetricsConfig; +import org.apache.iceberg.OverwriteFiles; +import org.apache.iceberg.PartitionField; +import org.apache.iceberg.PartitionSpec; +import org.apache.iceberg.Schema; +import org.apache.iceberg.SnapshotSummary; +import org.apache.iceberg.Table; +import org.apache.iceberg.TableProperties; +import org.apache.iceberg.Transaction; +import org.apache.iceberg.avro.Avro; +import org.apache.iceberg.catalog.Catalog; +import org.apache.iceberg.catalog.TableIdentifier; +import org.apache.iceberg.exceptions.ValidationException; +import org.apache.iceberg.hadoop.HadoopFileIO; +import org.apache.iceberg.io.InputFile; +import org.apache.iceberg.mapping.MappingUtil; +import org.apache.iceberg.mapping.NameMapping; +import org.apache.iceberg.mapping.NameMappingParser; +import org.apache.iceberg.orc.OrcMetrics; +import org.apache.iceberg.parquet.ParquetUtil; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.relocated.com.google.common.collect.Iterables; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.types.Type; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * Takes a Delta Lake table's location and attempts to create an Iceberg table snapshot in an + * optional user-specified location (default to the Delta Lake table's location) with a different + * identifier. + */ +class BaseSnapshotDeltaLakeTableAction implements SnapshotDeltaLakeTable { + + private static final Logger LOG = LoggerFactory.getLogger(BaseSnapshotDeltaLakeTableAction.class); + + private static final String SNAPSHOT_SOURCE_PROP = "snapshot_source"; + private static final String DELTA_SOURCE_VALUE = "delta"; + private static final String ORIGINAL_LOCATION_PROP = "original_location"; + private static final String PARQUET_SUFFIX = ".parquet"; + private static final String AVRO_SUFFIX = ".avro"; + private static final String ORC_SUFFIX = ".orc"; + private final ImmutableMap.Builder additionalPropertiesBuilder = + ImmutableMap.builder(); + private DeltaLog deltaLog; + private Catalog icebergCatalog; + private final String deltaTableLocation; + private TableIdentifier newTableIdentifier; + private String newTableLocation; + private HadoopFileIO deltaLakeFileIO; + + /** + * Snapshot a delta lake table to be an iceberg table. The action will read the delta lake table's + * log through the table's path, create a new iceberg table using the given icebergCatalog and + * newTableIdentifier, and commit all changes in one iceberg transaction. + * + * The new table will only be created if the snapshot is successful. + * + * @param deltaTableLocation the delta lake table's path + */ + BaseSnapshotDeltaLakeTableAction(String deltaTableLocation) { +this.deltaTableLocation = deltaTableLocation; +this.newTableLocation = deltaTableLocation; + } + + @Override + public SnapshotDeltaLakeTable tableProperties(Map properties) { +additionalPropertiesBuilder.putAll(properties); +return this; + } + + @Override + public SnapshotDeltaLakeTable tableProperty(String
[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6449: Delta: Support Snapshot Delta Lake Table to Iceberg Table
JonasJ-ap commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1082104013 ## delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java: ## @@ -0,0 +1,370 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.delta; + +import io.delta.standalone.DeltaLog; +import io.delta.standalone.VersionLog; +import io.delta.standalone.actions.Action; +import io.delta.standalone.actions.AddFile; +import io.delta.standalone.actions.RemoveFile; +import java.io.File; +import java.net.URI; +import java.util.Iterator; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; +import org.apache.hadoop.conf.Configuration; +import org.apache.iceberg.AppendFiles; +import org.apache.iceberg.DataFile; +import org.apache.iceberg.DataFiles; +import org.apache.iceberg.DeleteFiles; +import org.apache.iceberg.FileFormat; +import org.apache.iceberg.Metrics; +import org.apache.iceberg.MetricsConfig; +import org.apache.iceberg.OverwriteFiles; +import org.apache.iceberg.PartitionField; +import org.apache.iceberg.PartitionSpec; +import org.apache.iceberg.Schema; +import org.apache.iceberg.SnapshotSummary; +import org.apache.iceberg.Table; +import org.apache.iceberg.TableProperties; +import org.apache.iceberg.Transaction; +import org.apache.iceberg.avro.Avro; +import org.apache.iceberg.catalog.Catalog; +import org.apache.iceberg.catalog.TableIdentifier; +import org.apache.iceberg.exceptions.ValidationException; +import org.apache.iceberg.hadoop.HadoopFileIO; +import org.apache.iceberg.io.InputFile; +import org.apache.iceberg.mapping.MappingUtil; +import org.apache.iceberg.mapping.NameMapping; +import org.apache.iceberg.mapping.NameMappingParser; +import org.apache.iceberg.orc.OrcMetrics; +import org.apache.iceberg.parquet.ParquetUtil; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.relocated.com.google.common.collect.Iterables; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.types.Type; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * Takes a Delta Lake table's location and attempts to create an Iceberg table snapshot in an + * optional user-specified location (default to the Delta Lake table's location) with a different + * identifier. + */ +class BaseSnapshotDeltaLakeTableAction implements SnapshotDeltaLakeTable { + + private static final Logger LOG = LoggerFactory.getLogger(BaseSnapshotDeltaLakeTableAction.class); + + private static final String SNAPSHOT_SOURCE_PROP = "snapshot_source"; + private static final String DELTA_SOURCE_VALUE = "delta"; + private static final String ORIGINAL_LOCATION_PROP = "original_location"; + private static final String PARQUET_SUFFIX = ".parquet"; + private static final String AVRO_SUFFIX = ".avro"; + private static final String ORC_SUFFIX = ".orc"; + private final ImmutableMap.Builder additionalPropertiesBuilder = + ImmutableMap.builder(); + private DeltaLog deltaLog; + private Catalog icebergCatalog; + private final String deltaTableLocation; + private TableIdentifier newTableIdentifier; + private String newTableLocation; + private HadoopFileIO deltaLakeFileIO; + + /** + * Snapshot a delta lake table to be an iceberg table. The action will read the delta lake table's + * log through the table's path, create a new iceberg table using the given icebergCatalog and + * newTableIdentifier, and commit all changes in one iceberg transaction. + * + * The new table will only be created if the snapshot is successful. + * + * @param deltaTableLocation the delta lake table's path + */ + BaseSnapshotDeltaLakeTableAction(String deltaTableLocation) { +this.deltaTableLocation = deltaTableLocation; +this.newTableLocation = deltaTableLocation; + } + + @Override + public SnapshotDeltaLakeTable tableProperties(Map properties) { +additionalPropertiesBuilder.putAll(properties); +return this; + } + + @Override + public SnapshotDeltaLakeTable tableProperty(String
[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6449: Delta: Support Snapshot Delta Lake Table to Iceberg Table
JonasJ-ap commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1082104013 ## delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java: ## @@ -0,0 +1,370 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.delta; + +import io.delta.standalone.DeltaLog; +import io.delta.standalone.VersionLog; +import io.delta.standalone.actions.Action; +import io.delta.standalone.actions.AddFile; +import io.delta.standalone.actions.RemoveFile; +import java.io.File; +import java.net.URI; +import java.util.Iterator; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; +import org.apache.hadoop.conf.Configuration; +import org.apache.iceberg.AppendFiles; +import org.apache.iceberg.DataFile; +import org.apache.iceberg.DataFiles; +import org.apache.iceberg.DeleteFiles; +import org.apache.iceberg.FileFormat; +import org.apache.iceberg.Metrics; +import org.apache.iceberg.MetricsConfig; +import org.apache.iceberg.OverwriteFiles; +import org.apache.iceberg.PartitionField; +import org.apache.iceberg.PartitionSpec; +import org.apache.iceberg.Schema; +import org.apache.iceberg.SnapshotSummary; +import org.apache.iceberg.Table; +import org.apache.iceberg.TableProperties; +import org.apache.iceberg.Transaction; +import org.apache.iceberg.avro.Avro; +import org.apache.iceberg.catalog.Catalog; +import org.apache.iceberg.catalog.TableIdentifier; +import org.apache.iceberg.exceptions.ValidationException; +import org.apache.iceberg.hadoop.HadoopFileIO; +import org.apache.iceberg.io.InputFile; +import org.apache.iceberg.mapping.MappingUtil; +import org.apache.iceberg.mapping.NameMapping; +import org.apache.iceberg.mapping.NameMappingParser; +import org.apache.iceberg.orc.OrcMetrics; +import org.apache.iceberg.parquet.ParquetUtil; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.relocated.com.google.common.collect.Iterables; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.types.Type; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * Takes a Delta Lake table's location and attempts to create an Iceberg table snapshot in an + * optional user-specified location (default to the Delta Lake table's location) with a different + * identifier. + */ +class BaseSnapshotDeltaLakeTableAction implements SnapshotDeltaLakeTable { + + private static final Logger LOG = LoggerFactory.getLogger(BaseSnapshotDeltaLakeTableAction.class); + + private static final String SNAPSHOT_SOURCE_PROP = "snapshot_source"; + private static final String DELTA_SOURCE_VALUE = "delta"; + private static final String ORIGINAL_LOCATION_PROP = "original_location"; + private static final String PARQUET_SUFFIX = ".parquet"; + private static final String AVRO_SUFFIX = ".avro"; + private static final String ORC_SUFFIX = ".orc"; + private final ImmutableMap.Builder additionalPropertiesBuilder = + ImmutableMap.builder(); + private DeltaLog deltaLog; + private Catalog icebergCatalog; + private final String deltaTableLocation; + private TableIdentifier newTableIdentifier; + private String newTableLocation; + private HadoopFileIO deltaLakeFileIO; + + /** + * Snapshot a delta lake table to be an iceberg table. The action will read the delta lake table's + * log through the table's path, create a new iceberg table using the given icebergCatalog and + * newTableIdentifier, and commit all changes in one iceberg transaction. + * + * The new table will only be created if the snapshot is successful. + * + * @param deltaTableLocation the delta lake table's path + */ + BaseSnapshotDeltaLakeTableAction(String deltaTableLocation) { +this.deltaTableLocation = deltaTableLocation; +this.newTableLocation = deltaTableLocation; + } + + @Override + public SnapshotDeltaLakeTable tableProperties(Map properties) { +additionalPropertiesBuilder.putAll(properties); +return this; + } + + @Override + public SnapshotDeltaLakeTable tableProperty(String
[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6449: Delta: Support Snapshot Delta Lake Table to Iceberg Table
JonasJ-ap commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1082097985 ## delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java: ## @@ -0,0 +1,370 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.delta; + +import io.delta.standalone.DeltaLog; +import io.delta.standalone.VersionLog; +import io.delta.standalone.actions.Action; +import io.delta.standalone.actions.AddFile; +import io.delta.standalone.actions.RemoveFile; +import java.io.File; +import java.net.URI; +import java.util.Iterator; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; +import org.apache.hadoop.conf.Configuration; +import org.apache.iceberg.AppendFiles; +import org.apache.iceberg.DataFile; +import org.apache.iceberg.DataFiles; +import org.apache.iceberg.DeleteFiles; +import org.apache.iceberg.FileFormat; +import org.apache.iceberg.Metrics; +import org.apache.iceberg.MetricsConfig; +import org.apache.iceberg.OverwriteFiles; +import org.apache.iceberg.PartitionField; +import org.apache.iceberg.PartitionSpec; +import org.apache.iceberg.Schema; +import org.apache.iceberg.SnapshotSummary; +import org.apache.iceberg.Table; +import org.apache.iceberg.TableProperties; +import org.apache.iceberg.Transaction; +import org.apache.iceberg.avro.Avro; +import org.apache.iceberg.catalog.Catalog; +import org.apache.iceberg.catalog.TableIdentifier; +import org.apache.iceberg.exceptions.ValidationException; +import org.apache.iceberg.hadoop.HadoopFileIO; +import org.apache.iceberg.io.InputFile; +import org.apache.iceberg.mapping.MappingUtil; +import org.apache.iceberg.mapping.NameMapping; +import org.apache.iceberg.mapping.NameMappingParser; +import org.apache.iceberg.orc.OrcMetrics; +import org.apache.iceberg.parquet.ParquetUtil; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.relocated.com.google.common.collect.Iterables; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.types.Type; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * Takes a Delta Lake table's location and attempts to create an Iceberg table snapshot in an + * optional user-specified location (default to the Delta Lake table's location) with a different + * identifier. + */ +class BaseSnapshotDeltaLakeTableAction implements SnapshotDeltaLakeTable { + + private static final Logger LOG = LoggerFactory.getLogger(BaseSnapshotDeltaLakeTableAction.class); + + private static final String SNAPSHOT_SOURCE_PROP = "snapshot_source"; + private static final String DELTA_SOURCE_VALUE = "delta"; + private static final String ORIGINAL_LOCATION_PROP = "original_location"; + private static final String PARQUET_SUFFIX = ".parquet"; + private static final String AVRO_SUFFIX = ".avro"; + private static final String ORC_SUFFIX = ".orc"; + private final ImmutableMap.Builder additionalPropertiesBuilder = + ImmutableMap.builder(); + private DeltaLog deltaLog; + private Catalog icebergCatalog; + private final String deltaTableLocation; + private TableIdentifier newTableIdentifier; + private String newTableLocation; + private HadoopFileIO deltaLakeFileIO; + + /** + * Snapshot a delta lake table to be an iceberg table. The action will read the delta lake table's + * log through the table's path, create a new iceberg table using the given icebergCatalog and + * newTableIdentifier, and commit all changes in one iceberg transaction. + * + * The new table will only be created if the snapshot is successful. + * + * @param deltaTableLocation the delta lake table's path + */ + BaseSnapshotDeltaLakeTableAction(String deltaTableLocation) { +this.deltaTableLocation = deltaTableLocation; +this.newTableLocation = deltaTableLocation; + } + + @Override + public SnapshotDeltaLakeTable tableProperties(Map properties) { +additionalPropertiesBuilder.putAll(properties); +return this; + } + + @Override + public SnapshotDeltaLakeTable tableProperty(String
[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6449: Delta: Support Snapshot Delta Lake Table to Iceberg Table
JonasJ-ap commented on code in PR #6449: URL: https://github.com/apache/iceberg/pull/6449#discussion_r1082097202 ## delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java: ## @@ -0,0 +1,370 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.delta; + +import io.delta.standalone.DeltaLog; +import io.delta.standalone.VersionLog; +import io.delta.standalone.actions.Action; +import io.delta.standalone.actions.AddFile; +import io.delta.standalone.actions.RemoveFile; +import java.io.File; +import java.net.URI; +import java.util.Iterator; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; +import org.apache.hadoop.conf.Configuration; +import org.apache.iceberg.AppendFiles; +import org.apache.iceberg.DataFile; +import org.apache.iceberg.DataFiles; +import org.apache.iceberg.DeleteFiles; +import org.apache.iceberg.FileFormat; +import org.apache.iceberg.Metrics; +import org.apache.iceberg.MetricsConfig; +import org.apache.iceberg.OverwriteFiles; +import org.apache.iceberg.PartitionField; +import org.apache.iceberg.PartitionSpec; +import org.apache.iceberg.Schema; +import org.apache.iceberg.SnapshotSummary; +import org.apache.iceberg.Table; +import org.apache.iceberg.TableProperties; +import org.apache.iceberg.Transaction; +import org.apache.iceberg.avro.Avro; +import org.apache.iceberg.catalog.Catalog; +import org.apache.iceberg.catalog.TableIdentifier; +import org.apache.iceberg.exceptions.ValidationException; +import org.apache.iceberg.hadoop.HadoopFileIO; +import org.apache.iceberg.io.InputFile; +import org.apache.iceberg.mapping.MappingUtil; +import org.apache.iceberg.mapping.NameMapping; +import org.apache.iceberg.mapping.NameMappingParser; +import org.apache.iceberg.orc.OrcMetrics; +import org.apache.iceberg.parquet.ParquetUtil; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.relocated.com.google.common.collect.Iterables; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.types.Type; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * Takes a Delta Lake table's location and attempts to create an Iceberg table snapshot in an + * optional user-specified location (default to the Delta Lake table's location) with a different + * identifier. + */ +class BaseSnapshotDeltaLakeTableAction implements SnapshotDeltaLakeTable { + + private static final Logger LOG = LoggerFactory.getLogger(BaseSnapshotDeltaLakeTableAction.class); + + private static final String SNAPSHOT_SOURCE_PROP = "snapshot_source"; + private static final String DELTA_SOURCE_VALUE = "delta"; + private static final String ORIGINAL_LOCATION_PROP = "original_location"; + private static final String PARQUET_SUFFIX = ".parquet"; + private static final String AVRO_SUFFIX = ".avro"; + private static final String ORC_SUFFIX = ".orc"; + private final ImmutableMap.Builder additionalPropertiesBuilder = + ImmutableMap.builder(); + private DeltaLog deltaLog; + private Catalog icebergCatalog; + private final String deltaTableLocation; + private TableIdentifier newTableIdentifier; + private String newTableLocation; + private HadoopFileIO deltaLakeFileIO; + + /** + * Snapshot a delta lake table to be an iceberg table. The action will read the delta lake table's + * log through the table's path, create a new iceberg table using the given icebergCatalog and + * newTableIdentifier, and commit all changes in one iceberg transaction. + * + * The new table will only be created if the snapshot is successful. + * + * @param deltaTableLocation the delta lake table's path + */ + BaseSnapshotDeltaLakeTableAction(String deltaTableLocation) { +this.deltaTableLocation = deltaTableLocation; +this.newTableLocation = deltaTableLocation; + } + + @Override + public SnapshotDeltaLakeTable tableProperties(Map properties) { +additionalPropertiesBuilder.putAll(properties); +return this; + } + + @Override + public SnapshotDeltaLakeTable tableProperty(String
[GitHub] [iceberg] amogh-jahagirdar commented on issue #6619: Disaster Recovery Options for AWS Athena/Iceberg Integration
amogh-jahagirdar commented on issue #6619: URL: https://github.com/apache/iceberg/issues/6619#issuecomment-139751 Thanks for creating this issue, @anthonysgro could you provide more details on how you're recreating the table and pointing the location and how AWS Backup fits in? As far as my understanding goes, for disaster recovery for Iceberg tables generally it's recommended to use S3 multi-region access points . You can check out https://github.com/apache/iceberg/issues/5779 this discussion for more details. Some docs: https://aws.amazon.com/s3/features/multi-region-access-points/ https://iceberg.apache.org/docs/latest/aws/#s3-access-points If you desire S3 access point integration in Athena specifically, feel free to ping me on Slack! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] amogh-jahagirdar commented on pull request #6588: Spark 3.3: Add Default Parallelism Level for All Spark Driver Based Deletes
amogh-jahagirdar commented on PR #6588: URL: https://github.com/apache/iceberg/pull/6588#issuecomment-1397883260 Thanks for clarifying @RussellSpitzer I think it makes a ton of sense to leave the specifics of bulk vs parallel to the FileIO abstraction. In this case, we leverage bulk delete wherever possible (S3) and then do parallel deletions for file systems which don't support bulk ops like HDFS to improve the throughput of deletions -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] ajantha-bhat commented on pull request #6629: Build: Fix minor error-prone warnings
ajantha-bhat commented on PR #6629: URL: https://github.com/apache/iceberg/pull/6629#issuecomment-1397859049 cc: @nastra, @Fokko -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] ajantha-bhat opened a new pull request, #6629: Build: Fix minor error-prone warnings
ajantha-bhat opened a new pull request, #6629: URL: https://github.com/apache/iceberg/pull/6629 I have observed that the build [`./gradlew clean build -x test`] has some warnings. So it is an effort to keep the build green. Before: https://user-images.githubusercontent.com/5889404/213606755-f8a491a3-99e4-4e6f-ac57-c6b5437e160c.png;> After: https://user-images.githubusercontent.com/5889404/213606771-87eed4bb-abf9-42d4-887a-f24e1815f3b2.png;> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] kingeasternsun commented on a diff in pull request #6624: Add "parallelism" parameter to "add_files" syscall and MigrateTable, SnapshotTable.
kingeasternsun commented on code in PR #6624: URL: https://github.com/apache/iceberg/pull/6624#discussion_r1082061029 ## spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/procedures/SnapshotTableProcedure.java: ## @@ -93,10 +94,20 @@ public InternalRow[] call(InternalRow args) { }); } +int parallelism; +if (!args.isNullAt(4)) { + parallelism = args.getInt(4); +} else { + parallelism = 1; +} Review Comment: Thanks for your advice, I will fix this. ## spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/procedures/MigrateTableProcedure.java: ## @@ -99,7 +100,13 @@ public InternalRow[] call(InternalRow args) { if (dropBackup) { result = migrateTableSparkAction.dropBackup().execute(); } else { - result = migrateTableSparkAction.execute(); + int parallelism; + if (!args.isNullAt(3)) { +parallelism = args.getInt(3); + } else { +parallelism = 1; + } Review Comment: Thanks for your advice, I will fix this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] kingeasternsun commented on a diff in pull request #6624: Add "parallelism" parameter to "add_files" syscall and MigrateTable, SnapshotTable.
kingeasternsun commented on code in PR #6624: URL: https://github.com/apache/iceberg/pull/6624#discussion_r1082057002 ## spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/procedures/MigrateTableProcedure.java: ## @@ -39,7 +39,8 @@ class MigrateTableProcedure extends BaseProcedure { new ProcedureParameter[] { ProcedureParameter.required("table", DataTypes.StringType), ProcedureParameter.optional("properties", STRING_MAP), -ProcedureParameter.optional("drop_backup", DataTypes.BooleanType) +ProcedureParameter.optional("drop_backup", DataTypes.BooleanType), +ProcedureParameter.optional("max_concurrent_read_datafiles", DataTypes.IntegerType) Review Comment: Thanks for your reviews, It because this https://github.com/apache/iceberg/pull/3973#discussion_r794150006 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] hililiwei commented on a diff in pull request #6617: Spark: Spark SQL Extensions for create branch
hililiwei commented on code in PR #6617: URL: https://github.com/apache/iceberg/pull/6617#discussion_r1082056581 ## spark/v3.3/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSqlExtensionsAstBuilder.scala: ## @@ -267,6 +285,16 @@ class IcebergSqlExtensionsAstBuilder(delegate: ParserInterface) extends IcebergS private def typedVisit[T](ctx: ParseTree): T = { ctx.accept(this).asInstanceOf[T] } + + private val timeUnit = (unit: String) => { +unit.toUpperCase(Locale.ENGLISH) match { + case "MONTHS" => 30 * 24 * 60 * 60 * 1000L Review Comment: Modified. PTAL. Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] cgpoh closed issue #6606: MinIO com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
cgpoh closed issue #6606: MinIO com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool URL: https://github.com/apache/iceberg/issues/6606 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] cgpoh commented on issue #6606: MinIO com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
cgpoh commented on issue #6606: URL: https://github.com/apache/iceberg/issues/6606#issuecomment-1397847153 After looking into the code, realised that instead of having s3.connection.maximum in flink configuration, I should set the values in Hadoop configuration and pass in the configuration to HiveCatalog instead. ```kotlin val conf = Configuration() conf["fs.s3a.connection.maximum"] = 100 val catalogLoader = CatalogLoader.hive(hiveCatalogName, conf, properties) ``` with this, I can see the maximum connection is set to 100 in the log. I will close this issue for now as my job is running for 24hrs -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] hililiwei commented on a diff in pull request #6617: Spark: Spark SQL Extensions for create branch
hililiwei commented on code in PR #6617: URL: https://github.com/apache/iceberg/pull/6617#discussion_r1082045421 ## spark/v3.3/spark-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4: ## @@ -168,34 +175,77 @@ fieldList ; nonReserved -: ADD | ALTER | AS | ASC | BY | CALL | DESC | DROP | FIELD | FIRST | LAST | NULLS | ORDERED | PARTITION | TABLE | WRITE -| DISTRIBUTED | LOCALLY | UNORDERED | REPLACE | WITH | IDENTIFIER_KW | FIELDS | SET +: ADD | ALTER | AS | ASC | BRANCH | BY | CALL | CREATE | DAYS | DESC | DROP | FIELD | FIRST | HOURS | LAST | NULLS | OF | ORDERED | PARTITION | TABLE | WRITE +| DISTRIBUTED | LOCALLY | MINUTES | MONTHS | UNORDERED | REPLACE | RETAIN | VERSION | WITH | IDENTIFIER_KW | FIELDS | SET | SNAPSHOT | SNAPSHOTS | TRUE | FALSE | MAP ; +snapshotId +: number +; + +numSnapshots +: number +; + +snapshotRetain +: number +; + +snapshotRefRetain +: number +; + +snapshotRefRetainTimeUnit +: timeUnit +; + +snapshotRetainTimeUnit +: timeUnit +; + +timeUnit +: MONTHS +| DAYS +| HOURS +| MINUTES Review Comment: Should we support it? I prefer at least the minute-level. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6365: Core: Add position deletes metadata table
szehon-ho commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1082037773 ## core/src/main/java/org/apache/iceberg/PositionDeletesTable.java: ## @@ -0,0 +1,396 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg; + +import static org.apache.iceberg.MetadataColumns.POSITION_DELETE_TABLE_FILE_PATH_ID; +import static org.apache.iceberg.MetadataColumns.POSITION_DELETE_TABLE_PARTITION_FIELD_ID; +import static org.apache.iceberg.MetadataColumns.POSITION_DELETE_TABLE_SPEC_ID; + +import com.github.benmanes.caffeine.cache.Caffeine; +import com.github.benmanes.caffeine.cache.LoadingCache; +import java.util.List; +import java.util.Map; +import java.util.function.BiFunction; +import java.util.stream.Collectors; +import org.apache.iceberg.expressions.Expression; +import org.apache.iceberg.expressions.Expressions; +import org.apache.iceberg.expressions.ResidualEvaluator; +import org.apache.iceberg.io.CloseableIterable; +import org.apache.iceberg.relocated.com.google.common.base.MoreObjects; +import org.apache.iceberg.relocated.com.google.common.collect.Maps; +import org.apache.iceberg.relocated.com.google.common.collect.Sets; +import org.apache.iceberg.types.Type; +import org.apache.iceberg.types.TypeUtil; +import org.apache.iceberg.types.Types; +import org.apache.iceberg.util.ParallelIterable; +import org.apache.iceberg.util.PartitionUtil; +import org.apache.iceberg.util.TableScanUtil; + +public class PositionDeletesTable extends BaseMetadataTable { + + private final Schema schema; + + PositionDeletesTable(TableOperations ops, Table table) { +super(ops, table, table.name() + ".position_deletes"); +this.schema = calculateSchema(); + } + + PositionDeletesTable(TableOperations ops, Table table, String name) { +super(ops, table, name); +this.schema = calculateSchema(); + } + + @Override + MetadataTableType metadataTableType() { +return MetadataTableType.POSITION_DELETES; + } + + @Override + public TableScan newScan() { +throw new UnsupportedOperationException( +"Cannot create TableScan from table of type POSITION_DELETES"); + } + + @Override + public BatchScan newBatchScan() { +return new PositionDeletesBatchScan(operations(), table(), schema()); + } + + @Override + public Schema schema() { +return schema; + } + + private Schema calculateSchema() { +Types.StructType partitionType = Partitioning.partitionType(table()); +Schema result = +new Schema( +MetadataColumns.DELETE_FILE_PATH, +MetadataColumns.DELETE_FILE_POS, +Types.NestedField.optional( +MetadataColumns.DELETE_FILE_ROW_FIELD_ID, +"row", +table().schema().asStruct(), +MetadataColumns.DELETE_FILE_ROW_DOC), +Types.NestedField.required( +POSITION_DELETE_TABLE_PARTITION_FIELD_ID, +"partition", +partitionType, +"Partition that position delete row belongs to"), +Types.NestedField.required( +POSITION_DELETE_TABLE_SPEC_ID, +"spec_id", +Types.IntegerType.get(), +"Spec ID of the file that the position delete row belongs to"), +Types.NestedField.required( +POSITION_DELETE_TABLE_FILE_PATH_ID, +"delete_file_path", +Types.StringType.get(), +"Path of the delete file that the position delete row belongs to")); + +if (partitionType.fields().size() > 0) { + return result; +} else { + // avoid returning an empty struct, which is not always supported. + // instead, drop the partition field + return TypeUtil.selectNot(result, Sets.newHashSet(POSITION_DELETE_TABLE_PARTITION_FIELD_ID)); +} + } + + public static class PositionDeletesBatchScan + extends SnapshotScan> implements BatchScan { + +protected PositionDeletesBatchScan(TableOperations ops, Table table, Schema schema) { + super(ops, table, schema, new TableScanContext()); +} + +protected PositionDeletesBatchScan( +
[GitHub] [iceberg] dmgcodevil commented on issue #6587: Wrong class, java.lang.Long, for object: 19367
dmgcodevil commented on issue #6587: URL: https://github.com/apache/iceberg/issues/6587#issuecomment-1397819184 Delete orphan files action is also affected after the schema change: ``` java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Void at org.apache.iceberg.transforms.VoidTransform.toHumanString(VoidTransform.java:27) at org.apache.iceberg.ManifestsTable.partitionSummariesToRows(ManifestsTable.java:131) at org.apache.iceberg.ManifestsTable.manifestFileToRow(ManifestsTable.java:115) at org.apache.iceberg.AllManifestsTable$ManifestListReadTask.lambda$rows$0(AllManifestsTable.java:194) at org.apache.iceberg.io.CloseableIterable$4$1.next(CloseableIterable.java:113) at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) at org.apache.spark.scheduler.Task.run(Task.scala:127) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:462) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:465) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6627: Docs: Update spark SQL examples for time travel to branches and tags
ajantha-bhat commented on code in PR #6627: URL: https://github.com/apache/iceberg/pull/6627#discussion_r1082026758 ## docs/spark-queries.md: ## @@ -95,21 +95,37 @@ The above list is in order of priority. For example: a matching catalog will tak SQL -Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses +Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses. +The `VERSION AS OF` clause can contain a long snapshot ID or a string branch or tag name. + +{{< hint info >}} +Note: If the name of a branch or tag is the same as a snapshot ID, then the snapshot which is selected for time travel is the snapshot +with the given snapshot ID. For example, consider the case where there is a tag named '1' and it references snapshot with ID 2. +If the version travel clause is `VERSION AS OF '1'`, time travel will be done to the snapshot with ID 1. +If this is not desired, rename the tag or branch with a well-defined prefix such as 'snapshot-1'. +{{< /hint >}} ```sql -- time travel to October 26, 1986 at 01:21:00 SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00'; -- time travel to snapshot with id 10963874102873L SELECT * FROM prod.db.table VERSION AS OF 10963874102873; + +-- time travel to the head snapshot of audit-branch +SELECT * FROM prod.db.table VERSION AS OF 'audit-branch'; + +-- time travel to the snapshot referenced by the tag historical-snapshot +SELECT * FROM prod.db.table VERSION AS OF 'historical-snapshot'; ``` In addition, `FOR SYSTEM_TIME AS OF` and `FOR SYSTEM_VERSION AS OF` clauses are also supported: ```sql SELECT * FROM prod.db.table FOR SYSTEM_TIME AS OF '1986-10-26 01:21:00'; SELECT * FROM prod.db.table FOR SYSTEM_VERSION AS OF 10963874102873; +SELECT * FROM prod.db.table FOR SYSTEM_VERSION AS OF 'audit-branch'; +SELECT * FROM prod.db.table FOR SYSTEM_VERSION AS OF 'historical-snapshot'; Review Comment: lines 145 and 146 are still ambiguous to me. > * `branch` selects the head snapshot of the specified branch. Note that currently branch cannot be combined with as-of-timestamp. * `tag` selects the snapshot associated with the specified tag Does that mean tags support `as-of-timestamp`? No right? Maybe we need to update it for tags too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on pull request #6617: Spark: Spark SQL Extensions for create branch
jackye1995 commented on PR #6617: URL: https://github.com/apache/iceberg/pull/6617#issuecomment-1397789606 Ping some people for thoughts around the syntax: @rdblue @RussellSpitzer @nastra -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6617: Spark: Spark SQL Extensions for create branch
jackye1995 commented on code in PR #6617: URL: https://github.com/apache/iceberg/pull/6617#discussion_r1082011726 ## spark/v3.3/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateBranchExec.scala: ## @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.spark.sql.execution.datasources.v2 + +import org.apache.iceberg.spark.source.SparkTable +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.Attribute +import org.apache.spark.sql.catalyst.plans.logical.CreateBranch +import org.apache.spark.sql.connector.catalog.Identifier +import org.apache.spark.sql.connector.catalog.TableCatalog + +case class CreateBranchExec( + catalog: TableCatalog, + ident: Identifier, + createBranch: CreateBranch) extends LeafV2CommandExec { + + import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._ + + override lazy val output: Seq[Attribute] = Nil + + override protected def run(): Seq[InternalRow] = { +catalog.loadTable(ident) match { + case iceberg: SparkTable => + +val snapshotId = createBranch.snapshotId.getOrElse(iceberg.table.currentSnapshot().snapshotId()) +iceberg.table.manageSnapshots() + .createBranch(createBranch.branch, snapshotId) + .setMinSnapshotsToKeep(createBranch.branch, createBranch.numSnapshots.getOrElse(1L).toInt) + // 5 days + .setMaxSnapshotAgeMs(createBranch.branch, createBranch.snapshotRetain.getOrElse(5 * 24 * 60 * 60 * 1000L)) + .setMaxRefAgeMs(createBranch.branch, createBranch.snapshotRefRetain.getOrElse(Long.MaxValue)) + .commit() + + case table => +throw new UnsupportedOperationException(s"Cannot add branch to non-Iceberg table: $table") Review Comment: This seems like an existing pattern for all extensions, so I think it is probably fine to leave it like this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6617: Spark: Spark SQL Extensions for create branch
jackye1995 commented on code in PR #6617: URL: https://github.com/apache/iceberg/pull/6617#discussion_r1082011227 ## spark/v3.3/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSqlExtensionsAstBuilder.scala: ## @@ -267,6 +285,16 @@ class IcebergSqlExtensionsAstBuilder(delegate: ParserInterface) extends IcebergS private def typedVisit[T](ctx: ParseTree): T = { ctx.accept(this).asInstanceOf[T] } + + private val timeUnit = (unit: String) => { +unit.toUpperCase(Locale.ENGLISH) match { + case "MONTHS" => 30 * 24 * 60 * 60 * 1000L Review Comment: can use `TimeUnit.MONTHS.toMillis(...)` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6617: Spark: Spark SQL Extensions for create branch
jackye1995 commented on code in PR #6617: URL: https://github.com/apache/iceberg/pull/6617#discussion_r1082010680 ## spark/v3.3/spark-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4: ## @@ -168,34 +175,77 @@ fieldList ; nonReserved -: ADD | ALTER | AS | ASC | BY | CALL | DESC | DROP | FIELD | FIRST | LAST | NULLS | ORDERED | PARTITION | TABLE | WRITE -| DISTRIBUTED | LOCALLY | UNORDERED | REPLACE | WITH | IDENTIFIER_KW | FIELDS | SET +: ADD | ALTER | AS | ASC | BRANCH | BY | CALL | CREATE | DAYS | DESC | DROP | FIELD | FIRST | HOURS | LAST | NULLS | OF | ORDERED | PARTITION | TABLE | WRITE +| DISTRIBUTED | LOCALLY | MINUTES | MONTHS | UNORDERED | REPLACE | RETAIN | VERSION | WITH | IDENTIFIER_KW | FIELDS | SET | SNAPSHOT | SNAPSHOTS | TRUE | FALSE | MAP ; +snapshotId +: number +; + +numSnapshots +: number +; + +snapshotRetain +: number +; + +snapshotRefRetain +: number +; + +snapshotRefRetainTimeUnit +: timeUnit +; + +snapshotRetainTimeUnit +: timeUnit +; + +timeUnit +: MONTHS +| DAYS +| HOURS +| MINUTES Review Comment: missing SECONDS? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6617: Spark: Spark SQL Extensions for create branch
jackye1995 commented on code in PR #6617: URL: https://github.com/apache/iceberg/pull/6617#discussion_r1082010234 ## spark/v3.3/spark-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4: ## @@ -73,6 +73,13 @@ statement | ALTER TABLE multipartIdentifier WRITE writeSpec #setWriteDistributionAndOrdering | ALTER TABLE multipartIdentifier SET IDENTIFIER_KW FIELDS fieldList #setIdentifierFields | ALTER TABLE multipartIdentifier DROP IDENTIFIER_KW FIELDS fieldList #dropIdentifierFields +| ALTER TABLE multipartIdentifier CREATE BRANCH identifier (AS OF VERSION snapshotId)? (snapshotRetentionClause)? (RETAIN snapshotRefRetain snapshotRefRetainTimeUnit)? #createBranch +; + +snapshotRetentionClause +: WITH SNAPSHOT RETENTION numSnapshots SNAPSHOTS +| WITH SNAPSHOT RETENTION snapshotRetain snapshotRetainTimeUnit +| WITH SNAPSHOT RETENTION numSnapshots SNAPSHOTS snapshotRetain snapshotRetainTimeUnit Review Comment: I see, I will leave it here to see if anyone has better suggestions. I am not an Antlr expert -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] github-actions[bot] commented on issue #5339: Adding the same file twice for the same table
github-actions[bot] commented on issue #5339: URL: https://github.com/apache/iceberg/issues/5339#issuecomment-1397768506 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on issue #6420: Iceberg Materialized View Spec
jackye1995 commented on issue #6420: URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1397757916 I am referring to the **view spec**, using the example here: https://iceberg.apache.org/view-spec/#appendix-a-an-example So in design 1 where we say we want to have a pointer from view to storage table, the pointer is a new type of representation `materialized`. We already support multiple representations as of today in the view spec, but today it only can have multiple SQL representations. By adding this new type, it means a view can be backed by multiple storage tables, by adding more `materialized` representations. The use case that you said about a table having multiple storage layout could also be modeled as a view with multiple `materialized` representations and no any SQL representation. It's just a thought, not fully flushed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] amogh-jahagirdar commented on pull request #6598: Core: View representation core implementation
amogh-jahagirdar commented on PR #6598: URL: https://github.com/apache/iceberg/pull/6598#issuecomment-1397753619 We probably want to establish a standard in the community at this point on Immutable/Nullable or not. Right now we're in this partial state, where it's used in some cases but defining a standard can help. I do think Immutables are really nice at keeping boiler plate code to a minimum but I don't have a strong opinion other than just setting a standard practice in Iceberg :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] wmoustafa commented on issue #6420: Iceberg Materialized View Spec
wmoustafa commented on issue #6420: URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1397753231 To clarify, I was saying that multiple representations are outside the scope of MVs, and could be part of standard table spec. Not sure if the proposal above is along the same lines (a bit confused since I see `materialized` references). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] huaxingao commented on a diff in pull request #6622: push down min/max/count to iceberg
huaxingao commented on code in PR #6622: URL: https://github.com/apache/iceberg/pull/6622#discussion_r1081982772 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ## @@ -158,6 +182,130 @@ public Filter[] pushedFilters() { return pushedFilters; } + @Override + public boolean pushAggregation(Aggregation aggregation) { +if (!pushDownAggregate(aggregation)) { + return false; +} + +AggregateEvaluator aggregateEvaluator; +try { + List aggregates = + Arrays.stream(aggregation.aggregateExpressions()) + .map(agg -> SparkAggregates.convert(agg)) + .collect(Collectors.toList()); + aggregateEvaluator = AggregateEvaluator.create(schema, aggregates); +} catch (Exception e) { + LOG.info("Can't push down aggregates: " + e.getMessage()); + return false; +} + +if (!metricsModeSupportsAggregatePushDown(aggregateEvaluator.aggregates())) { + LOG.info("The MetricsMode doesn't support aggregate push down."); + return false; +} + +List manifests = table.currentSnapshot().allManifests(table.io()); + +for (ManifestFile manifest : manifests) { + try (ManifestReader reader = ManifestFiles.read(manifest, table.io())) { +for (DataFile dataFile : reader) { + aggregateEvaluator.update(dataFile.copy()); +} + } catch (IOException e) { +LOG.info("Can't push down aggregates: " + e.getMessage()); +return false; + } +} + +Object[] res = aggregateEvaluator.result(); +applyDataTypeConversionIfNecessary(res); + +List valuesInSparkInternalRow = java.util.Arrays.asList(res); +this.pushedAggregateRows = new InternalRow[1]; +pushedAggregateRows[0] = + InternalRow.fromSeq(JavaConverters.asScalaBuffer(valuesInSparkInternalRow).toSeq()); +pushedAggregateSchema = +SparkSchemaUtil.convert(new Schema(aggregateEvaluator.resultType().fields())); +return true; + } + + private boolean pushDownAggregate(Aggregation aggregation) { +if (!(table instanceof BaseTable)) { + return false; +} + +if (!readConf.aggregatePushDown()) { + return false; +} + +Snapshot currentSnapshot = table.currentSnapshot(); Review Comment: Thanks for your comment! I think I need to check `readConf.snapshotId()` first to get the time travel snapshot. If there is no time travel snapshot, then get `table.currentSnapshot()`. I will fix this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6627: Docs: Update spark SQL examples for time travel to branches and tags
jackye1995 commented on code in PR #6627: URL: https://github.com/apache/iceberg/pull/6627#discussion_r1081982454 ## docs/spark-queries.md: ## @@ -95,21 +95,37 @@ The above list is in order of priority. For example: a matching catalog will tak SQL -Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses +Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses. +The `VERSION AS OF` clause can contain a long snapshot ID or a string branch or tag name. + +{{< hint info >}} +Note: If the name of a branch or tag is the same as a snapshot ID, then the snapshot which is selected for time travel is the snapshot +with the given snapshot ID. For example, consider the case where there is a tag named '1' and it references snapshot 2. +If the time travel clause is `VERSION AS OF '1'` time travel will be done to the snapshot with id 1. Review Comment: nit: version travel clause -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6627: Docs: Update spark SQL examples for time travel to branches and tags
jackye1995 commented on code in PR #6627: URL: https://github.com/apache/iceberg/pull/6627#discussion_r1081982200 ## docs/spark-queries.md: ## @@ -95,21 +95,37 @@ The above list is in order of priority. For example: a matching catalog will tak SQL -Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses +Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses. +The `VERSION AS OF` clause can contain a long snapshot ID or a string branch or tag name. + +{{< hint info >}} +Note: If the name of a branch or tag is the same as a snapshot ID, then the snapshot which is selected for time travel is the snapshot +with the given snapshot ID. For example, consider the case where there is a tag named '1' and it references snapshot 2. Review Comment: nit: and it references snapshot with ID 2. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on issue #6420: Iceberg Materialized View Spec
jackye1995 commented on issue #6420: URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1397751094 I don't know if it would work or too crazy, just to throw the idea out that I just came up with: We could potentially make MV a representation in view spec, in parallel to the SQL representation. So for a materialized view, it could have 2 representations: ``` [ { "type" : "sql", "sql" : "SELECT \"count\"(*) my_cnt\nFROM\n base_tab\n", => Note the updated text from the ‘replace’ view statement "dialect" : "spark", "schema-id" : 2, "default-catalog" : "iceberg", "default-namespace" : [ "anorwood" ] }, { "type" : "materialized", "namespace": "some_namespace" "table" : "some_storage_table" } ] ``` By doing so, you could support a case where 1 view does not really have a SQL representation, but just a few different table layouts: ``` [ { "type" : "materialized", "namespace": "some_namespace" "table" : "some_storage_table_partitioned_by_col1" }, { "type" : "materialized", "namespace": "some_namespace" "table" : "some_storage_table_partitioned_by_col2" } ] ``` which satisfies the use case described by Walaa. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6627: Docs: Update spark SQL examples for time travel to branches and tags
amogh-jahagirdar commented on code in PR #6627: URL: https://github.com/apache/iceberg/pull/6627#discussion_r1081978533 ## docs/spark-queries.md: ## @@ -95,21 +95,39 @@ The above list is in order of priority. For example: a matching catalog will tak SQL -Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses +Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses. +The `VERSION AS OF` clause can be used for time traveling to a specific snapshot via the following options: Review Comment: Sure this seems reasonable and should be clear for readers -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on issue #6420: Iceberg Materialized View Spec
jackye1995 commented on issue #6420: URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1397742071 > Generically speaking, a table (MV or not), identified by a UUID, could have multiple storage layouts, and execution engines can choose the best storage layout. That's correct, but technically it could be argued that we can model everything as a view, backed by multiple storage tables, and we don't need to add more complexity into the table spec to support multiple storage layouts. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] wmoustafa commented on issue #6420: Iceberg Materialized View Spec
wmoustafa commented on issue #6420: URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1397732221 > I am thinking about the case where based on the predicate operating on the view, we can choose intelligently what storage table to use. I think this is potentially a generic use case for all tables, not only MV storage tables. Generically speaking, a table (MV or not), identified by a UUID, could have multiple storage layouts, and execution engines can choose the best storage layout. MV still points to a single UUID. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on issue #6420: Iceberg Materialized View Spec
jackye1995 commented on issue #6420: URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1397711644 > Not sure if there is a strong use case for multiple tables for the same view version. I am thinking about the case where based on the predicate operating on the view, we can choose intelligently what storage table to use. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6622: push down min/max/count to iceberg
amogh-jahagirdar commented on code in PR #6622: URL: https://github.com/apache/iceberg/pull/6622#discussion_r1081947384 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ## @@ -158,6 +182,130 @@ public Filter[] pushedFilters() { return pushedFilters; } + @Override + public boolean pushAggregation(Aggregation aggregation) { +if (!pushDownAggregate(aggregation)) { + return false; +} + +AggregateEvaluator aggregateEvaluator; +try { + List aggregates = + Arrays.stream(aggregation.aggregateExpressions()) + .map(agg -> SparkAggregates.convert(agg)) + .collect(Collectors.toList()); + aggregateEvaluator = AggregateEvaluator.create(schema, aggregates); +} catch (Exception e) { + LOG.info("Can't push down aggregates: " + e.getMessage()); + return false; +} + +if (!metricsModeSupportsAggregatePushDown(aggregateEvaluator.aggregates())) { + LOG.info("The MetricsMode doesn't support aggregate push down."); + return false; +} + +List manifests = table.currentSnapshot().allManifests(table.io()); + +for (ManifestFile manifest : manifests) { + try (ManifestReader reader = ManifestFiles.read(manifest, table.io())) { +for (DataFile dataFile : reader) { + aggregateEvaluator.update(dataFile.copy()); +} + } catch (IOException e) { +LOG.info("Can't push down aggregates: " + e.getMessage()); +return false; + } +} + +Object[] res = aggregateEvaluator.result(); +applyDataTypeConversionIfNecessary(res); + +List valuesInSparkInternalRow = java.util.Arrays.asList(res); +this.pushedAggregateRows = new InternalRow[1]; +pushedAggregateRows[0] = + InternalRow.fromSeq(JavaConverters.asScalaBuffer(valuesInSparkInternalRow).toSeq()); +pushedAggregateSchema = +SparkSchemaUtil.convert(new Schema(aggregateEvaluator.resultType().fields())); +return true; + } + + private boolean pushDownAggregate(Aggregation aggregation) { +if (!(table instanceof BaseTable)) { + return false; +} + +if (!readConf.aggregatePushDown()) { + return false; +} + +Snapshot currentSnapshot = table.currentSnapshot(); Review Comment: Sorry maybe a silly question but how does this work with time travel? Here we're reading the summary from the latest table snapshot but if an aggregation is done on a historical snapshot then we may skip the check below unintentionally. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6624: Add "parallelism" parameter to "add_files" syscall and MigrateTable, SnapshotTable.
amogh-jahagirdar commented on code in PR #6624: URL: https://github.com/apache/iceberg/pull/6624#discussion_r1081932750 ## spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/procedures/SnapshotTableProcedure.java: ## @@ -93,10 +94,20 @@ public InternalRow[] call(InternalRow args) { }); } +int parallelism; +if (!args.isNullAt(4)) { + parallelism = args.getInt(4); +} else { + parallelism = 1; +} Review Comment: Nit: I think it's a bit cleaner to use ternary here `int parallelism = args.getInt(4) != null ? args.getInt(4) : 1` ## spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/procedures/MigrateTableProcedure.java: ## @@ -99,7 +100,13 @@ public InternalRow[] call(InternalRow args) { if (dropBackup) { result = migrateTableSparkAction.dropBackup().execute(); } else { - result = migrateTableSparkAction.execute(); + int parallelism; + if (!args.isNullAt(3)) { +parallelism = args.getInt(3); + } else { +parallelism = 1; + } Review Comment: Same nit as below i think ternary assignment works well here ## spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/procedures/MigrateTableProcedure.java: ## @@ -39,7 +39,8 @@ class MigrateTableProcedure extends BaseProcedure { new ProcedureParameter[] { ProcedureParameter.required("table", DataTypes.StringType), ProcedureParameter.optional("properties", STRING_MAP), -ProcedureParameter.optional("drop_backup", DataTypes.BooleanType) +ProcedureParameter.optional("drop_backup", DataTypes.BooleanType), +ProcedureParameter.optional("max_concurrent_read_datafiles", DataTypes.IntegerType) Review Comment: Any reason we can't just call this parameter "parallelism" ? I think it will only be data files involved -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] amogh-jahagirdar commented on pull request #6624: Add "parallelism" parameter to "add_files" syscall and MigrateTable, SnapshotTable.
amogh-jahagirdar commented on PR #6624: URL: https://github.com/apache/iceberg/pull/6624#issuecomment-1397701478 Left a review, thanks for the contribution @kingeasternsun ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] flyrain commented on a diff in pull request #6582: Add a Spark procedure to collect NDV
flyrain commented on code in PR #6582: URL: https://github.com/apache/iceberg/pull/6582#discussion_r1081900515 ## core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java: ## @@ -26,4 +26,6 @@ private StandardBlobTypes() {} * href="https://datasketches.apache.org/;>Apache DataSketches library */ public static final String APACHE_DATASKETCHES_THETA_V1 = "apache-datasketches-theta-v1"; + + public static final String NDV_BLOB = "ndv-blob"; Review Comment: The only downside is that we are going to have two items for one purpose. 1. A new blob type for NDV used by Spark 2. APACHE_DATASKETCHES_THETA_V1 for NDV used by Trino, and updated by Trino. I believe that's something we are trying to avoid. I'm not sure why do we need a sketch blob for NDV at the table level. Spark only need a NDV number for its planning. Does Trino need something extra? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] wmoustafa commented on issue #6420: Iceberg Materialized View Spec
wmoustafa commented on issue #6420: URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1397595262 Agreed. Reverse pointer to the view will be hard to maintain, so I am inclined to not having it. I would say each view version could optionally map to a new storage table (say the view evolved in a backward incompatible way). Not sure if there is a strong use case for multiple tables for the same view version. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on issue #6420: Iceberg Materialized View Spec
jackye1995 commented on issue #6420: URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1397589244 So just want to push the progress forward, I think we have some kind of loose consensus that: 1. view + storage table is likely the general approach to go 2. view stores pointer to the storage table, or potentially multiple storage tables 3. storage table should store the information about each refresh as table snapshot properties, some reverse pointer to the view is also good to have Is that the right understanding? @rdblue @JanKaul @wmoustafa -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on pull request #6626: Core: Update error msg
jackye1995 commented on PR #6626: URL: https://github.com/apache/iceberg/pull/6626#issuecomment-1397579647 > not sure, usually it's been called out on my own PRs to adjust the error msg to that particular format (hence the reason I mentioned it on the other PR), which is being used across other places in Iceberg In that case, would it be better for us to change all the related null checks to checkArgument, and update the checkstyle rule for it to avoid use of checkNotNull? I think we could take this opportunity to set some standards. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6627: Docs: Update spark SQL examples for time travel to branches and tags
jackye1995 commented on code in PR #6627: URL: https://github.com/apache/iceberg/pull/6627#discussion_r1081835641 ## docs/spark-queries.md: ## @@ -95,21 +95,39 @@ The above list is in order of priority. For example: a matching catalog will tak SQL -Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses +Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses. +The `VERSION AS OF` clause can be used for time traveling to a specific snapshot via the following options: Review Comment: the wording seems a bit overly complex. What about we talk about `VERSION AS OF` with the data type? If it is long, then it is snapshot ID. If it is string, then it follows what is described below. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6627: Docs: Update spark SQL examples for time travel to branches and tags
jackye1995 commented on code in PR #6627: URL: https://github.com/apache/iceberg/pull/6627#discussion_r1081834635 ## docs/spark-queries.md: ## @@ -95,21 +95,39 @@ The above list is in order of priority. For example: a matching catalog will tak SQL -Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses +Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses. +The `VERSION AS OF` clause can be used for time traveling to a specific snapshot via the following options: + +1. Snapshot ID +2. Head of a branch +3. Tagged snapshot + +Note: If the name of a branch or tag is the same as a snapshot ID, then the snapshot which is selected for time travel is the snapshot Review Comment: nit: use `{{< hint info >}}` for notes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6570: Hive: Use EnvironmentContext instead of Hive Locks to provide transactional commits after HIVE-26882
szehon-ho commented on code in PR #6570: URL: https://github.com/apache/iceberg/pull/6570#discussion_r1081787456 ## hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreLock.java: ## @@ -0,0 +1,533 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.hive; + +import com.github.benmanes.caffeine.cache.Cache; +import com.github.benmanes.caffeine.cache.Caffeine; +import java.net.InetAddress; +import java.net.UnknownHostException; +import java.util.Optional; +import java.util.UUID; +import java.util.concurrent.Executors; +import java.util.concurrent.ScheduledExecutorService; +import java.util.concurrent.ScheduledFuture; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicBoolean; +import java.util.concurrent.locks.ReentrantLock; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.metastore.IMetaStoreClient; +import org.apache.hadoop.hive.metastore.api.LockComponent; +import org.apache.hadoop.hive.metastore.api.LockLevel; +import org.apache.hadoop.hive.metastore.api.LockRequest; +import org.apache.hadoop.hive.metastore.api.LockResponse; +import org.apache.hadoop.hive.metastore.api.LockState; +import org.apache.hadoop.hive.metastore.api.LockType; +import org.apache.hadoop.hive.metastore.api.ShowLocksRequest; +import org.apache.hadoop.hive.metastore.api.ShowLocksResponse; +import org.apache.hadoop.hive.metastore.api.ShowLocksResponseElement; +import org.apache.iceberg.ClientPool; +import org.apache.iceberg.exceptions.CommitFailedException; +import org.apache.iceberg.relocated.com.google.common.base.MoreObjects; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.relocated.com.google.common.util.concurrent.ThreadFactoryBuilder; +import org.apache.iceberg.util.Tasks; +import org.apache.thrift.TException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class MetastoreLock implements HiveLock { + private static final Logger LOG = LoggerFactory.getLogger(MetastoreLock.class); + private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms"; + private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = "iceberg.hive.lock-check-min-wait-ms"; + private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = "iceberg.hive.lock-check-max-wait-ms"; + private static final String HIVE_LOCK_CREATION_TIMEOUT_MS = + "iceberg.hive.lock-creation-timeout-ms"; + private static final String HIVE_LOCK_CREATION_MIN_WAIT_MS = + "iceberg.hive.lock-creation-min-wait-ms"; + private static final String HIVE_LOCK_CREATION_MAX_WAIT_MS = + "iceberg.hive.lock-creation-max-wait-ms"; + private static final String HIVE_LOCK_HEARTBEAT_INTERVAL_MS = + "iceberg.hive.lock-heartbeat-interval-ms"; + private static final String HIVE_TABLE_LEVEL_LOCK_EVICT_MS = + "iceberg.hive.table-level-lock-evict-ms"; + + private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes + private static final long HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT = 50; // 50 milliseconds + private static final long HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 5 seconds + private static final long HIVE_LOCK_CREATION_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes + private static final long HIVE_LOCK_CREATION_MIN_WAIT_MS_DEFAULT = 50; // 50 milliseconds + private static final long HIVE_LOCK_CREATION_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 5 seconds + private static final long HIVE_LOCK_HEARTBEAT_INTERVAL_MS_DEFAULT = 4 * 60 * 1000; // 4 minutes + private static final long HIVE_TABLE_LEVEL_LOCK_EVICT_MS_DEFAULT = TimeUnit.MINUTES.toMillis(10); + + private final ClientPool metaClients; + private final String databaseName; + private final String tableName; + private final String fullName; + private final long lockAcquireTimeout; + private final long lockCheckMinWaitTime; + private final long lockCheckMaxWaitTime; + private final long lockCreationTimeout; + private final long lockCreationMinWaitTime; + private final long lockCreationMaxWaitTime; + private final long
[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6570: Hive: Use EnvironmentContext instead of Hive Locks to provide transactional commits after HIVE-26882
szehon-ho commented on code in PR #6570: URL: https://github.com/apache/iceberg/pull/6570#discussion_r1081768050 ## hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreUtil.java: ## @@ -53,9 +55,23 @@ private MetastoreUtil() {} */ public static void alterTable( IMetaStoreClient client, String databaseName, String tblName, Table table) { -EnvironmentContext envContext = -new EnvironmentContext( -ImmutableMap.of(StatsSetupConst.DO_NOT_UPDATE_STATS, StatsSetupConst.TRUE)); -ALTER_TABLE.invoke(client, databaseName, tblName, table, envContext); +alterTable(client, databaseName, tblName, table, ImmutableMap.of()); + } + + /** + * Calls alter_table method using the metastore client. If possible, an environmental context will Review Comment: Nit: Maybe the stats update part should be put as an inline comment instead of the method? Also is it 'if possible'? It seems its always used. ## hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreLock.java: ## @@ -0,0 +1,533 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.hive; + +import com.github.benmanes.caffeine.cache.Cache; +import com.github.benmanes.caffeine.cache.Caffeine; +import java.net.InetAddress; +import java.net.UnknownHostException; +import java.util.Optional; +import java.util.UUID; +import java.util.concurrent.Executors; +import java.util.concurrent.ScheduledExecutorService; +import java.util.concurrent.ScheduledFuture; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicBoolean; +import java.util.concurrent.locks.ReentrantLock; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.metastore.IMetaStoreClient; +import org.apache.hadoop.hive.metastore.api.LockComponent; +import org.apache.hadoop.hive.metastore.api.LockLevel; +import org.apache.hadoop.hive.metastore.api.LockRequest; +import org.apache.hadoop.hive.metastore.api.LockResponse; +import org.apache.hadoop.hive.metastore.api.LockState; +import org.apache.hadoop.hive.metastore.api.LockType; +import org.apache.hadoop.hive.metastore.api.ShowLocksRequest; +import org.apache.hadoop.hive.metastore.api.ShowLocksResponse; +import org.apache.hadoop.hive.metastore.api.ShowLocksResponseElement; +import org.apache.iceberg.ClientPool; +import org.apache.iceberg.exceptions.CommitFailedException; +import org.apache.iceberg.relocated.com.google.common.base.MoreObjects; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.relocated.com.google.common.util.concurrent.ThreadFactoryBuilder; +import org.apache.iceberg.util.Tasks; +import org.apache.thrift.TException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class MetastoreLock implements HiveLock { + private static final Logger LOG = LoggerFactory.getLogger(MetastoreLock.class); + private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = "iceberg.hive.lock-timeout-ms"; + private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = "iceberg.hive.lock-check-min-wait-ms"; + private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = "iceberg.hive.lock-check-max-wait-ms"; + private static final String HIVE_LOCK_CREATION_TIMEOUT_MS = + "iceberg.hive.lock-creation-timeout-ms"; + private static final String HIVE_LOCK_CREATION_MIN_WAIT_MS = + "iceberg.hive.lock-creation-min-wait-ms"; + private static final String HIVE_LOCK_CREATION_MAX_WAIT_MS = + "iceberg.hive.lock-creation-max-wait-ms"; + private static final String HIVE_LOCK_HEARTBEAT_INTERVAL_MS = + "iceberg.hive.lock-heartbeat-interval-ms"; + private static final String HIVE_TABLE_LEVEL_LOCK_EVICT_MS = + "iceberg.hive.table-level-lock-evict-ms"; + + private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 1000; // 3 minutes + private static final long HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT = 50; // 50 milliseconds + private static final long HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 5 seconds + private static final long
[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6621: [HiveCatalog] Support Altering and Dropping Table Ownership
szehon-ho commented on code in PR #6621: URL: https://github.com/apache/iceberg/pull/6621#discussion_r1081718764 ## hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCatalog.java: ## @@ -328,6 +329,140 @@ public void testCreateTableCustomSortOrder() throws Exception { } } + @Test + public void testAlterTableOwner() throws IOException, TException { +alterTableAndVerifyOwner( +DB_NAME, +"tbl_alter_owner_1", +ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "some_owner"), +ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "some_other_owner"), +"some_owner", +"some_other_owner"); +alterTableAndVerifyOwner( +DB_NAME, +"tbl_alter_owner_2", +ImmutableMap.of(), +ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "another_owner"), +UserGroupInformation.getCurrentUser().getUserName(), +"another_owner"); +alterTableAndVerifyOwner( +DB_NAME, +"tbl_alter_owner_noop_1", +ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "some_owner"), +ImmutableMap.of(), +"some_owner", +"some_owner"); +alterTableAndVerifyOwner( +DB_NAME, +"tbl_alter_owner_noop_2", +ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "some_owner"), +ImmutableMap.of("unrelated_prop", "val"), +"some_owner", +"some_owner"); +alterTableAndVerifyOwner( +DB_NAME, +"tbl_alter_owner_noop_3", +ImmutableMap.of(), +ImmutableMap.of(), +UserGroupInformation.getCurrentUser().getUserName(), +UserGroupInformation.getCurrentUser().getUserName()); + } + + private void alterTableAndVerifyOwner( Review Comment: Nit: I think in most tests, we put the private at end of file? So the ```@Test``` are all in the top and easier to find. ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java: ## @@ -494,6 +494,17 @@ private void setHmsTableParameters( // remove any props from HMS that are no longer present in Iceberg table props obsoleteProps.forEach(parameters::remove); +// altering owner +if (metadata.properties().get(HiveCatalog.HMS_TABLE_OWNER) != null) { + tbl.setOwner(metadata.properties().get(HiveCatalog.HMS_TABLE_OWNER)); +} + +// dropping owner: instead of leaving the owner blank/null, the owner will be +// default to whoever is making the current drop operation +if (obsoleteProps.contains(HiveCatalog.HMS_TABLE_OWNER)) { Review Comment: Also too don't know of the use-case, but makes sense to me to set it to null here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] haizhou-zhao commented on a diff in pull request #6621: [HiveCatalog] Support Altering and Dropping Table Ownership
haizhou-zhao commented on code in PR #6621: URL: https://github.com/apache/iceberg/pull/6621#discussion_r1081701820 ## hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCatalog.java: ## @@ -328,6 +329,140 @@ public void testCreateTableCustomSortOrder() throws Exception { } } + @Test + public void testAlterTableOwner() throws IOException, TException { +alterTableAndVerifyOwner( +DB_NAME, +"tbl_alter_owner_1", +ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "some_owner"), +ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "some_other_owner"), +"some_owner", +"some_other_owner"); +alterTableAndVerifyOwner( +DB_NAME, +"tbl_alter_owner_2", +ImmutableMap.of(), +ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "another_owner"), +UserGroupInformation.getCurrentUser().getUserName(), +"another_owner"); +alterTableAndVerifyOwner( +DB_NAME, +"tbl_alter_owner_noop_1", +ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "some_owner"), +ImmutableMap.of(), +"some_owner", +"some_owner"); +alterTableAndVerifyOwner( +DB_NAME, +"tbl_alter_owner_noop_2", +ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "some_owner"), +ImmutableMap.of("unrelated_prop", "val"), +"some_owner", +"some_owner"); +alterTableAndVerifyOwner( +DB_NAME, +"tbl_alter_owner_noop_3", +ImmutableMap.of(), +ImmutableMap.of(), +UserGroupInformation.getCurrentUser().getUserName(), +UserGroupInformation.getCurrentUser().getUserName()); + } + + private void alterTableAndVerifyOwner( + String db, + String tbl, + Map properties, + Map updates, + String expectedOwnerPostCreate, + String expectedOwnerPostAlter) + throws IOException, TException { +Schema schema = getTestSchema(); +PartitionSpec spec = PartitionSpec.builderFor(schema).bucket("data", 16).build(); +TableIdentifier tableIdent = TableIdentifier.of(db, tbl); +String location = temp.newFolder(tbl).toString(); +try { + Table table = catalog.createTable(tableIdent, schema, spec, location, properties); + org.apache.hadoop.hive.metastore.api.Table hmsTable = metastoreClient.getTable(db, tbl); + Assert.assertEquals(expectedOwnerPostCreate, hmsTable.getOwner()); + Assert.assertFalse(hmsTable.getParameters().containsKey(HiveCatalog.HMS_TABLE_OWNER)); + UpdateProperties updateOps = table.updateProperties(); + updates.forEach(updateOps::set); + updateOps.commit(); + hmsTable = metastoreClient.getTable(db, tbl); + Assert.assertEquals(expectedOwnerPostAlter, hmsTable.getOwner()); + Assert.assertFalse(hmsTable.getParameters().containsKey(HiveCatalog.HMS_TABLE_OWNER)); +} finally { + catalog.dropTable(tableIdent); +} + } + + @Test + public void testRemoveTableOwner() throws IOException, TException { +removeTableOwnerAndVerify( +DB_NAME, +"tbl_remove_owner_1", +ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "some_owner"), +ImmutableSet.of(HiveCatalog.HMS_TABLE_OWNER), +"some_owner", +UserGroupInformation.getCurrentUser().getUserName()); +removeTableOwnerAndVerify( +DB_NAME, +"tbl_remove_owner_noop_1", +ImmutableMap.of(), +ImmutableSet.of(HiveCatalog.HMS_TABLE_OWNER), +UserGroupInformation.getCurrentUser().getUserName(), +UserGroupInformation.getCurrentUser().getUserName()); +removeTableOwnerAndVerify( +DB_NAME, +"tbl_remove_owner_noop_2", +ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "some_owner"), +ImmutableSet.of(), +"some_owner", +"some_owner"); +removeTableOwnerAndVerify( +DB_NAME, +"tbl_remove_owner_noop_3", +ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "some_owner"), +ImmutableSet.of("unrelated_prop"), +"some_owner", +"some_owner"); +removeTableOwnerAndVerify( +DB_NAME, +"tbl_remove_owner_noop_4", +ImmutableMap.of(), +ImmutableSet.of(), +UserGroupInformation.getCurrentUser().getUserName(), +UserGroupInformation.getCurrentUser().getUserName()); + } + + private void removeTableOwnerAndVerify( Review Comment: Ack. thx -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] haizhou-zhao commented on a diff in pull request #6621: [HiveCatalog] Support Altering and Dropping Table Ownership
haizhou-zhao commented on code in PR #6621: URL: https://github.com/apache/iceberg/pull/6621#discussion_r1081700457 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java: ## @@ -494,6 +494,17 @@ private void setHmsTableParameters( // remove any props from HMS that are no longer present in Iceberg table props obsoleteProps.forEach(parameters::remove); +// altering owner +if (metadata.properties().get(HiveCatalog.HMS_TABLE_OWNER) != null) { + tbl.setOwner(metadata.properties().get(HiveCatalog.HMS_TABLE_OWNER)); +} + +// dropping owner: instead of leaving the owner blank/null, the owner will be +// default to whoever is making the current drop operation +if (obsoleteProps.contains(HiveCatalog.HMS_TABLE_OWNER)) { Review Comment: Fair, I think that's a good point that perhaps simply removing owner and set null is closer to what users want. Trying to do some thing smart here might get user confused instead of helping. As for whether the "drop ownership" use case is needed overall, to be honest I personally do not have and cannot think of a use. However, from the code perspective, there's nothing stopping an iceberg user using iceberg APIs to remove `HiveCatalog.HMS_TABLE_OWNER` property. If that does happen, then I felt the actual intention of the user is to get rid of owner rather than do nothing. Maybe she wants to put the table into a state where only the admin and no other people could access it for a while before she knows for sure who the rightful owner should be. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6598: Core: View representation core implementation
amogh-jahagirdar commented on code in PR #6598: URL: https://github.com/apache/iceberg/pull/6598#discussion_r1081678527 ## api/src/main/java/org/apache/iceberg/view/SQLViewRepresentation.java: ## @@ -36,17 +38,21 @@ default Type type() { String dialect(); /** The default catalog when the view is created. */ + @Nullable String defaultCatalog(); /** The default namespace when the view is created. */ + @Nullable Namespace defaultNamespace(); - /** The query output schema at version create time, without aliases. */ - Schema schema(); + /** The query output schema id at version create time */ + int schemaId(); Review Comment: Updated to use Integer schemaId(). Since schemaId may not be defined at creation time as discussed in https://github.com/apache/iceberg/pull/6611/files schemaId() can return null -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6598: Core: View representation core implementation
amogh-jahagirdar commented on code in PR #6598: URL: https://github.com/apache/iceberg/pull/6598#discussion_r1081676028 ## api/src/main/java/org/apache/iceberg/view/SQLViewRepresentation.java: ## @@ -18,14 +18,17 @@ */ package org.apache.iceberg.view; +import edu.umd.cs.findbugs.annotations.Nullable; Review Comment: It seems like we are using Nullable in a few places already, for example https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/metrics/CommitMetricsResult.java#L54 . I'm good to revert this but that would imply not using Immutables as well here (unless there's another acceptable way to indicate to Immutable values that a field can be null). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6598: Core: View representation core implementation
amogh-jahagirdar commented on code in PR #6598: URL: https://github.com/apache/iceberg/pull/6598#discussion_r1081672308 ## api/src/main/java/org/apache/iceberg/view/ViewRepresentation.java: ## @@ -18,21 +18,16 @@ */ package org.apache.iceberg.view; -import java.util.Locale; +import org.immutables.value.Value; +@Value.Immutable public interface ViewRepresentation { - enum Type { -SQL; + class Type { +private Type() {} -public static Type fromString(String typeName) { - return valueOf(typeName.toUpperCase(Locale.ENGLISH)); -} - -public String typeName() { - return name().toLowerCase(Locale.ENGLISH); -} +public static final String SQL = "sql"; Review Comment: Yeah it seems like an established pattern elsewhere just to use a constant string and it simplifies the parsing logic a bit but I'm happy to revert back to enum if there's other advantages. Let me know your thoughts @rdblue @jzhuge -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6598: Core: View representation core implementation
amogh-jahagirdar commented on code in PR #6598: URL: https://github.com/apache/iceberg/pull/6598#discussion_r1081672308 ## api/src/main/java/org/apache/iceberg/view/ViewRepresentation.java: ## @@ -18,21 +18,16 @@ */ package org.apache.iceberg.view; -import java.util.Locale; +import org.immutables.value.Value; +@Value.Immutable public interface ViewRepresentation { - enum Type { -SQL; + class Type { +private Type() {} -public static Type fromString(String typeName) { - return valueOf(typeName.toUpperCase(Locale.ENGLISH)); -} - -public String typeName() { - return name().toLowerCase(Locale.ENGLISH); -} +public static final String SQL = "sql"; Review Comment: Yeah it seems like an established pattern elsewhere just to use a constant string and it simplifies the parsing logic a bit but I'm happy to revert back to enum if there's other advantages. Let me know your thoughts @rdblue -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6598: Core: View representation core implementation
amogh-jahagirdar commented on code in PR #6598: URL: https://github.com/apache/iceberg/pull/6598#discussion_r1081669003 ## core/src/test/java/org/apache/iceberg/view/TestViewRepresentationParser.java: ## @@ -0,0 +1,163 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.view; + +import com.fasterxml.jackson.databind.JsonNode; +import java.util.List; +import org.apache.iceberg.Schema; +import org.apache.iceberg.catalog.Namespace; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; +import org.assertj.core.api.Assertions; +import org.junit.Assert; +import org.junit.Test; + +public class TestViewRepresentationParser { Review Comment: Sure updated! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] nastra commented on pull request #6626: Core: Update error msg
nastra commented on PR #6626: URL: https://github.com/apache/iceberg/pull/6626#issuecomment-1397400315 > If we are checking non null, I think the current error message still makes more sense? not sure, usually it's been called out on my own PRs to adjust the error msg to that particular format (hence the reason I mentioned it on the other PR), which is being used across other places in Iceberg -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on pull request #6586: AWS: make warehouse path optional for read only catalog use cases
jackye1995 commented on PR #6586: URL: https://github.com/apache/iceberg/pull/6586#issuecomment-1397395046 @aajisaka can you also take a look? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on pull request #6626: Core: Update error msg
jackye1995 commented on PR #6626: URL: https://github.com/apache/iceberg/pull/6626#issuecomment-1397386343 If we are checking non null, I think the current error message still makes more sense? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [iceberg] nastra commented on a diff in pull request #6074: API,Core: SnapshotManager to be created through Transaction
nastra commented on code in PR #6074: URL: https://github.com/apache/iceberg/pull/6074#discussion_r1081611444 ## api/src/main/java/org/apache/iceberg/Transaction.java: ## @@ -155,6 +155,13 @@ default UpdateStatistics updateStatistics() { */ ExpireSnapshots expireSnapshots(); + /** + * Create a new {@link ManageSnapshots manage snapshot API} to manage snapshots in this table. + * + * @return a new {@link ManageSnapshots} + */ + ManageSnapshots manageSnapshots(); Review Comment: RevAPI is checking API and ABI compatibility, so it's not just tricking RevAPI, it's also to make sure that users of this particular `Transaction` API don't run into API/ABI breakages when they upgrade the Iceberg version. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org