[GitHub] [iceberg] jackye1995 commented on issue #6632: Bug with Branch Transactions

2023-01-20 Thread GitBox


jackye1995 commented on issue #6632:
URL: https://github.com/apache/iceberg/issues/6632#issuecomment-1398734413

   Sure, assigned!





[GitHub] [iceberg] stevenzwu commented on a diff in pull request #6631: Flink: backport PR #6584 to 1.14 and 1.15 for Avro GenericRecord in FLIP-27 source

2023-01-20 Thread GitBox


stevenzwu commented on code in PR #6631:
URL: https://github.com/apache/iceberg/pull/6631#discussion_r1082878466


##
flink/v1.16/flink/src/test/java/org/apache/iceberg/flink/source/TestRowDataToAvroGenericRecordConverter.java:
##
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.flink.source;

Review Comment:
   This is the package name fix in the 1.16 module. It should be under the `flink/source` folder (not `flink/`).






[GitHub] [iceberg] amogh-jahagirdar commented on issue #6632: Bug with Branch Transactions

2023-01-20 Thread GitBox


amogh-jahagirdar commented on issue #6632:
URL: https://github.com/apache/iceberg/issues/6632#issuecomment-1398725654

   I'm working on a fix for this. @jackye1995 could you assign it to me?





[GitHub] [iceberg] amogh-jahagirdar opened a new issue, #6632: Bug with Branch Transactions

2023-01-20 Thread GitBox


amogh-jahagirdar opened a new issue, #6632:
URL: https://github.com/apache/iceberg/issues/6632

   ### Apache Iceberg version
   
   1.1.0 (latest release)
   
   ### Query engine
   
   None
   
   ### Please describe the bug 
   
   Creating this issue for awareness. I was discussing with @rdblue, and it seems like there's a bug in https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/BaseTransaction.java#L415
   
   When committing the transaction, `currentSnapshotId` is set to the snapshot ID at the head of `main`, but this is incorrect if the transaction is being committed to a branch.
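
   For context, a minimal sketch of the mismatch, assuming a hypothetical branch named `audit` whose head differs from `main` (the shape of the reproduction is illustrative, not taken from the issue):

   ```java
import org.apache.iceberg.SnapshotRef;
import org.apache.iceberg.Table;

// Illustrative only: shows why resolving the snapshot from main is wrong
// when a transaction targets a branch. "audit" is a hypothetical branch name.
public class BranchTransactionSketch {
  static void show(Table table) {
    long mainHead = table.currentSnapshot().snapshotId(); // head of main
    SnapshotRef audit = table.refs().get("audit");        // target branch ref
    long branchHead = audit.snapshotId();

    // Per the issue, BaseTransaction uses mainHead as currentSnapshotId even
    // when the transaction commits to "audit", where branchHead is the
    // correct base snapshot.
    System.out.printf("main head=%d, audit head=%d%n", mainHead, branchHead);
  }
}
   ```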





[GitHub] [iceberg] stevenzwu commented on pull request #6631: Flink: backport PR #6584 to 1.14 and 1.15 for Avro GenericRecord in FLIP-27 source

2023-01-20 Thread GitBox


stevenzwu commented on PR #6631:
URL: https://github.com/apache/iceberg/pull/6631#issuecomment-1398722055

   I checked the following diffs and found nothing related to the classes touched by PR #6584:
   
   ```
git diff --no-index  flink/v1.14/flink/src/ flink/v1.16/flink/src
git diff --no-index  flink/v1.15/flink/src/ flink/v1.16/flink/src
   ```





[GitHub] [iceberg] stevenzwu opened a new pull request, #6631: Flink: backport PR #6584 to 1.14 and 1.15 for Avro GenericRecord in FLIP-27 source

2023-01-20 Thread GitBox


stevenzwu opened a new pull request, #6631:
URL: https://github.com/apache/iceberg/pull/6631

   I also piggybacked a fix for the package names (a mishap from PR #6584); some classes should be in the `flink/source/reader` package.





[GitHub] [iceberg] stevenzwu commented on a diff in pull request #6584: Flink: support reading as Avro GenericRecord for FLIP-27 IcebergSource

2023-01-20 Thread GitBox


stevenzwu commented on code in PR #6584:
URL: https://github.com/apache/iceberg/pull/6584#discussion_r1082782121


##
flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/source/reader/AvroGenericRecordReaderFunction.java:
##
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.flink.source.reader;
+
+import org.apache.avro.generic.GenericRecord;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.configuration.ReadableConfig;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.encryption.EncryptionManager;
+import org.apache.iceberg.flink.source.AvroGenericRecordFileScanTaskReader;
+import org.apache.iceberg.flink.source.DataIterator;
+import org.apache.iceberg.flink.source.RowDataFileScanTaskReader;
+import org.apache.iceberg.flink.source.RowDataToAvroGenericRecordConverter;
+import org.apache.iceberg.flink.source.split.IcebergSourceSplit;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+
+/** Read Iceberg rows as {@link GenericRecord}. */
+public class AvroGenericRecordReaderFunction extends DataIteratorReaderFunction<GenericRecord> {
+  private final String tableName;
+  private final Schema readSchema;
+  private final FileIO io;
+  private final EncryptionManager encryption;
+  private final RowDataFileScanTaskReader rowDataReader;
+
+  private transient RowDataToAvroGenericRecordConverter converter;
+
+  /**
+   * Create a reader function without projection and name mapping. Column name is case-insensitive.
+   */
+  public static AvroGenericRecordReaderFunction fromTable(Table table) {
+return new AvroGenericRecordReaderFunction(
+table.name(),
+new Configuration(),
+table.schema(),
+null,
+null,
+false,

Review Comment:
   @hililiwei I am merging this PR. Please continue to comment on this thread about the reader function. I will follow up with a separate PR on the `withOutputType` idea if we agree that is the right direction.






[GitHub] [iceberg] stevenzwu merged pull request #6584: Flink: support reading as Avro GenericRecord for FLIP-27 IcebergSource

2023-01-20 Thread GitBox


stevenzwu merged PR #6584:
URL: https://github.com/apache/iceberg/pull/6584





[GitHub] [iceberg] jackye1995 commented on issue #6625: Improve nullability check in Iceberg codebase

2023-01-20 Thread GitBox


jackye1995 commented on issue #6625:
URL: https://github.com/apache/iceberg/issues/6625#issuecomment-1398630489

   > should we maybe raise this discussion topic on the mailing list in order to increase visibility for people?
   
   Yes, agreed. Let's do that so we can reach a consensus and proceed to implementation. If there is no response, we can raise it during the next sync meeting.





[GitHub] [iceberg] nastra commented on issue #6625: Improve nullability check in Iceberg codebase

2023-01-20 Thread GitBox


nastra commented on issue #6625:
URL: https://github.com/apache/iceberg/issues/6625#issuecomment-1398623955

   I also like annotations like `@Nullable` to indicate that certain things in the API can be nullable, as this makes it easier to consume that particular API and reason about it.
   Maybe we don't need something like `@NonNull` (given that we'd have to agree on which library we'd want to take that annotation from) if we could imply that whatever doesn't have `@Nullable` is automatically non-null.
   An alternative would be the usage of `Optional`, but `@Nullable` would be my preferred option.
   However, I'd like to hear opinions & thoughts from other people on that topic as well.
   
   @jackye1995 should we maybe raise this discussion topic on the mailing list in order to increase visibility for people?
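
   For illustration, a hedged sketch of the convention discussed above (absence of `@Nullable` implying non-null), assuming the JSR-305 annotation from `com.google.code.findbugs:jsr305`; the class and fields are hypothetical:

   ```java
import java.util.Optional;
import javax.annotation.Nullable;

// Hypothetical example: fields without @Nullable are implicitly non-null.
public class ViewVersionExample {
  private final String sql;               // implicitly non-null
  @Nullable private final String comment; // may be null, and the annotation says so

  public ViewVersionExample(String sql, @Nullable String comment) {
    this.sql = sql;
    this.comment = comment;
  }

  // The Optional alternative mentioned above, expressed at the accessor level.
  public Optional<String> comment() {
    return Optional.ofNullable(comment);
  }
}
   ```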





[GitHub] [iceberg] jackye1995 commented on issue #6625: Improve nullability check in Iceberg codebase

2023-01-20 Thread GitBox


jackye1995 commented on issue #6625:
URL: https://github.com/apache/iceberg/issues/6625#issuecomment-1398577729

   @nastra any thoughts?





[GitHub] [iceberg] jackye1995 commented on issue #6420: Iceberg Materialized View Spec

2023-01-20 Thread GitBox


jackye1995 commented on issue #6420:
URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1398572156

   I would +1 on storing in the snapshot summary, because:
   1. a snapshot corresponds very well to an MV refresh; there is a 1:1 relationship between them.
   2. table properties are not versioned the way snapshots are; you cannot easily access the previous table properties of a storage table.
   
   I also agree with Ryan's suggestion about the information to store, although I think storing the source tables referenced might be a bit hard, since it requires statement analysis of the view.
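
   For what it's worth, a minimal sketch of the snapshot-summary option, using `SnapshotUpdate#set` for summary properties; the property keys are illustrative, not part of any agreed spec:

   ```java
import org.apache.iceberg.DataFile;
import org.apache.iceberg.Table;

// Sketch only: record refresh metadata in the summary of the snapshot the
// refresh itself produces, so each refresh is versioned with its snapshot.
class MvRefreshSketch {
  static void refresh(Table storageTable, DataFile refreshedData) {
    storageTable
        .newAppend()
        .appendFile(refreshedData)
        .set("mv.refresh-view-version", "3")            // hypothetical key
        .set("mv.source-snapshot-ids", "db.base_tab:1") // hypothetical key
        .commit();
  }
}
   ```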





[GitHub] [iceberg] Fokko merged pull request #6628: Nessie: Bump to 0.47.0

2023-01-20 Thread GitBox


Fokko merged PR #6628:
URL: https://github.com/apache/iceberg/pull/6628





[GitHub] [iceberg] gaborkaszab commented on issue #6257: Partitions metadata table shows old partitions

2023-01-20 Thread GitBox


gaborkaszab commented on issue #6257:
URL: https://github.com/apache/iceberg/issues/6257#issuecomment-1398396353

   > What would the algorithm be? If the partition has delete files, try to do a full MOR, and check if records are null? Personally, sounds a bit extreme, I would think a good first step is just add a column for delete_files (It may be easier after my new change in #6365). After all, we do have a partition existing, just its of invalid delete files. Interested to hear others thoughts as well.
   
   Well, giving this a second (and a third) thought, I have to admit that applying delete files on the data files to get the partitions is too heavyweight. I'm wondering if we should document this behaviour somewhere, as I remember someone on Slack was confused about the `record_count` column of the metadata table not adding up to the same value that `count(*)` gives.
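
   To make the confusion concrete, a sketch with made-up numbers: the `record_count` column sums rows in live data files before delete files are applied, while `count(*)` applies them on read:

   ```java
// Hypothetical numbers only, illustrating the mismatch described above.
class RecordCountMismatchSketch {
  public static void main(String[] args) {
    long recordCount = 1_000; // partitions table: rows in live data files
    long deletedRows = 200;   // rows masked by delete files, applied on read
    long selectCount = recordCount - deletedRows; // what count(*) returns
    System.out.printf("record_count=%d, count(*)=%d%n", recordCount, selectCount);
  }
}
   ```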





[GitHub] [iceberg] mriveraFacephi commented on issue #2040: Partial data ingestion to Iceberg in failing with Spark 3.0.x

2023-01-20 Thread GitBox


mriveraFacephi commented on issue #2040:
URL: https://github.com/apache/iceberg/issues/2040#issuecomment-1398240262

   Same problem here with Spark 3.1.1 and Iceberg 0.13.1. I'm trying to write a dataframe using the Spark v2 API command `writeTo`.
   Every column in my schema is nullable. In my case, I'm using AWS Glue, and the table metadata looks like:
   ```
   "Columns": [
     {
       "Name": "col",
       "Type": "string",
       "Parameters": {
         "iceberg.field.current": "true",
         "iceberg.field.id": "1",
         "iceberg.field.optional": "true"
       }
     }, ...
   ]
   ```
   Note that `iceberg.field.optional` is `"true"`.
   
   Thanks





[GitHub] [iceberg] ajantha-bhat commented on pull request #6628: Nessie: Bump to 0.47.0

2023-01-20 Thread GitBox


ajantha-bhat commented on PR #6628:
URL: https://github.com/apache/iceberg/pull/6628#issuecomment-1398211777

   I think we can bump it to `0.47.1` now.





[GitHub] [iceberg] cgpoh opened a new issue, #6630: Purpose of MAX_CONTINUOUS_EMPTY_COMMITS in IcebergFilesCommitter

2023-01-20 Thread GitBox


cgpoh opened a new issue, #6630:
URL: https://github.com/apache/iceberg/issues/6630

   ### Query engine
   
   Flink
   
   ### Question
   
   I have a Flink job that uses side output to write to Iceberg table when 
there are errors in the main processing function. If there are no errors in the 
processing function, no data files will be added to be committed. I noticed 
that the Flink job is restarting and throwing the following exception:
   ```
   IcebergFilesCommitter -> Sink: iceberg-error-sink-FPL (1/1)#152 (2b5a4587aa3a50c531671a32a2f1538c) switched from RUNNING to FAILED with failure cause: java.lang.IllegalStateException: Cannot determine partition spec: no data files have been added
       at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkState(Preconditions.java:502)
       at org.apache.iceberg.MergingSnapshotProducer.dataSpec(MergingSnapshotProducer.java:150)
       at org.apache.iceberg.BaseReplacePartitions.apply(BaseReplacePartitions.java:133)
       at org.apache.iceberg.SnapshotProducer.apply(SnapshotProducer.java:223)
       at org.apache.iceberg.BaseReplacePartitions.apply(BaseReplacePartitions.java:26)
       at org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:369)
       at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:402)
       at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:212)
       at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
       at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:189)
       at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:367)
       at org.apache.iceberg.BaseReplacePartitions.commit(BaseReplacePartitions.java:26)
       at org.apache.iceberg.flink.sink.IcebergFilesCommitter.commitOperation(IcebergFilesCommitter.java:372)
       at org.apache.iceberg.flink.sink.IcebergFilesCommitter.replacePartitions(IcebergFilesCommitter.java:314)
       at org.apache.iceberg.flink.sink.IcebergFilesCommitter.commitPendingResult(IcebergFilesCommitter.java:270)
       at org.apache.iceberg.flink.sink.IcebergFilesCommitter.commitUpToCheckpoint(IcebergFilesCommitter.java:255)
       at org.apache.iceberg.flink.sink.IcebergFilesCommitter.notifyCheckpointComplete(IcebergFilesCommitter.java:229)
       at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.notifyCheckpointComplete(StreamOperatorWrapper.java:104)
       at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.notifyCheckpointComplete(RegularOperatorChain.java:145)
       at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpoint(SubtaskCheckpointCoordinatorImpl.java:409)
       at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpointComplete(SubtaskCheckpointCoordinatorImpl.java:343)
       at org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:1384)
       at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointCompleteAsync$14(StreamTask.java:1325)
       at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointOperation$17(StreamTask.java:1364)
       at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
       at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
       at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338)
       at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
       at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
       at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804)
       at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753)
       at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
       at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927)
       at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
       at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
       at java.base/java.lang.Thread.run(Unknown Source)
   ```
   I saw that in the `commitPendingResult` function of IcebergFilesCommitter.java there's a condition to check whether to skip empty commits, but if MAX_CONTINUOUS_EMPTY_COMMITS is met, it will proceed to commit even when there are no data files to commit, thus throwing the above exception.
   ```java
   // Count consecutive empty checkpoints; a commit is forced every
   // maxContinuousEmptyCommits checkpoints even when no files were added.
   long totalFiles = summary.dataFilesCount() + summary.deleteFilesCount();
   continuousEmptyCheckpoints = totalFiles == 0 ? continuousEmptyCheckpoints + 1 : 0;
   if (totalFiles != 0 || continuousEmptyCheckpoints % maxContinuousEmptyCommits == 0) {
     if
   ```

[GitHub] [iceberg] kingeasternsun commented on pull request #6624:  Add "parallelism" parameter to "add_files" syscall and MigrateTable, SnapshotTable.

2023-01-20 Thread GitBox


kingeasternsun commented on PR #6624:
URL: https://github.com/apache/iceberg/pull/6624#issuecomment-1398154903

   > Left a review, thanks for the contribution @kingeasternsun ! Also looks like spotless checks are failing which you can fix by running `./gradlew :iceberg-api:spotlessJavaCheck`
   
   Thanks for your advice; I have fixed them. Could you merge this, please?





[GitHub] [iceberg] JanKaul commented on issue #6420: Iceberg Materialized View Spec

2023-01-20 Thread GitBox


JanKaul commented on issue #6420:
URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1398113386

   Yes, I agree with the proposed design 1. I'm not entirely sure what @rdblue prefers.
   
   I will update the Google doc accordingly.
   
   The next question for me is where and how to store the refresh information. There are currently two proposed solutions:
   
   1. Store the refresh information in snapshot properties of the storage table
   2. Store the refresh information in storage table properties
   
   I will create a section in the Google doc that summarizes these options.





[GitHub] [iceberg] findepi commented on pull request #6474: Make it explicit that metrics reporter is required

2023-01-20 Thread GitBox


findepi commented on PR #6474:
URL: https://github.com/apache/iceberg/pull/6474#issuecomment-1398101458

   thanks for the merge!





[GitHub] [iceberg] findepi commented on pull request #6474: Make it explicit that metrics reporter is required

2023-01-20 Thread GitBox


findepi commented on PR #6474:
URL: https://github.com/apache/iceberg/pull/6474#issuecomment-1398101212

   > Yes this is the consequence of different styles of the projects.
   
   That's a good point.
   I accept the inherent friction being a result of that, but I do hope some of that friction is avoidable.
   
   > I like Trino's approach of checking null at every constructor and also use Optional as much as possible.
   
   I like it too. That's why I fully supported @kokosing's push to leverage Optional whenever possible.
   There are still some rare places that would benefit from improvement, but we're generally in quite good shape with null-unfriendliness.
   
   > For Iceberg, it would be ideal if nullability is at least indicated in documentation.
   
   100% agreed, especially if you mean API-level documentation (javadoc, annotations), as that's what library users will see in their IDEs.
   
   
   





[GitHub] [iceberg] findepi commented on issue #6625: Improve nullability check in Iceberg codebase

2023-01-20 Thread GitBox


findepi commented on issue #6625:
URL: https://github.com/apache/iceberg/issues/6625#issuecomment-1398096721

   > Also there is little indication in the codebase of which field could potentially be null. This causes a lot of confusion for external engine integrations like Trino.
   
   I am happy to contribute annotations for nullable or non-null elements, if that's what we agree to do.
   This could look e.g. like https://github.com/apache/iceberg/pull/4978, or maybe someone has better suggestions.
   





[GitHub] [iceberg] kingeasternsun commented on a diff in pull request #6624:  Add "parallelism" parameter to "add_files" syscall and MigrateTable, SnapshotTable.

2023-01-20 Thread GitBox


kingeasternsun commented on code in PR #6624:
URL: https://github.com/apache/iceberg/pull/6624#discussion_r1082057002


##
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/procedures/MigrateTableProcedure.java:
##
@@ -39,7 +39,8 @@ class MigrateTableProcedure extends BaseProcedure {
   new ProcedureParameter[] {
 ProcedureParameter.required("table", DataTypes.StringType),
 ProcedureParameter.optional("properties", STRING_MAP),
-ProcedureParameter.optional("drop_backup", DataTypes.BooleanType)
+ProcedureParameter.optional("drop_backup", DataTypes.BooleanType),
+ProcedureParameter.optional("max_concurrent_read_datafiles", 
DataTypes.IntegerType)

Review Comment:
   Thanks for your reviews. It's because of this: https://github.com/apache/iceberg/pull/3973#discussion_r794150006 @RussellSpitzer






[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6629: Build: Fix minor error-prone warnings

2023-01-19 Thread GitBox


ajantha-bhat commented on code in PR #6629:
URL: https://github.com/apache/iceberg/pull/6629#discussion_r1082180504


##
core/src/main/java/org/apache/iceberg/rest/auth/OAuth2Util.java:
##
@@ -329,14 +329,14 @@ static Long expiresAtMillis(String token) {
   return null;
 }
 
-String[] parts = token.split("\\.");
-if (parts.length != 3) {
+List<String> parts = Splitter.on('.').splitToList(token);

Review Comment:
   As it is not used in multiple places (in the same file), I thought there was no need for an extra variable.






[GitHub] [iceberg] nastra commented on a diff in pull request #6629: Build: Fix minor error-prone warnings

2023-01-19 Thread GitBox


nastra commented on code in PR #6629:
URL: https://github.com/apache/iceberg/pull/6629#discussion_r1082177976


##
core/src/main/java/org/apache/iceberg/rest/auth/OAuth2Util.java:
##
@@ -329,14 +329,14 @@ static Long expiresAtMillis(String token) {
   return null;
 }
 
-String[] parts = token.split("\\.");
-if (parts.length != 3) {
+List<String> parts = Splitter.on('.').splitToList(token);

Review Comment:
   nit: should we extract this to a `private static final Splitter DOT_SPLITTER`?
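
   A short sketch of what the suggested extraction might look like (using the relocated Guava `Splitter` seen elsewhere in the codebase; the surrounding method body is abbreviated):

   ```java
import java.util.List;
import org.apache.iceberg.relocated.com.google.common.base.Splitter;

class OAuth2UtilSketch {
  private static final Splitter DOT_SPLITTER = Splitter.on('.');

  static Long expiresAtMillis(String token) {
    // A JWT has exactly three dot-separated parts: header, payload, signature.
    List<String> parts = DOT_SPLITTER.splitToList(token);
    if (parts.size() != 3) {
      return null;
    }
    return null; // placeholder: payload decoding elided in this sketch
  }
}
   ```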






[GitHub] [iceberg] nastra commented on a diff in pull request #6586: AWS: make warehouse path optional for read only catalog use cases

2023-01-19 Thread GitBox


nastra commented on code in PR #6586:
URL: https://github.com/apache/iceberg/pull/6586#discussion_r1082174852


##
aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogTable.java:
##
@@ -132,6 +133,29 @@ public void testCreateTableBadName() {
 TableIdentifier.of(namespace, "table-1"), schema, 
partitionSpec));
   }
 
+  @Test
+  public void testCreateAndLoadTableWithoutWarehouseLocation() {
+GlueCatalog glueCatalogWithoutWarehouse = new GlueCatalog();
+glueCatalogWithoutWarehouse.initialize(
+catalogName,
+null,
+new AwsProperties(),
+glue,
+LockManagers.defaultLockManager(),
+new S3FileIO(clientFactory::s3),
+ImmutableMap.of());
+String namespace = createNamespace();
+String tableName = getRandomName();
+TableIdentifier identifier = TableIdentifier.of(namespace, tableName);
+try {

Review Comment:
   thanks, that makes sense in this case






[GitHub] [iceberg] nastra commented on pull request #6626: Core: Update error msg

2023-01-19 Thread GitBox


nastra commented on PR #6626:
URL: https://github.com/apache/iceberg/pull/6626#issuecomment-1398007171

   > > not sure, usually it's been called out on my own PRs to adjust the error msg to that particular format (hence the reason I mentioned it on the other PR), which is being used across other places in Iceberg
   > 
   > In that case, would it be better for us to change all the related null checks to checkArgument, and update the checkstyle rule for it to avoid use of checkNotNull? I think we could take this opportunity to set some standards.
   
   I think it's definitely worth discussing this so that people know what the accepted approach is and can enforce it via checkstyle.
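
   For illustration, the two null-check styles under discussion differ mainly in the exception thrown; a hedged sketch using the relocated Guava `Preconditions` (the message format is only an example of the convention mentioned above):

   ```java
import org.apache.iceberg.relocated.com.google.common.base.Preconditions;

class NullCheckStylesSketch {
  static void withCheckNotNull(String name) {
    // Throws NullPointerException when name is null.
    Preconditions.checkNotNull(name, "name must not be null");
  }

  static void withCheckArgument(String name) {
    // Throws IllegalArgumentException when name is null.
    Preconditions.checkArgument(name != null, "Invalid name: null");
  }
}
   ```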





[GitHub] [iceberg] hililiwei commented on a diff in pull request #6617: Spark: Spark SQL Extensions for create branch

2023-01-19 Thread GitBox


hililiwei commented on code in PR #6617:
URL: https://github.com/apache/iceberg/pull/6617#discussion_r1082173399


##
spark/v3.3/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSqlExtensionsAstBuilder.scala:
##
@@ -267,6 +286,12 @@ class IcebergSqlExtensionsAstBuilder(delegate: 
ParserInterface) extends IcebergS
   private def typedVisit[T](ctx: ParseTree): T = {
 ctx.accept(this).asInstanceOf[T]
   }
+
+  private def timeRetain(unit: String, duration: Long): Long = unit.toUpperCase(Locale.ENGLISH) match {
+    case "MONTHS" => 30 * TimeUnit.DAYS.toMillis(duration)

Review Comment:
   Yes, it does. Updated.
   
   



##
spark/v3.3/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSqlExtensionsAstBuilder.scala:
##
@@ -267,6 +286,12 @@ class IcebergSqlExtensionsAstBuilder(delegate: 
ParserInterface) extends IcebergS
   private def typedVisit[T](ctx: ParseTree): T = {
 ctx.accept(this).asInstanceOf[T]
   }
+
+  private def timeRetain(unit: String, duration: Long): Long = unit.toUpperCase(Locale.ENGLISH) match {
+    case "MONTHS" => 30 * TimeUnit.DAYS.toMillis(duration)
+    case "DAYS" | "HOURS" | "MINUTES" => TimeUnit.valueOf(unit.toUpperCase(Locale.ENGLISH)).toMillis(duration)

Review Comment:
   Done.






[GitHub] [iceberg] nastra commented on a diff in pull request #6598: Core: View representation core implementation

2023-01-19 Thread GitBox


nastra commented on code in PR #6598:
URL: https://github.com/apache/iceberg/pull/6598#discussion_r1082170758


##
core/src/main/java/org/apache/iceberg/view/SQLViewRepresentationParser.java:
##
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.view;
+
+import com.fasterxml.jackson.core.JsonGenerator;
+import com.fasterxml.jackson.databind.JsonNode;
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.List;
+import org.apache.iceberg.catalog.Namespace;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.util.JsonUtil;
+
+class SQLViewRepresentationParser {
+  private static final String SQL = "sql";
+  private static final String DIALECT = "dialect";
+  private static final String SCHEMA_ID = "schema-id";
+  private static final String DEFAULT_CATALOG = "default-catalog";
+  private static final String DEFAULT_NAMESPACE = "default-namespace";
+  private static final String FIELD_ALIASES = "field-aliases";
+  private static final String FIELD_COMMENTS = "field-comments";
+
+  private SQLViewRepresentationParser() {}
+
+  static String toJson(SQLViewRepresentation sqlViewRepresentation) {
+    return JsonUtil.generate(gen -> toJson(sqlViewRepresentation, gen), false);
+  }
+
+  static void toJson(SQLViewRepresentation view, JsonGenerator generator) throws IOException {
+    Preconditions.checkArgument(view != null, "Invalid SQL view representation: null");
+    generator.writeStartObject();
+    generator.writeStringField(ViewRepresentationParser.TYPE, view.type());
+    generator.writeStringField(SQL, view.sql());
+    generator.writeStringField(DIALECT, view.dialect());
+
+    if (view.schemaId() != null) {
+      generator.writeNumberField(SCHEMA_ID, view.schemaId());
+    }
+
+    if (view.defaultCatalog() != null) {
+      generator.writeStringField(DEFAULT_CATALOG, view.defaultCatalog());
+    }
+
+    if (view.defaultNamespace() != null) {
+      JsonUtil.writeStringArray(
+          DEFAULT_NAMESPACE, Arrays.asList(view.defaultNamespace().levels()), generator);
+    }
+
+    if (view.fieldAliases() != null && !view.fieldAliases().isEmpty()) {
+      JsonUtil.writeStringArray(
+          SQLViewRepresentationParser.FIELD_ALIASES, view.fieldAliases(), generator);
+    }
+
+    if (view.fieldComments() != null && !view.fieldComments().isEmpty()) {
+      JsonUtil.writeStringArray(
+          SQLViewRepresentationParser.FIELD_COMMENTS, view.fieldComments(), generator);
+    }
+
+    generator.writeEndObject();
+  }
+
+  static SQLViewRepresentation fromJson(String json) {
+    Preconditions.checkArgument(
+        json != null, "Cannot parse SQL view representation from null JSON");
+    return JsonUtil.parse(json, SQLViewRepresentationParser::fromJson);
+  }
+
+  static SQLViewRepresentation fromJson(JsonNode node) {
+    Preconditions.checkArgument(

Review Comment:
   nit: it would probably be good to also add `Preconditions.checkArgument(node.isObject(), "Cannot parse SQL view representation from non-object: %s", node);`, similarly to how you have it in the other parser



##
core/src/main/java/org/apache/iceberg/view/ViewRepresentationParser.java:
##
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.view;
+

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6586: AWS: make warehouse path optional for read only catalog use cases

2023-01-19 Thread GitBox


jackye1995 commented on code in PR #6586:
URL: https://github.com/apache/iceberg/pull/6586#discussion_r1082172027


##
aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogTable.java:
##
@@ -132,6 +133,29 @@ public void testCreateTableBadName() {
 TableIdentifier.of(namespace, "table-1"), schema, 
partitionSpec));
   }
 
+  @Test
+  public void testCreateAndLoadTableWithoutWarehouseLocation() {
+GlueCatalog glueCatalogWithoutWarehouse = new GlueCatalog();
+glueCatalogWithoutWarehouse.initialize(
+catalogName,
+null,
+new AwsProperties(),
+glue,
+LockManagers.defaultLockManager(),
+new S3FileIO(clientFactory::s3),
+ImmutableMap.of());
+String namespace = createNamespace();
+String tableName = getRandomName();
+TableIdentifier identifier = TableIdentifier.of(namespace, tableName);
+try {

Review Comment:
   yes, it's just to make the failure reason more explicit






[GitHub] [iceberg] nastra commented on a diff in pull request #6586: AWS: make warehouse path optional for read only catalog use cases

2023-01-19 Thread GitBox


nastra commented on code in PR #6586:
URL: https://github.com/apache/iceberg/pull/6586#discussion_r1082168946


##
aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogTable.java:
##
@@ -132,6 +133,29 @@ public void testCreateTableBadName() {
 TableIdentifier.of(namespace, "table-1"), schema, 
partitionSpec));
   }
 
+  @Test
+  public void testCreateAndLoadTableWithoutWarehouseLocation() {
+GlueCatalog glueCatalogWithoutWarehouse = new GlueCatalog();
+glueCatalogWithoutWarehouse.initialize(
+catalogName,
+null,
+new AwsProperties(),
+glue,
+LockManagers.defaultLockManager(),
+new S3FileIO(clientFactory::s3),
+ImmutableMap.of());
+String namespace = createNamespace();
+String tableName = getRandomName();
+TableIdentifier identifier = TableIdentifier.of(namespace, tableName);
+try {

Review Comment:
   nit: is the try-catch really needed when we expect the test to succeed? Or is it more of a "give an additional hint when it fails"?






[GitHub] [iceberg] jackye1995 commented on issue #6420: Iceberg Materialized View Spec

2023-01-19 Thread GitBox


jackye1995 commented on issue #6420:
URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1397987523

   @JanKaul if you agree with the summarized consensus we have mostly reached there, then for the sake of moving the discussion forward, could you update the Google doc with the design, describing the spec changes based on the discussions and suggestions here and combining them with your original idea? That way we can start to review the spec contents.





[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6586: AWS: make warehouse path optional for read only catalog use cases

2023-01-19 Thread GitBox


jackye1995 commented on code in PR #6586:
URL: https://github.com/apache/iceberg/pull/6586#discussion_r1082152137


##
aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogTable.java:
##
@@ -132,6 +134,28 @@ public void testCreateTableBadName() {
 TableIdentifier.of(namespace, "table-1"), schema, 
partitionSpec));
   }
 
+  @Test
+  public void testCreateAndLoadTableWithoutWarehouseLocation() {
+GlueCatalog glueCatalogWithoutWarehouse = new GlueCatalog();
+glueCatalogWithoutWarehouse.initialize(
+catalogName,
+null,
+new AwsProperties(),
+glue,
+LockManagers.defaultLockManager(),
+new S3FileIO(clientFactory::s3),
+ImmutableMap.of());
+String namespace = createNamespace();
+String tableName = getRandomName();
+TableIdentifier identifier = TableIdentifier.of(namespace, tableName);
+try {
+  glueCatalog.createTable(identifier, schema, partitionSpec, 
tableLocationProperties);
+  glueCatalog.loadTable(identifier);
+} catch (RuntimeException e) {
+  fail("Create and load table without warehouse location should succeed");

Review Comment:
   Yes, I was not able to find a `fail(...)` method that takes an exception, hence this. Throwing an IOException seems like too much, so I changed the code to rethrow the runtime exception.
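
   A sketch of the rethrow pattern described, applied to the quoted test (the exact wording in the PR may differ):

   ```java
// Hypothetical sketch: preserve the original failure as the cause instead of
// swallowing it behind a bare fail(...) message.
try {
  glueCatalog.createTable(identifier, schema, partitionSpec, tableLocationProperties);
  glueCatalog.loadTable(identifier);
} catch (RuntimeException e) {
  throw new RuntimeException(
      "Create and load table without warehouse location should succeed", e);
}
   ```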






[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6586: AWS: make warehouse path optional for read only catalog use cases

2023-01-19 Thread GitBox


jackye1995 commented on code in PR #6586:
URL: https://github.com/apache/iceberg/pull/6586#discussion_r1082151825


##
aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java:
##
@@ -177,13 +178,9 @@ void initialize(
   GlueClient client,
   LockManager lock,
   FileIO io) {
-    Preconditions.checkArgument(
-        path != null && path.length() > 0,
-        "Cannot initialize GlueCatalog because warehousePath must not be null or empty");
-
     this.catalogName = name;
     this.awsProperties = properties;
-    this.warehousePath = LocationUtil.stripTrailingSlash(path);
+    this.warehousePath = path != null ? LocationUtil.stripTrailingSlash(path) : null;

Review Comment:
   good point, updated






[GitHub] [iceberg] aajisaka commented on a diff in pull request #6358: AWS: Print logs whether Glue optimistic locking is used or not

2023-01-19 Thread GitBox


aajisaka commented on code in PR #6358:
URL: https://github.com/apache/iceberg/pull/6358#discussion_r1082137636


##
aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java:
##
@@ -151,7 +151,12 @@ private LockManager initializeLockManager(Map properties) {
 if (properties.containsKey(CatalogProperties.LOCK_IMPL)) {
   return LockManagers.from(properties);
 } else if (SET_VERSION_ID.isNoop()) {
+      LOG.warn(
+          "Optimistic locking is not available in the environment. Using in-memory lock manager."
+              + " To ensure atomic transaction, you need to setup a DynamoDB lock manager.");

Review Comment:
   Fixed the warn message as you suggested. Thank you @amogh-jahagirdar 






[GitHub] [iceberg] JanKaul commented on issue #6420: Iceberg Materialized View Spec

2023-01-19 Thread GitBox


JanKaul commented on issue #6420:
URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1397956873

   > I don't know if it would work or too crazy, just to throw the idea out 
that I just came up with:
   > 
   > We could potentially make MV a representation in view spec, in parallel to 
the SQL representation. So for a materialized view, it could have 2 
representations:
   > 
   > ```
   > [
   >   {
   >   "type" : "sql",
   >   "sql" : "SELECT \"count\"(*) my_cnt\nFROM\n  base_tab\n",
   >   "dialect" : "spark",
   >   "schema-id" : 2,
   >   "default-catalog" : "iceberg",
   >   "default-namespace" : [ "anorwood" ]
   > },
   > {
   >   "type" : "materialized",
   >   "namespace": "some_namespace"
   >   "table" : "some_storage_table"
   > }
   > ]
   > ```
   > 
   > By doing so, you could support a case where 1 view does not really have a 
SQL representation, but just a few different table layouts:
   > 
   > ```
   > [
   >   {
   >   "type" : "materialized",
   >   "namespace": "some_namespace"
   >   "table" : "some_storage_table_partitioned_by_col1"
   >},
   >   {
   >   "type" : "materialized",
   >   "namespace": "some_namespace"
   >   "table" : "some_storage_table_partitioned_by_col2"
   > }
   > ]
   > ```
   > 
   > which satisfies the use case described by Walaa.
   > 
   > 
   > 
   > 
   > 
   
   Great idea! I think this is a very clean solution for storing the metadata. However, I would rename the type to `sql_materialized` in case there are other representations in the future.





[GitHub] [iceberg] aajisaka commented on a diff in pull request #6586: AWS: make warehouse path optional for read only catalog use cases

2023-01-19 Thread GitBox


aajisaka commented on code in PR #6586:
URL: https://github.com/apache/iceberg/pull/6586#discussion_r1082123777


##
aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogTable.java:
##
@@ -132,6 +134,28 @@ public void testCreateTableBadName() {
 TableIdentifier.of(namespace, "table-1"), schema, 
partitionSpec));
   }
 
+  @Test
+  public void testCreateAndLoadTableWithoutWarehouseLocation() {
+GlueCatalog glueCatalogWithoutWarehouse = new GlueCatalog();
+glueCatalogWithoutWarehouse.initialize(
+catalogName,
+null,
+new AwsProperties(),
+glue,
+LockManagers.defaultLockManager(),
+new S3FileIO(clientFactory::s3),
+ImmutableMap.of());
+String namespace = createNamespace();
+String tableName = getRandomName();
+TableIdentifier identifier = TableIdentifier.of(namespace, tableName);
+try {
+  glueCatalog.createTable(identifier, schema, partitionSpec, 
tableLocationProperties);
+  glueCatalog.loadTable(identifier);
+} catch (RuntimeException e) {
+  fail("Create and load table without warehouse location should succeed");

Review Comment:
   ```suggestion
     throw new IOException("Create and load table without warehouse location should succeed", e);
   ```
   We shouldn't hide the original exception. Without the exception message and the stacktrace, it's hard to find the root cause when an exception occurs.



##
aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java:
##
@@ -177,13 +178,9 @@ void initialize(
   GlueClient client,
   LockManager lock,
   FileIO io) {
-Preconditions.checkArgument(
-path != null && path.length() > 0,
-"Cannot initialize GlueCatalog because warehousePath must not be null 
or empty");
-
 this.catalogName = name;
 this.awsProperties = properties;
-this.warehousePath = LocationUtil.stripTrailingSlash(path);
+this.warehousePath = path != null ? LocationUtil.stripTrailingSlash(path) 
: null;

Review Comment:
   If the `path` is empty, it fails in `LocationUtil.stripTrailingSlash(path)`.
   ```suggestion
   this.warehousePath = (path != null && path.length() > 0) ? LocationUtil.stripTrailingSlash(path) : null;
   ```






[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6617: Spark: Spark SQL Extensions for create branch

2023-01-19 Thread GitBox


jackye1995 commented on code in PR #6617:
URL: https://github.com/apache/iceberg/pull/6617#discussion_r1082118919


##
spark/v3.3/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSqlExtensionsAstBuilder.scala:
##
@@ -267,6 +286,12 @@ class IcebergSqlExtensionsAstBuilder(delegate: 
ParserInterface) extends IcebergS
   private def typedVisit[T](ctx: ParseTree): T = {
 ctx.accept(this).asInstanceOf[T]
   }
+
+  private def timeRetain(unit: String, duration: Long): Long = unit.toUpperCase(Locale.ENGLISH) match {
+    case "MONTHS" => 30 * TimeUnit.DAYS.toMillis(duration)
+    case "DAYS" | "HOURS" | "MINUTES" => TimeUnit.valueOf(unit.toUpperCase(Locale.ENGLISH)).toMillis(duration)

Review Comment:
   do we still need to use a match statement? The parser will only allow these values, so we should be able to directly parse it to TimeUnit.






[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6617: Spark: Spark SQL Extensions for create branch

2023-01-19 Thread GitBox


jackye1995 commented on code in PR #6617:
URL: https://github.com/apache/iceberg/pull/6617#discussion_r1082118486


##
spark/v3.3/spark-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4:
##
@@ -168,34 +175,77 @@ fieldList
 ;
 
 nonReserved
-: ADD | ALTER | AS | ASC | BY | CALL | DESC | DROP | FIELD | FIRST | LAST 
| NULLS | ORDERED | PARTITION | TABLE | WRITE
-| DISTRIBUTED | LOCALLY | UNORDERED | REPLACE | WITH | IDENTIFIER_KW | 
FIELDS | SET
+: ADD | ALTER | AS | ASC | BRANCH | BY | CALL | CREATE | DAYS | DESC | 
DROP | FIELD | FIRST | HOURS | LAST | NULLS | OF | ORDERED | PARTITION | TABLE 
| WRITE
+| DISTRIBUTED | LOCALLY | MINUTES | MONTHS | UNORDERED | REPLACE | RETAIN 
| VERSION | WITH | IDENTIFIER_KW | FIELDS | SET | SNAPSHOT | SNAPSHOTS
 | TRUE | FALSE
 | MAP
 ;
 
+snapshotId
+: number
+;
+
+numSnapshots
+: number
+;
+
+snapshotRetain
+: number
+;
+
+snapshotRefRetain
+: number
+;
+
+snapshotRefRetainTimeUnit
+: timeUnit
+;
+
+snapshotRetainTimeUnit
+: timeUnit
+;
+
+timeUnit
+: MONTHS
+| DAYS
+| HOURS
+| MINUTES

Review Comment:
   sure, I am fine with minute level. We can always add more if needed.



##
spark/v3.3/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSqlExtensionsAstBuilder.scala:
##
@@ -267,6 +286,12 @@ class IcebergSqlExtensionsAstBuilder(delegate: 
ParserInterface) extends IcebergS
   private def typedVisit[T](ctx: ParseTree): T = {
 ctx.accept(this).asInstanceOf[T]
   }
+
+  private def timeRetain(unit: String, duration: Long): Long = 
unit.toUpperCase(Locale.ENGLISH) match {
+case "MONTHS" => 30 * TimeUnit.DAYS.toMillis(duration)

Review Comment:
   hmm, good point, a month is not always exactly 30 days. That's probably why it is not supported by `TimeUnit`. So I would say in that case let's remove MONTHS from the time unit in the parser and just support DAYS, HOURS and MINUTES.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6617: Spark: Spark SQL Extensions for create branch

2023-01-19 Thread GitBox


jackye1995 commented on code in PR #6617:
URL: https://github.com/apache/iceberg/pull/6617#discussion_r1082118109


##
spark/v3.3/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSqlExtensionsAstBuilder.scala:
##
@@ -267,6 +286,12 @@ class IcebergSqlExtensionsAstBuilder(delegate: 
ParserInterface) extends IcebergS
   private def typedVisit[T](ctx: ParseTree): T = {
 ctx.accept(this).asInstanceOf[T]
   }
+
+  private def timeRetain(unit: String, duration: Long): Long = 
unit.toUpperCase(Locale.ENGLISH) match {
+case "MONTHS" => 30 * TimeUnit.DAYS.toMillis(duration)

Review Comment:
   hmm, good point, a month is not always exactly 30 days. That's probably why it is not supported by `TimeUnit`. So I would say in that case let's remove MONTHS from the time unit in the parser and just support DAYS, HOURS, MINUTES and SECONDS.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: Delta: Support Snapshot Delta Lake Table to Iceberg Table

2023-01-19 Thread GitBox


jackye1995 commented on code in PR #6449:
URL: https://github.com/apache/iceberg/pull/6449#discussion_r1082116187


##
delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java:
##
@@ -0,0 +1,370 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.delta;
+
+import io.delta.standalone.DeltaLog;
+import io.delta.standalone.VersionLog;
+import io.delta.standalone.actions.Action;
+import io.delta.standalone.actions.AddFile;
+import io.delta.standalone.actions.RemoveFile;
+import java.io.File;
+import java.net.URI;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.AppendFiles;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DataFiles;
+import org.apache.iceberg.DeleteFiles;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.Metrics;
+import org.apache.iceberg.MetricsConfig;
+import org.apache.iceberg.OverwriteFiles;
+import org.apache.iceberg.PartitionField;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.SnapshotSummary;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.TableProperties;
+import org.apache.iceberg.Transaction;
+import org.apache.iceberg.avro.Avro;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.exceptions.ValidationException;
+import org.apache.iceberg.hadoop.HadoopFileIO;
+import org.apache.iceberg.io.InputFile;
+import org.apache.iceberg.mapping.MappingUtil;
+import org.apache.iceberg.mapping.NameMapping;
+import org.apache.iceberg.mapping.NameMappingParser;
+import org.apache.iceberg.orc.OrcMetrics;
+import org.apache.iceberg.parquet.ParquetUtil;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Takes a Delta Lake table's location and attempts to create an Iceberg table 
snapshot in an
+ * optional user-specified location (default to the Delta Lake table's 
location) with a different
+ * identifier.
+ */
+class BaseSnapshotDeltaLakeTableAction implements SnapshotDeltaLakeTable {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(BaseSnapshotDeltaLakeTableAction.class);
+
+  private static final String SNAPSHOT_SOURCE_PROP = "snapshot_source";
+  private static final String DELTA_SOURCE_VALUE = "delta";
+  private static final String ORIGINAL_LOCATION_PROP = "original_location";
+  private static final String PARQUET_SUFFIX = ".parquet";
+  private static final String AVRO_SUFFIX = ".avro";
+  private static final String ORC_SUFFIX = ".orc";
+  private final ImmutableMap.Builder<String, String> additionalPropertiesBuilder =
+  ImmutableMap.builder();
+  private DeltaLog deltaLog;
+  private Catalog icebergCatalog;
+  private final String deltaTableLocation;
+  private TableIdentifier newTableIdentifier;
+  private String newTableLocation;
+  private HadoopFileIO deltaLakeFileIO;
+
+  /**
+   * Snapshot a delta lake table to be an iceberg table. The action will read 
the delta lake table's
+   * log through the table's path, create a new iceberg table using the given 
icebergCatalog and
+   * newTableIdentifier, and commit all changes in one iceberg transaction.
+   *
+   * The new table will only be created if the snapshot is successful.
+   *
+   * @param deltaTableLocation the delta lake table's path
+   */
+  BaseSnapshotDeltaLakeTableAction(String deltaTableLocation) {
+this.deltaTableLocation = deltaTableLocation;
+this.newTableLocation = deltaTableLocation;
+  }
+
+  @Override
+  public SnapshotDeltaLakeTable tableProperties(Map<String, String> properties) {
+additionalPropertiesBuilder.putAll(properties);
+return this;
+  }
+
+  @Override
+  public SnapshotDeltaLakeTable tableProperty(String 

[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6449: Delta: Support Snapshot Delta Lake Table to Iceberg Table

2023-01-19 Thread GitBox


JonasJ-ap commented on code in PR #6449:
URL: https://github.com/apache/iceberg/pull/6449#discussion_r1082097202


##
delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java:
##
@@ -0,0 +1,370 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.delta;
+
+import io.delta.standalone.DeltaLog;
+import io.delta.standalone.VersionLog;
+import io.delta.standalone.actions.Action;
+import io.delta.standalone.actions.AddFile;
+import io.delta.standalone.actions.RemoveFile;
+import java.io.File;
+import java.net.URI;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.AppendFiles;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DataFiles;
+import org.apache.iceberg.DeleteFiles;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.Metrics;
+import org.apache.iceberg.MetricsConfig;
+import org.apache.iceberg.OverwriteFiles;
+import org.apache.iceberg.PartitionField;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.SnapshotSummary;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.TableProperties;
+import org.apache.iceberg.Transaction;
+import org.apache.iceberg.avro.Avro;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.exceptions.ValidationException;
+import org.apache.iceberg.hadoop.HadoopFileIO;
+import org.apache.iceberg.io.InputFile;
+import org.apache.iceberg.mapping.MappingUtil;
+import org.apache.iceberg.mapping.NameMapping;
+import org.apache.iceberg.mapping.NameMappingParser;
+import org.apache.iceberg.orc.OrcMetrics;
+import org.apache.iceberg.parquet.ParquetUtil;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Takes a Delta Lake table's location and attempts to create an Iceberg table 
snapshot in an
+ * optional user-specified location (default to the Delta Lake table's 
location) with a different
+ * identifier.
+ */
+class BaseSnapshotDeltaLakeTableAction implements SnapshotDeltaLakeTable {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(BaseSnapshotDeltaLakeTableAction.class);
+
+  private static final String SNAPSHOT_SOURCE_PROP = "snapshot_source";
+  private static final String DELTA_SOURCE_VALUE = "delta";
+  private static final String ORIGINAL_LOCATION_PROP = "original_location";
+  private static final String PARQUET_SUFFIX = ".parquet";
+  private static final String AVRO_SUFFIX = ".avro";
+  private static final String ORC_SUFFIX = ".orc";
+  private final ImmutableMap.Builder<String, String> additionalPropertiesBuilder =
+  ImmutableMap.builder();
+  private DeltaLog deltaLog;
+  private Catalog icebergCatalog;
+  private final String deltaTableLocation;
+  private TableIdentifier newTableIdentifier;
+  private String newTableLocation;
+  private HadoopFileIO deltaLakeFileIO;
+
+  /**
+   * Snapshot a delta lake table to be an iceberg table. The action will read 
the delta lake table's
+   * log through the table's path, create a new iceberg table using the given 
icebergCatalog and
+   * newTableIdentifier, and commit all changes in one iceberg transaction.
+   *
+   * The new table will only be created if the snapshot is successful.
+   *
+   * @param deltaTableLocation the delta lake table's path
+   */
+  BaseSnapshotDeltaLakeTableAction(String deltaTableLocation) {
+this.deltaTableLocation = deltaTableLocation;
+this.newTableLocation = deltaTableLocation;
+  }
+
+  @Override
+  public SnapshotDeltaLakeTable tableProperties(Map<String, String> properties) {
+additionalPropertiesBuilder.putAll(properties);
+return this;
+  }
+
+  @Override
+  public SnapshotDeltaLakeTable tableProperty(String 

[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6449: Delta: Support Snapshot Delta Lake Table to Iceberg Table

2023-01-19 Thread GitBox


JonasJ-ap commented on code in PR #6449:
URL: https://github.com/apache/iceberg/pull/6449#discussion_r1082104013


##
delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java:
##
@@ -0,0 +1,370 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.delta;
+
+import io.delta.standalone.DeltaLog;
+import io.delta.standalone.VersionLog;
+import io.delta.standalone.actions.Action;
+import io.delta.standalone.actions.AddFile;
+import io.delta.standalone.actions.RemoveFile;
+import java.io.File;
+import java.net.URI;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.AppendFiles;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DataFiles;
+import org.apache.iceberg.DeleteFiles;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.Metrics;
+import org.apache.iceberg.MetricsConfig;
+import org.apache.iceberg.OverwriteFiles;
+import org.apache.iceberg.PartitionField;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.SnapshotSummary;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.TableProperties;
+import org.apache.iceberg.Transaction;
+import org.apache.iceberg.avro.Avro;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.exceptions.ValidationException;
+import org.apache.iceberg.hadoop.HadoopFileIO;
+import org.apache.iceberg.io.InputFile;
+import org.apache.iceberg.mapping.MappingUtil;
+import org.apache.iceberg.mapping.NameMapping;
+import org.apache.iceberg.mapping.NameMappingParser;
+import org.apache.iceberg.orc.OrcMetrics;
+import org.apache.iceberg.parquet.ParquetUtil;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Takes a Delta Lake table's location and attempts to create an Iceberg table 
snapshot in an
+ * optional user-specified location (default to the Delta Lake table's 
location) with a different
+ * identifier.
+ */
+class BaseSnapshotDeltaLakeTableAction implements SnapshotDeltaLakeTable {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(BaseSnapshotDeltaLakeTableAction.class);
+
+  private static final String SNAPSHOT_SOURCE_PROP = "snapshot_source";
+  private static final String DELTA_SOURCE_VALUE = "delta";
+  private static final String ORIGINAL_LOCATION_PROP = "original_location";
+  private static final String PARQUET_SUFFIX = ".parquet";
+  private static final String AVRO_SUFFIX = ".avro";
+  private static final String ORC_SUFFIX = ".orc";
+  private final ImmutableMap.Builder<String, String> additionalPropertiesBuilder =
+  ImmutableMap.builder();
+  private DeltaLog deltaLog;
+  private Catalog icebergCatalog;
+  private final String deltaTableLocation;
+  private TableIdentifier newTableIdentifier;
+  private String newTableLocation;
+  private HadoopFileIO deltaLakeFileIO;
+
+  /**
+   * Snapshot a delta lake table to be an iceberg table. The action will read 
the delta lake table's
+   * log through the table's path, create a new iceberg table using the given 
icebergCatalog and
+   * newTableIdentifier, and commit all changes in one iceberg transaction.
+   *
+   * The new table will only be created if the snapshot is successful.
+   *
+   * @param deltaTableLocation the delta lake table's path
+   */
+  BaseSnapshotDeltaLakeTableAction(String deltaTableLocation) {
+this.deltaTableLocation = deltaTableLocation;
+this.newTableLocation = deltaTableLocation;
+  }
+
+  @Override
+  public SnapshotDeltaLakeTable tableProperties(Map<String, String> properties) {
+additionalPropertiesBuilder.putAll(properties);
+return this;
+  }
+
+  @Override
+  public SnapshotDeltaLakeTable tableProperty(String 

[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6449: Delta: Support Snapshot Delta Lake Table to Iceberg Table

2023-01-19 Thread GitBox


JonasJ-ap commented on code in PR #6449:
URL: https://github.com/apache/iceberg/pull/6449#discussion_r1082097985


##
delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java:
##
@@ -0,0 +1,370 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.delta;
+
+import io.delta.standalone.DeltaLog;
+import io.delta.standalone.VersionLog;
+import io.delta.standalone.actions.Action;
+import io.delta.standalone.actions.AddFile;
+import io.delta.standalone.actions.RemoveFile;
+import java.io.File;
+import java.net.URI;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.AppendFiles;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DataFiles;
+import org.apache.iceberg.DeleteFiles;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.Metrics;
+import org.apache.iceberg.MetricsConfig;
+import org.apache.iceberg.OverwriteFiles;
+import org.apache.iceberg.PartitionField;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.SnapshotSummary;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.TableProperties;
+import org.apache.iceberg.Transaction;
+import org.apache.iceberg.avro.Avro;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.exceptions.ValidationException;
+import org.apache.iceberg.hadoop.HadoopFileIO;
+import org.apache.iceberg.io.InputFile;
+import org.apache.iceberg.mapping.MappingUtil;
+import org.apache.iceberg.mapping.NameMapping;
+import org.apache.iceberg.mapping.NameMappingParser;
+import org.apache.iceberg.orc.OrcMetrics;
+import org.apache.iceberg.parquet.ParquetUtil;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Takes a Delta Lake table's location and attempts to create an Iceberg table 
snapshot in an
+ * optional user-specified location (default to the Delta Lake table's 
location) with a different
+ * identifier.
+ */
+class BaseSnapshotDeltaLakeTableAction implements SnapshotDeltaLakeTable {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(BaseSnapshotDeltaLakeTableAction.class);
+
+  private static final String SNAPSHOT_SOURCE_PROP = "snapshot_source";
+  private static final String DELTA_SOURCE_VALUE = "delta";
+  private static final String ORIGINAL_LOCATION_PROP = "original_location";
+  private static final String PARQUET_SUFFIX = ".parquet";
+  private static final String AVRO_SUFFIX = ".avro";
+  private static final String ORC_SUFFIX = ".orc";
+  private final ImmutableMap.Builder<String, String> additionalPropertiesBuilder =
+  ImmutableMap.builder();
+  private DeltaLog deltaLog;
+  private Catalog icebergCatalog;
+  private final String deltaTableLocation;
+  private TableIdentifier newTableIdentifier;
+  private String newTableLocation;
+  private HadoopFileIO deltaLakeFileIO;
+
+  /**
+   * Snapshot a delta lake table to be an iceberg table. The action will read 
the delta lake table's
+   * log through the table's path, create a new iceberg table using the given 
icebergCatalog and
+   * newTableIdentifier, and commit all changes in one iceberg transaction.
+   *
+   * The new table will only be created if the snapshot is successful.
+   *
+   * @param deltaTableLocation the delta lake table's path
+   */
+  BaseSnapshotDeltaLakeTableAction(String deltaTableLocation) {
+this.deltaTableLocation = deltaTableLocation;
+this.newTableLocation = deltaTableLocation;
+  }
+
+  @Override
+  public SnapshotDeltaLakeTable tableProperties(Map<String, String> properties) {
+additionalPropertiesBuilder.putAll(properties);
+return this;
+  }
+
+  @Override
+  public SnapshotDeltaLakeTable tableProperty(String 

[GitHub] [iceberg] amogh-jahagirdar commented on issue #6619: Disaster Recovery Options for AWS Athena/Iceberg Integration

2023-01-19 Thread GitBox


amogh-jahagirdar commented on issue #6619:
URL: https://github.com/apache/iceberg/issues/6619#issuecomment-139751

   Thanks for creating this issue! @anthonysgro, could you provide more details on how you're recreating the table, how you're re-pointing its location, and how AWS Backup fits in?
   
   As far as my understanding goes, for disaster recovery of Iceberg tables it's generally recommended to use S3 multi-region access points; a rough configuration sketch follows the doc links below. You can check out this discussion for more details: https://github.com/apache/iceberg/issues/5779.
   
   Some docs: 
   
   https://aws.amazon.com/s3/features/multi-region-access-points/
   https://iceberg.apache.org/docs/latest/aws/#s3-access-points
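
   As a rough sketch of what the access-point wiring looks like as catalog properties, following the pattern on the docs page above (the bucket name and ARN below are placeholders):

   ```java
   import java.util.HashMap;
   import java.util.Map;

   public class AccessPointConfigSketch {
     public static void main(String[] args) {
       Map<String, String> catalogProps = new HashMap<>();
       catalogProps.put("warehouse", "s3://my-bucket/warehouse");
       // Route requests for my-bucket through a multi-region access point
       // instead of a single-region endpoint.
       catalogProps.put(
           "s3.access-points.my-bucket",
           "arn:aws:s3::123456789012:accesspoint:mfzwi23gnjvgw.mrap");
       catalogProps.forEach((k, v) -> System.out.println(k + " = " + v));
     }
   }
   ```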
   
   If you desire S3 access point integration in Athena specifically, feel free 
to ping me on Slack! 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] amogh-jahagirdar commented on pull request #6588: Spark 3.3: Add Default Parallelism Level for All Spark Driver Based Deletes

2023-01-19 Thread GitBox


amogh-jahagirdar commented on PR #6588:
URL: https://github.com/apache/iceberg/pull/6588#issuecomment-1397883260

   Thanks for clarifying, @RussellSpitzer. I think it makes a ton of sense to leave the specifics of bulk vs. parallel to the FileIO abstraction. In this case, we leverage bulk delete wherever possible (S3) and fall back to parallel deletions for file systems that don't support bulk ops, like HDFS, to improve the throughput of deletions.
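
   A minimal sketch of the dispatch being described; the two interfaces are local stand-ins shaped like `org.apache.iceberg.io.FileIO` / `SupportsBulkOperations`, not the real ones:

   ```java
   import java.util.List;

   public class DeleteDispatchSketch {
     interface FileIO {
       void deleteFile(String path);
     }

     interface SupportsBulkOperations extends FileIO {
       void deleteFiles(Iterable<String> paths);
     }

     static void delete(FileIO io, List<String> paths) {
       if (io instanceof SupportsBulkOperations) {
         // One batched request, e.g. S3 DeleteObjects.
         ((SupportsBulkOperations) io).deleteFiles(paths);
       } else {
         // No bulk support (e.g. HDFS): issue the deletes in parallel instead.
         paths.parallelStream().forEach(io::deleteFile);
       }
     }

     public static void main(String[] args) {
       FileIO plain = path -> System.out.println("deleted " + path);
       delete(plain, List.of("a/1.parquet", "a/2.parquet"));
     }
   }
   ```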


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] ajantha-bhat commented on pull request #6629: Build: Fix minor error-prone warnings

2023-01-19 Thread GitBox


ajantha-bhat commented on PR #6629:
URL: https://github.com/apache/iceberg/pull/6629#issuecomment-1397859049

   cc: @nastra, @Fokko  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] ajantha-bhat opened a new pull request, #6629: Build: Fix minor error-prone warnings

2023-01-19 Thread GitBox


ajantha-bhat opened a new pull request, #6629:
URL: https://github.com/apache/iceberg/pull/6629

   I have observed that the build (`./gradlew clean build -x test`) emits some error-prone warnings, so this is an effort to keep the build green.
   
   Before:
   https://user-images.githubusercontent.com/5889404/213606755-f8a491a3-99e4-4e6f-ac57-c6b5437e160c.png
   
   
   After:
   https://user-images.githubusercontent.com/5889404/213606771-87eed4bb-abf9-42d4-887a-f24e1815f3b2.png
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] kingeasternsun commented on a diff in pull request #6624:  Add "parallelism" parameter to "add_files" syscall and MigrateTable, SnapshotTable.

2023-01-19 Thread GitBox


kingeasternsun commented on code in PR #6624:
URL: https://github.com/apache/iceberg/pull/6624#discussion_r1082061029


##
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/procedures/SnapshotTableProcedure.java:
##
@@ -93,10 +94,20 @@ public InternalRow[] call(InternalRow args) {
   });
 }
 
+int parallelism;
+if (!args.isNullAt(4)) {
+  parallelism = args.getInt(4);
+} else {
+  parallelism = 1;
+}

Review Comment:
   Thanks for your advice; I will fix this.



##
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/procedures/MigrateTableProcedure.java:
##
@@ -99,7 +100,13 @@ public InternalRow[] call(InternalRow args) {
 if (dropBackup) {
   result = migrateTableSparkAction.dropBackup().execute();
 } else {
-  result = migrateTableSparkAction.execute();
+  int parallelism;
+  if (!args.isNullAt(3)) {
+parallelism = args.getInt(3);
+  } else {
+parallelism = 1;
+  }

Review Comment:
   Thanks for your advice; I will fix this.
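
   For reference, a compact form of the null-default handling; `Args` is a hypothetical stand-in with the same `isNullAt`/`getInt` shape as Spark's `InternalRow`:

   ```java
   public class ParallelismArgSketch {
     interface Args {
       boolean isNullAt(int ordinal);
       int getInt(int ordinal);
     }

     static int parallelism(Args args, int ordinal) {
       // Fall back to 1 when the optional procedure parameter is absent.
       return args.isNullAt(ordinal) ? 1 : args.getInt(ordinal);
     }

     public static void main(String[] a) {
       Args absent = new Args() {
         @Override public boolean isNullAt(int i) { return true; }
         @Override public int getInt(int i) { return 0; }
       };
       System.out.println(parallelism(absent, 3)); // 1
     }
   }
   ```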



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] kingeasternsun commented on a diff in pull request #6624:  Add "parallelism" parameter to "add_files" syscall and MigrateTable, SnapshotTable.

2023-01-19 Thread GitBox


kingeasternsun commented on code in PR #6624:
URL: https://github.com/apache/iceberg/pull/6624#discussion_r1082057002


##
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/procedures/MigrateTableProcedure.java:
##
@@ -39,7 +39,8 @@ class MigrateTableProcedure extends BaseProcedure {
   new ProcedureParameter[] {
 ProcedureParameter.required("table", DataTypes.StringType),
 ProcedureParameter.optional("properties", STRING_MAP),
-ProcedureParameter.optional("drop_backup", DataTypes.BooleanType)
+ProcedureParameter.optional("drop_backup", DataTypes.BooleanType),
+ProcedureParameter.optional("max_concurrent_read_datafiles", 
DataTypes.IntegerType)

Review Comment:
   Thanks for your review. It's because of this: https://github.com/apache/iceberg/pull/3973#discussion_r794150006



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] hililiwei commented on a diff in pull request #6617: Spark: Spark SQL Extensions for create branch

2023-01-19 Thread GitBox


hililiwei commented on code in PR #6617:
URL: https://github.com/apache/iceberg/pull/6617#discussion_r1082056581


##
spark/v3.3/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSqlExtensionsAstBuilder.scala:
##
@@ -267,6 +285,16 @@ class IcebergSqlExtensionsAstBuilder(delegate: 
ParserInterface) extends IcebergS
   private def typedVisit[T](ctx: ParseTree): T = {
 ctx.accept(this).asInstanceOf[T]
   }
+
+  private val timeUnit = (unit: String) => {
+unit.toUpperCase(Locale.ENGLISH) match {
+  case "MONTHS" => 30 * 24 * 60 * 60 * 1000L

Review Comment:
   Modified. PTAL. Thanks.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] cgpoh closed issue #6606: MinIO com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool

2023-01-19 Thread GitBox


cgpoh closed issue #6606: MinIO com.amazonaws.SdkClientException: Unable to 
execute HTTP request: Timeout waiting for connection from pool
URL: https://github.com/apache/iceberg/issues/6606


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] cgpoh commented on issue #6606: MinIO com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool

2023-01-19 Thread GitBox


cgpoh commented on issue #6606:
URL: https://github.com/apache/iceberg/issues/6606#issuecomment-1397847153

   After looking into the code, I realised that instead of setting s3.connection.maximum in the Flink configuration, I should set the value in the Hadoop configuration and pass that configuration to the HiveCatalog:
   
   ```kotlin
   val conf = Configuration()
   // Configuration.set expects a String value, so pass "100" rather than an Int
   conf["fs.s3a.connection.maximum"] = "100"
   val catalogLoader = CatalogLoader.hive(hiveCatalogName, conf, properties)
   ```
   
   With this, I can see in the log that the maximum connection count is set to 100. I will close this issue for now, as my job has been running for 24 hrs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] hililiwei commented on a diff in pull request #6617: Spark: Spark SQL Extensions for create branch

2023-01-19 Thread GitBox


hililiwei commented on code in PR #6617:
URL: https://github.com/apache/iceberg/pull/6617#discussion_r1082045421


##
spark/v3.3/spark-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4:
##
@@ -168,34 +175,77 @@ fieldList
 ;
 
 nonReserved
-: ADD | ALTER | AS | ASC | BY | CALL | DESC | DROP | FIELD | FIRST | LAST 
| NULLS | ORDERED | PARTITION | TABLE | WRITE
-| DISTRIBUTED | LOCALLY | UNORDERED | REPLACE | WITH | IDENTIFIER_KW | 
FIELDS | SET
+: ADD | ALTER | AS | ASC | BRANCH | BY | CALL | CREATE | DAYS | DESC | 
DROP | FIELD | FIRST | HOURS | LAST | NULLS | OF | ORDERED | PARTITION | TABLE 
| WRITE
+| DISTRIBUTED | LOCALLY | MINUTES | MONTHS | UNORDERED | REPLACE | RETAIN 
| VERSION | WITH | IDENTIFIER_KW | FIELDS | SET | SNAPSHOT | SNAPSHOTS
 | TRUE | FALSE
 | MAP
 ;
 
+snapshotId
+: number
+;
+
+numSnapshots
+: number
+;
+
+snapshotRetain
+: number
+;
+
+snapshotRefRetain
+: number
+;
+
+snapshotRefRetainTimeUnit
+: timeUnit
+;
+
+snapshotRetainTimeUnit
+: timeUnit
+;
+
+timeUnit
+: MONTHS
+| DAYS
+| HOURS
+| MINUTES

Review Comment:
   Should we support it? I prefer at least the minute-level. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6365: Core: Add position deletes metadata table

2023-01-19 Thread GitBox


szehon-ho commented on code in PR #6365:
URL: https://github.com/apache/iceberg/pull/6365#discussion_r1082037773


##
core/src/main/java/org/apache/iceberg/PositionDeletesTable.java:
##
@@ -0,0 +1,396 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import static 
org.apache.iceberg.MetadataColumns.POSITION_DELETE_TABLE_FILE_PATH_ID;
+import static 
org.apache.iceberg.MetadataColumns.POSITION_DELETE_TABLE_PARTITION_FIELD_ID;
+import static org.apache.iceberg.MetadataColumns.POSITION_DELETE_TABLE_SPEC_ID;
+
+import com.github.benmanes.caffeine.cache.Caffeine;
+import com.github.benmanes.caffeine.cache.LoadingCache;
+import java.util.List;
+import java.util.Map;
+import java.util.function.BiFunction;
+import java.util.stream.Collectors;
+import org.apache.iceberg.expressions.Expression;
+import org.apache.iceberg.expressions.Expressions;
+import org.apache.iceberg.expressions.ResidualEvaluator;
+import org.apache.iceberg.io.CloseableIterable;
+import org.apache.iceberg.relocated.com.google.common.base.MoreObjects;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.relocated.com.google.common.collect.Sets;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.TypeUtil;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.util.ParallelIterable;
+import org.apache.iceberg.util.PartitionUtil;
+import org.apache.iceberg.util.TableScanUtil;
+
+public class PositionDeletesTable extends BaseMetadataTable {
+
+  private final Schema schema;
+
+  PositionDeletesTable(TableOperations ops, Table table) {
+super(ops, table, table.name() + ".position_deletes");
+this.schema = calculateSchema();
+  }
+
+  PositionDeletesTable(TableOperations ops, Table table, String name) {
+super(ops, table, name);
+this.schema = calculateSchema();
+  }
+
+  @Override
+  MetadataTableType metadataTableType() {
+return MetadataTableType.POSITION_DELETES;
+  }
+
+  @Override
+  public TableScan newScan() {
+throw new UnsupportedOperationException(
+"Cannot create TableScan from table of type POSITION_DELETES");
+  }
+
+  @Override
+  public BatchScan newBatchScan() {
+return new PositionDeletesBatchScan(operations(), table(), schema());
+  }
+
+  @Override
+  public Schema schema() {
+return schema;
+  }
+
+  private Schema calculateSchema() {
+Types.StructType partitionType = Partitioning.partitionType(table());
+Schema result =
+new Schema(
+MetadataColumns.DELETE_FILE_PATH,
+MetadataColumns.DELETE_FILE_POS,
+Types.NestedField.optional(
+MetadataColumns.DELETE_FILE_ROW_FIELD_ID,
+"row",
+table().schema().asStruct(),
+MetadataColumns.DELETE_FILE_ROW_DOC),
+Types.NestedField.required(
+POSITION_DELETE_TABLE_PARTITION_FIELD_ID,
+"partition",
+partitionType,
+"Partition that position delete row belongs to"),
+Types.NestedField.required(
+POSITION_DELETE_TABLE_SPEC_ID,
+"spec_id",
+Types.IntegerType.get(),
+"Spec ID of the file that the position delete row belongs to"),
+Types.NestedField.required(
+POSITION_DELETE_TABLE_FILE_PATH_ID,
+"delete_file_path",
+Types.StringType.get(),
+"Path of the delete file that the position delete row belongs 
to"));
+
+if (partitionType.fields().size() > 0) {
+  return result;
+} else {
+  // avoid returning an empty struct, which is not always supported.
+  // instead, drop the partition field
+  return TypeUtil.selectNot(result, Sets.newHashSet(POSITION_DELETE_TABLE_PARTITION_FIELD_ID));
+}
+  }
+
+  public static class PositionDeletesBatchScan
+  extends SnapshotScan<BatchScan, ScanTask, ScanTaskGroup<ScanTask>> implements BatchScan {
+
+    protected PositionDeletesBatchScan(TableOperations ops, Table table, Schema schema) {
+  super(ops, table, schema, new TableScanContext());
+}
+
+protected PositionDeletesBatchScan(
+

[GitHub] [iceberg] dmgcodevil commented on issue #6587: Wrong class, java.lang.Long, for object: 19367

2023-01-19 Thread GitBox


dmgcodevil commented on issue #6587:
URL: https://github.com/apache/iceberg/issues/6587#issuecomment-1397819184

   The delete orphan files action is also affected after the schema change:
   
   ```
   java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Void
       at org.apache.iceberg.transforms.VoidTransform.toHumanString(VoidTransform.java:27)
       at org.apache.iceberg.ManifestsTable.partitionSummariesToRows(ManifestsTable.java:131)
       at org.apache.iceberg.ManifestsTable.manifestFileToRow(ManifestsTable.java:115)
       at org.apache.iceberg.AllManifestsTable$ManifestListReadTask.lambda$rows$0(AllManifestsTable.java:194)
       at org.apache.iceberg.io.CloseableIterable$4$1.next(CloseableIterable.java:113)
       at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
       at org.apache.spark.scheduler.Task.run(Task.scala:127)
       at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:462)
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:465)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:750)
   ```
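
   For what it's worth, my read of the trace: after a partition field is dropped, that field's transform resolves to a void transform, but manifests written before the change still carry the old `Long` partition values, so the cast to `Void` fails when the partition summary is rendered. A minimal, Iceberg-free sketch of the same failure mode (all names here are illustrative, not the actual Iceberg internals):

   ```java
   public class VoidCastDemo {
     // Mirrors the shape of VoidTransform.toHumanString(Void): the cast is only
     // safe while the stored partition value is genuinely null.
     static String toHumanString(Object value) {
       Void v = (Void) value; // fine for null, fatal for a leftover Long
       return String.valueOf(v);
     }

     public static void main(String[] args) {
       Object oldPartitionValue = 19367L; // Long written before the field was dropped
       // Throws ClassCastException: java.lang.Long cannot be cast to java.lang.Void
       System.out.println(toHumanString(oldPartitionValue));
     }
   }
   ```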


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6627: Docs: Update spark SQL examples for time travel to branches and tags

2023-01-19 Thread GitBox


ajantha-bhat commented on code in PR #6627:
URL: https://github.com/apache/iceberg/pull/6627#discussion_r1082026758


##
docs/spark-queries.md:
##
@@ -95,21 +95,37 @@ The above list is in order of priority. For example: a 
matching catalog will tak
 
  SQL
 
-Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS 
OF` or `VERSION AS OF` clauses
+Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS 
OF` or `VERSION AS OF` clauses.
+The `VERSION AS OF` clause can contain a long snapshot ID or a string branch 
or tag name.
+
+{{< hint info >}}
+Note: If the name of a branch or tag is the same as a snapshot ID, then the 
snapshot which is selected for time travel is the snapshot
+with the given snapshot ID. For example, consider the case where there is a 
tag named '1' and it references snapshot with ID 2. 
+If the version travel clause is `VERSION AS OF '1'`, time travel will be done 
to the snapshot with ID 1. 
+If this is not desired, rename the tag or branch with a well-defined prefix 
such as 'snapshot-1'.
+{{< /hint >}}
 
 ```sql 
 -- time travel to October 26, 1986 at 01:21:00
 SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00';
 
 -- time travel to snapshot with id 10963874102873L
 SELECT * FROM prod.db.table VERSION AS OF 10963874102873;
+
+-- time travel to the head snapshot of audit-branch
+SELECT * FROM prod.db.table VERSION AS OF 'audit-branch';
+
+-- time travel to the snapshot referenced by the tag historical-snapshot
+SELECT * FROM prod.db.table VERSION AS OF 'historical-snapshot';
 ```
 
 In addition, `FOR SYSTEM_TIME AS OF` and `FOR SYSTEM_VERSION AS OF` clauses 
are also supported:
 
 ```sql
 SELECT * FROM prod.db.table FOR SYSTEM_TIME AS OF '1986-10-26 01:21:00';
 SELECT * FROM prod.db.table FOR SYSTEM_VERSION AS OF 10963874102873;
+SELECT * FROM prod.db.table FOR SYSTEM_VERSION AS OF 'audit-branch';
+SELECT * FROM prod.db.table FOR SYSTEM_VERSION AS OF 'historical-snapshot';

Review Comment:
   lines 145 and 146 are still ambiguous to me. 
   
   > * `branch` selects the head snapshot of the specified branch. Note that 
currently branch cannot be combined with as-of-timestamp.
   * `tag` selects the snapshot associated with the specified tag
   
   Does that mean tags support `as-of-timestamp`? No, right? Maybe we need to update it for tags too.
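
   For anyone following along, a minimal Java sketch of the two read paths being contrasted (this assumes a `SparkSession` named `spark`; `branch`, `tag`, and `as-of-timestamp` are the documented Spark read option keys):

   ```java
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;

   // Branch read: selects the head snapshot of the branch; per the docs quoted
   // above, this currently cannot be combined with as-of-timestamp.
   Dataset<Row> branchRead = spark.read()
       .format("iceberg")
       .option("branch", "audit-branch")
       .load("prod.db.table");

   // Tag read: selects the single snapshot the tag references, so combining it
   // with as-of-timestamp would be just as ambiguous.
   Dataset<Row> tagRead = spark.read()
       .format("iceberg")
       .option("tag", "historical-snapshot")
       .load("prod.db.table");
   ```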



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] jackye1995 commented on pull request #6617: Spark: Spark SQL Extensions for create branch

2023-01-19 Thread GitBox


jackye1995 commented on PR #6617:
URL: https://github.com/apache/iceberg/pull/6617#issuecomment-1397789606

   Ping some people for thoughts around the syntax: @rdblue @RussellSpitzer 
@nastra 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6617: Spark: Spark SQL Extensions for create branch

2023-01-19 Thread GitBox


jackye1995 commented on code in PR #6617:
URL: https://github.com/apache/iceberg/pull/6617#discussion_r1082011726


##
spark/v3.3/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateBranchExec.scala:
##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.iceberg.spark.source.SparkTable
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.logical.CreateBranch
+import org.apache.spark.sql.connector.catalog.Identifier
+import org.apache.spark.sql.connector.catalog.TableCatalog
+
+case class CreateBranchExec(
+ catalog: TableCatalog,
+ ident: Identifier,
+ createBranch: CreateBranch) extends 
LeafV2CommandExec {
+
+  import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
+
+  override lazy val output: Seq[Attribute] = Nil
+
+  override protected def run(): Seq[InternalRow] = {
+catalog.loadTable(ident) match {
+  case iceberg: SparkTable =>
+
+val snapshotId = 
createBranch.snapshotId.getOrElse(iceberg.table.currentSnapshot().snapshotId())
+iceberg.table.manageSnapshots()
+  .createBranch(createBranch.branch, snapshotId)
+  .setMinSnapshotsToKeep(createBranch.branch, 
createBranch.numSnapshots.getOrElse(1L).toInt)
+  // 5 days
+  .setMaxSnapshotAgeMs(createBranch.branch, 
createBranch.snapshotRetain.getOrElse(5 * 24 * 60 * 60 * 1000L))
+  .setMaxRefAgeMs(createBranch.branch, 
createBranch.snapshotRefRetain.getOrElse(Long.MaxValue))
+  .commit()
+
+  case table =>
+throw new UnsupportedOperationException(s"Cannot add branch to non-Iceberg table: $table")

Review Comment:
   This seems like an existing pattern for all extensions, so I think it is 
probably fine to leave it like this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6617: Spark: Spark SQL Extensions for create branch

2023-01-19 Thread GitBox


jackye1995 commented on code in PR #6617:
URL: https://github.com/apache/iceberg/pull/6617#discussion_r1082011227


##
spark/v3.3/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSqlExtensionsAstBuilder.scala:
##
@@ -267,6 +285,16 @@ class IcebergSqlExtensionsAstBuilder(delegate: 
ParserInterface) extends IcebergS
   private def typedVisit[T](ctx: ParseTree): T = {
 ctx.accept(this).asInstanceOf[T]
   }
+
+  private val timeUnit = (unit: String) => {
+unit.toUpperCase(Locale.ENGLISH) match {
+  case "MONTHS" => 30 * 24 * 60 * 60 * 1000L

Review Comment:
   can use `TimeUnit.MONTHS.toMillis(...)`
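
   One caveat: `java.util.concurrent.TimeUnit` has no `MONTHS` constant (it tops out at `DAYS`), so months would still need an explicit day count. A possible shape for the conversion, keeping the 30-day month assumed by the snippet above:

   ```java
   import java.util.concurrent.TimeUnit;

   // TimeUnit cannot express months directly; approximate a month as 30 days,
   // matching the 30 * 24 * 60 * 60 * 1000L constant in the builder above.
   long monthMillis  = TimeUnit.DAYS.toMillis(30);
   long dayMillis    = TimeUnit.DAYS.toMillis(1);
   long hourMillis   = TimeUnit.HOURS.toMillis(1);
   long minuteMillis = TimeUnit.MINUTES.toMillis(1);
   ```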



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6617: Spark: Spark SQL Extensions for create branch

2023-01-19 Thread GitBox


jackye1995 commented on code in PR #6617:
URL: https://github.com/apache/iceberg/pull/6617#discussion_r1082010680


##
spark/v3.3/spark-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4:
##
@@ -168,34 +175,77 @@ fieldList
 ;
 
 nonReserved
-: ADD | ALTER | AS | ASC | BY | CALL | DESC | DROP | FIELD | FIRST | LAST 
| NULLS | ORDERED | PARTITION | TABLE | WRITE
-| DISTRIBUTED | LOCALLY | UNORDERED | REPLACE | WITH | IDENTIFIER_KW | 
FIELDS | SET
+: ADD | ALTER | AS | ASC | BRANCH | BY | CALL | CREATE | DAYS | DESC | 
DROP | FIELD | FIRST | HOURS | LAST | NULLS | OF | ORDERED | PARTITION | TABLE 
| WRITE
+| DISTRIBUTED | LOCALLY | MINUTES | MONTHS | UNORDERED | REPLACE | RETAIN 
| VERSION | WITH | IDENTIFIER_KW | FIELDS | SET | SNAPSHOT | SNAPSHOTS
 | TRUE | FALSE
 | MAP
 ;
 
+snapshotId
+: number
+;
+
+numSnapshots
+: number
+;
+
+snapshotRetain
+: number
+;
+
+snapshotRefRetain
+: number
+;
+
+snapshotRefRetainTimeUnit
+: timeUnit
+;
+
+snapshotRetainTimeUnit
+: timeUnit
+;
+
+timeUnit
+: MONTHS
+| DAYS
+| HOURS
+| MINUTES

Review Comment:
   missing SECONDS?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6617: Spark: Spark SQL Extensions for create branch

2023-01-19 Thread GitBox


jackye1995 commented on code in PR #6617:
URL: https://github.com/apache/iceberg/pull/6617#discussion_r1082010234


##
spark/v3.3/spark-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4:
##
@@ -73,6 +73,13 @@ statement
 | ALTER TABLE multipartIdentifier WRITE writeSpec  
 #setWriteDistributionAndOrdering
 | ALTER TABLE multipartIdentifier SET IDENTIFIER_KW FIELDS fieldList   
 #setIdentifierFields
 | ALTER TABLE multipartIdentifier DROP IDENTIFIER_KW FIELDS fieldList  
 #dropIdentifierFields
+| ALTER TABLE multipartIdentifier CREATE BRANCH identifier (AS OF VERSION 
snapshotId)?  (snapshotRetentionClause)? (RETAIN snapshotRefRetain 
snapshotRefRetainTimeUnit)?   #createBranch
+;
+
+snapshotRetentionClause
+: WITH SNAPSHOT RETENTION numSnapshots SNAPSHOTS
+| WITH SNAPSHOT RETENTION snapshotRetain snapshotRetainTimeUnit
+| WITH SNAPSHOT RETENTION numSnapshots SNAPSHOTS snapshotRetain 
snapshotRetainTimeUnit

Review Comment:
   I see, I will leave it here to see if anyone has better suggestions. I am 
not an Antlr expert  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] github-actions[bot] commented on issue #5339: Adding the same file twice for the same table

2023-01-19 Thread GitBox


github-actions[bot] commented on issue #5339:
URL: https://github.com/apache/iceberg/issues/5339#issuecomment-1397768506

   This issue has been automatically marked as stale because it has been open 
for 180 days with no activity. It will be closed in next 14 days if no further 
activity occurs. To permanently prevent this issue from being considered stale, 
add the label 'not-stale', but commenting on the issue is preferred when 
possible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] jackye1995 commented on issue #6420: Iceberg Materialized View Spec

2023-01-19 Thread GitBox


jackye1995 commented on issue #6420:
URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1397757916

   I am referring to the **view spec**, using the example here: 
https://iceberg.apache.org/view-spec/#appendix-a-an-example
   
   So in design 1 where we say we want to have a pointer from view to storage 
table, the pointer is a new type of representation `materialized`.
   
   We already support multiple representations as of today in the view spec, 
but today it only can have multiple SQL representations. 
   
   By adding this new type, it means a view can be backed by multiple storage 
tables, by adding more `materialized` representations. 
   
   The use case you mentioned, where a table has multiple storage layouts, could also be modeled as a view with multiple `materialized` representations and no SQL representation at all.
   
   It's just a thought, not fully fleshed out.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] amogh-jahagirdar commented on pull request #6598: Core: View representation core implementation

2023-01-19 Thread GitBox


amogh-jahagirdar commented on PR #6598:
URL: https://github.com/apache/iceberg/pull/6598#issuecomment-1397753619

   We probably want to establish a standard in the community at this point on 
Immutable/Nullable or not. Right now we're in this partial state, where it's 
used in some cases but defining a standard can help. I do think Immutables are 
really nice at keeping boiler plate code to a minimum but I don't have a strong 
opinion other than just setting a standard practice in Iceberg :) 
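
   For concreteness, a minimal sketch of the Immutables pattern under discussion (the interface name and fields are made up for illustration, not the actual view API):

   ```java
   import javax.annotation.Nullable;
   import org.immutables.value.Value;

   @Value.Immutable
   public interface ExampleRepresentation {
     String sql();

     @Nullable
     String dialect(); // only fields annotated @Nullable may be absent

     // Immutables generates ImmutableExampleRepresentation with a builder,
     // equals/hashCode/toString, and null checks for the required fields, e.g.:
     //   ImmutableExampleRepresentation.builder().sql("SELECT 1").build();
   }
   ```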


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] wmoustafa commented on issue #6420: Iceberg Materialized View Spec

2023-01-19 Thread GitBox


wmoustafa commented on issue #6420:
URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1397753231

   To clarify, I was saying that multiple representations are outside the scope 
of MVs, and could be part of standard table spec. Not sure if the proposal 
above is along the same lines (a bit confused since I see `materialized` 
references).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] huaxingao commented on a diff in pull request #6622: push down min/max/count to iceberg

2023-01-19 Thread GitBox


huaxingao commented on code in PR #6622:
URL: https://github.com/apache/iceberg/pull/6622#discussion_r1081982772


##
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##
@@ -158,6 +182,130 @@ public Filter[] pushedFilters() {
 return pushedFilters;
   }
 
+  @Override
+  public boolean pushAggregation(Aggregation aggregation) {
+if (!pushDownAggregate(aggregation)) {
+  return false;
+}
+
+AggregateEvaluator aggregateEvaluator;
+try {
+  List<Expression> aggregates =
+  Arrays.stream(aggregation.aggregateExpressions())
+  .map(agg -> SparkAggregates.convert(agg))
+  .collect(Collectors.toList());
+  aggregateEvaluator = AggregateEvaluator.create(schema, aggregates);
+} catch (Exception e) {
+  LOG.info("Can't push down aggregates: " + e.getMessage());
+  return false;
+}
+
+if 
(!metricsModeSupportsAggregatePushDown(aggregateEvaluator.aggregates())) {
+  LOG.info("The MetricsMode doesn't support aggregate push down.");
+  return false;
+}
+
+List<ManifestFile> manifests = table.currentSnapshot().allManifests(table.io());
+
+for (ManifestFile manifest : manifests) {
+  try (ManifestReader<DataFile> reader = ManifestFiles.read(manifest, table.io())) {
+for (DataFile dataFile : reader) {
+  aggregateEvaluator.update(dataFile.copy());
+}
+  } catch (IOException e) {
+LOG.info("Can't push down aggregates: " + e.getMessage());
+return false;
+  }
+}
+
+Object[] res = aggregateEvaluator.result();
+applyDataTypeConversionIfNecessary(res);
+
+List valuesInSparkInternalRow = java.util.Arrays.asList(res);
+this.pushedAggregateRows = new InternalRow[1];
+pushedAggregateRows[0] =
+
InternalRow.fromSeq(JavaConverters.asScalaBuffer(valuesInSparkInternalRow).toSeq());
+pushedAggregateSchema =
+SparkSchemaUtil.convert(new 
Schema(aggregateEvaluator.resultType().fields()));
+return true;
+  }
+
+  private boolean pushDownAggregate(Aggregation aggregation) {
+if (!(table instanceof BaseTable)) {
+  return false;
+}
+
+if (!readConf.aggregatePushDown()) {
+  return false;
+}
+
+Snapshot currentSnapshot = table.currentSnapshot();

Review Comment:
   Thanks for your comment! I think I need to check `readConf.snapshotId()` 
first to get the time travel snapshot. If there is no time travel snapshot, 
then get `table.currentSnapshot()`. I will fix this.
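
   A rough sketch of that fix, assuming `SparkReadConf#snapshotId()` returns the time-travel snapshot ID or null (an outline, not the final patch):

   ```java
   // Prefer the snapshot pinned by time travel; fall back to the current one.
   Long snapshotId = readConf.snapshotId();
   Snapshot snapshot = snapshotId != null ? table.snapshot(snapshotId) : table.currentSnapshot();
   if (snapshot == null) {
     return false; // empty table: nothing to aggregate, so don't push down
   }
   ```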



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6627: Docs: Update spark SQL examples for time travel to branches and tags

2023-01-19 Thread GitBox


jackye1995 commented on code in PR #6627:
URL: https://github.com/apache/iceberg/pull/6627#discussion_r1081982454


##
docs/spark-queries.md:
##
@@ -95,21 +95,37 @@ The above list is in order of priority. For example: a 
matching catalog will tak
 
  SQL
 
-Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS 
OF` or `VERSION AS OF` clauses
+Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS 
OF` or `VERSION AS OF` clauses.
+The `VERSION AS OF` clause can contain a long snapshot ID or a string branch 
or tag name.
+
+{{< hint info >}}
+Note: If the name of a branch or tag is the same as a snapshot ID, then the 
snapshot which is selected for time travel is the snapshot
+with the given snapshot ID. For example, consider the case where there is a 
tag named '1' and it references snapshot 2. 
+If the time travel clause is `VERSION AS OF '1'` time travel will be done to 
the snapshot with id 1. 

Review Comment:
   nit: version travel clause



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6627: Docs: Update spark SQL examples for time travel to branches and tags

2023-01-19 Thread GitBox


jackye1995 commented on code in PR #6627:
URL: https://github.com/apache/iceberg/pull/6627#discussion_r1081982200


##
docs/spark-queries.md:
##
@@ -95,21 +95,37 @@ The above list is in order of priority. For example: a 
matching catalog will tak
 
  SQL
 
-Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS 
OF` or `VERSION AS OF` clauses
+Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS 
OF` or `VERSION AS OF` clauses.
+The `VERSION AS OF` clause can contain a long snapshot ID or a string branch 
or tag name.
+
+{{< hint info >}}
+Note: If the name of a branch or tag is the same as a snapshot ID, then the 
snapshot which is selected for time travel is the snapshot
+with the given snapshot ID. For example, consider the case where there is a 
tag named '1' and it references snapshot 2. 

Review Comment:
   nit: and it references snapshot with ID 2.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] jackye1995 commented on issue #6420: Iceberg Materialized View Spec

2023-01-19 Thread GitBox


jackye1995 commented on issue #6420:
URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1397751094

   I don't know if it would work or too crazy, just to throw the idea out that 
I just came up with:
   
   We could potentially make MV a representation in view spec, in parallel to 
the SQL representation. So for a materialized view, it could have 2 
representations:
   
   ```
   [
 {
 "type" : "sql",
 "sql" : "SELECT \"count\"(*) my_cnt\nFROM\n  base_tab\n", => Note the 
updated text from the ‘replace’ view statement
 "dialect" : "spark",
 "schema-id" : 2,
 "default-catalog" : "iceberg",
 "default-namespace" : [ "anorwood" ]
   },
   {
 "type" : "materialized",
 "namespace": "some_namespace"
 "table" : "some_storage_table"
   }
   ]
   ```
   
   By doing so, you could support a case where 1 view does not really have a 
SQL representation, but just a few different table layouts:
   
   ```
   [
 {
 "type" : "materialized",
 "namespace": "some_namespace"
 "table" : "some_storage_table_partitioned_by_col1"
  },
 {
 "type" : "materialized",
 "namespace": "some_namespace"
 "table" : "some_storage_table_partitioned_by_col2"
   }
   ]
   ```
   
   which satisfies the use case described by Walaa.
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6627: Docs: Update spark SQL examples for time travel to branches and tags

2023-01-19 Thread GitBox


amogh-jahagirdar commented on code in PR #6627:
URL: https://github.com/apache/iceberg/pull/6627#discussion_r1081978533


##
docs/spark-queries.md:
##
@@ -95,21 +95,39 @@ The above list is in order of priority. For example: a 
matching catalog will tak
 
  SQL
 
-Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS 
OF` or `VERSION AS OF` clauses
+Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS 
OF` or `VERSION AS OF` clauses.
+The `VERSION AS OF` clause can be used for time traveling to a specific 
snapshot via the following options:

Review Comment:
   Sure this seems reasonable and should be clear for readers 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] jackye1995 commented on issue #6420: Iceberg Materialized View Spec

2023-01-19 Thread GitBox


jackye1995 commented on issue #6420:
URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1397742071

   > Generically speaking, a table (MV or not), identified by a UUID, could 
have multiple storage layouts, and execution engines can choose the best 
storage layout.
   
   That's correct, but technically it could be argued that we can model everything as a view, backed by multiple storage tables, and we don't need to add more complexity to the table spec to support multiple storage layouts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] wmoustafa commented on issue #6420: Iceberg Materialized View Spec

2023-01-19 Thread GitBox


wmoustafa commented on issue #6420:
URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1397732221

   > I am thinking about the case where based on the predicate operating on the 
view, we can choose intelligently what storage table to use.
   
   I think this is potentially a generic use case for all tables, not only MV 
storage tables. Generically speaking, a table (MV or not), identified by a 
UUID, could have multiple storage layouts, and execution engines can choose the 
best storage layout. MV still points to a single UUID.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] jackye1995 commented on issue #6420: Iceberg Materialized View Spec

2023-01-19 Thread GitBox


jackye1995 commented on issue #6420:
URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1397711644

   > Not sure if there is a strong use case for multiple tables for the same 
view version.
   
   I am thinking about the case where based on the predicate operating on the 
view, we can choose intelligently what storage table to use.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6622: push down min/max/count to iceberg

2023-01-19 Thread GitBox


amogh-jahagirdar commented on code in PR #6622:
URL: https://github.com/apache/iceberg/pull/6622#discussion_r1081947384


##
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##
@@ -158,6 +182,130 @@ public Filter[] pushedFilters() {
 return pushedFilters;
   }
 
+  @Override
+  public boolean pushAggregation(Aggregation aggregation) {
+if (!pushDownAggregate(aggregation)) {
+  return false;
+}
+
+AggregateEvaluator aggregateEvaluator;
+try {
+  List<Expression> aggregates =
+  Arrays.stream(aggregation.aggregateExpressions())
+  .map(agg -> SparkAggregates.convert(agg))
+  .collect(Collectors.toList());
+  aggregateEvaluator = AggregateEvaluator.create(schema, aggregates);
+} catch (Exception e) {
+  LOG.info("Can't push down aggregates: " + e.getMessage());
+  return false;
+}
+
+if 
(!metricsModeSupportsAggregatePushDown(aggregateEvaluator.aggregates())) {
+  LOG.info("The MetricsMode doesn't support aggregate push down.");
+  return false;
+}
+
+List<ManifestFile> manifests = table.currentSnapshot().allManifests(table.io());
+
+for (ManifestFile manifest : manifests) {
+  try (ManifestReader<DataFile> reader = ManifestFiles.read(manifest, table.io())) {
+for (DataFile dataFile : reader) {
+  aggregateEvaluator.update(dataFile.copy());
+}
+  } catch (IOException e) {
+LOG.info("Can't push down aggregates: " + e.getMessage());
+return false;
+  }
+}
+
+Object[] res = aggregateEvaluator.result();
+applyDataTypeConversionIfNecessary(res);
+
+List valuesInSparkInternalRow = java.util.Arrays.asList(res);
+this.pushedAggregateRows = new InternalRow[1];
+pushedAggregateRows[0] =
+
InternalRow.fromSeq(JavaConverters.asScalaBuffer(valuesInSparkInternalRow).toSeq());
+pushedAggregateSchema =
+SparkSchemaUtil.convert(new 
Schema(aggregateEvaluator.resultType().fields()));
+return true;
+  }
+
+  private boolean pushDownAggregate(Aggregation aggregation) {
+if (!(table instanceof BaseTable)) {
+  return false;
+}
+
+if (!readConf.aggregatePushDown()) {
+  return false;
+}
+
+Snapshot currentSnapshot = table.currentSnapshot();

Review Comment:
   Sorry maybe a silly question but how does this work with time travel? Here 
we're reading the summary from the latest table snapshot but if an aggregation 
is done on a historical snapshot then we may skip the check below 
unintentionally. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6624:  Add "parallelism" parameter to "add_files" syscall and MigrateTable, SnapshotTable.

2023-01-19 Thread GitBox


amogh-jahagirdar commented on code in PR #6624:
URL: https://github.com/apache/iceberg/pull/6624#discussion_r1081932750


##
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/procedures/SnapshotTableProcedure.java:
##
@@ -93,10 +94,20 @@ public InternalRow[] call(InternalRow args) {
   });
 }
 
+int parallelism;
+if (!args.isNullAt(4)) {
+  parallelism = args.getInt(4);
+} else {
+  parallelism = 1;
+}

Review Comment:
   Nit: I think it's a bit cleaner to use ternary here
   
   `int parallelism = args.isNullAt(4) ? 1 : args.getInt(4);`



##
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/procedures/MigrateTableProcedure.java:
##
@@ -99,7 +100,13 @@ public InternalRow[] call(InternalRow args) {
 if (dropBackup) {
   result = migrateTableSparkAction.dropBackup().execute();
 } else {
-  result = migrateTableSparkAction.execute();
+  int parallelism;
+  if (!args.isNullAt(3)) {
+parallelism = args.getInt(3);
+  } else {
+parallelism = 1;
+  }

Review Comment:
   Same nit as below i think ternary assignment works well here



##
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/procedures/MigrateTableProcedure.java:
##
@@ -39,7 +39,8 @@ class MigrateTableProcedure extends BaseProcedure {
   new ProcedureParameter[] {
 ProcedureParameter.required("table", DataTypes.StringType),
 ProcedureParameter.optional("properties", STRING_MAP),
-ProcedureParameter.optional("drop_backup", DataTypes.BooleanType)
+ProcedureParameter.optional("drop_backup", DataTypes.BooleanType),
+ProcedureParameter.optional("max_concurrent_read_datafiles", 
DataTypes.IntegerType)

Review Comment:
   Any reason we can't just call this parameter "parallelism"? I think only data files will be involved.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] amogh-jahagirdar commented on pull request #6624:  Add "parallelism" parameter to "add_files" syscall and MigrateTable, SnapshotTable.

2023-01-19 Thread GitBox


amogh-jahagirdar commented on PR #6624:
URL: https://github.com/apache/iceberg/pull/6624#issuecomment-1397701478

   Left a review, thanks for the contribution @kingeasternsun ! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] flyrain commented on a diff in pull request #6582: Add a Spark procedure to collect NDV

2023-01-19 Thread GitBox


flyrain commented on code in PR #6582:
URL: https://github.com/apache/iceberg/pull/6582#discussion_r1081900515


##
core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java:
##
@@ -26,4 +26,6 @@ private StandardBlobTypes() {}
* href="https://datasketches.apache.org/;>Apache DataSketches library
*/
   public static final String APACHE_DATASKETCHES_THETA_V1 = 
"apache-datasketches-theta-v1";
+
+  public static final String NDV_BLOB = "ndv-blob";

Review Comment:
   The only downside is that we are going to have two items for one purpose.
   1. A new blob type for NDV used by Spark
   2. APACHE_DATASKETCHES_THETA_V1 for NDV used by Trino, and updated by Trino.
   
I believe that's something we are trying to avoid. I'm not sure why we need a sketch blob for NDV at the table level. Spark only needs an NDV number for its planning. Does Trino need something extra?
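
For context on the trade-off: a theta sketch stays mergeable, i.e. per-file or per-partition sketches can be unioned later and re-estimated, while a plain NDV number cannot be combined after the fact. A minimal example with the Apache DataSketches Java library (assuming the `org.apache.datasketches.theta` API):

```java
import org.apache.datasketches.theta.Sketches;
import org.apache.datasketches.theta.UpdateSketch;

UpdateSketch sketch = Sketches.updateSketchBuilder().build();
for (long value : new long[] {1L, 2L, 2L, 3L}) {
  sketch.update(value); // hash each column value into the sketch
}
// The planner-facing statistic is just the estimate; the sketch itself is
// what remains mergeable across files and partitions.
double ndvEstimate = sketch.getEstimate(); // ~3.0
```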



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] wmoustafa commented on issue #6420: Iceberg Materialized View Spec

2023-01-19 Thread GitBox


wmoustafa commented on issue #6420:
URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1397595262

   Agreed. A reverse pointer to the view will be hard to maintain, so I am inclined not to have it.
   
   I would say each view version could optionally map to a new storage table 
(say the view evolved in a backward incompatible way). Not sure if there is a 
strong use case for multiple tables for the same view version.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] jackye1995 commented on issue #6420: Iceberg Materialized View Spec

2023-01-19 Thread GitBox


jackye1995 commented on issue #6420:
URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1397589244

   So just want to push the progress forward, I think we have some kind of 
loose consensus that:
   1. view + storage table is likely the general approach to go
   2. view stores pointer to the storage table, or potentially multiple storage 
tables
   3. storage table should store the information about each refresh as table 
snapshot properties, some reverse pointer to the view is also good to have
   
   Is that the right understanding? @rdblue @JanKaul @wmoustafa 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] jackye1995 commented on pull request #6626: Core: Update error msg

2023-01-19 Thread GitBox


jackye1995 commented on PR #6626:
URL: https://github.com/apache/iceberg/pull/6626#issuecomment-1397579647

   > not sure, usually it's been called out on my own PRs to adjust the error 
msg to that particular format (hence the reason I mentioned it on the other 
PR), which is being used across other places in Iceberg
   
   In that case, would it be better for us to change all the related null 
checks to checkArgument, and update the checkstyle rule for it to avoid use of 
checkNotNull? I think we could take this opportunity to set some standards.
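
   For reference, the two styles side by side (Guava is relocated in Iceberg, hence the long package name):

   ```java
   import org.apache.iceberg.relocated.com.google.common.base.Preconditions;

   // checkNotNull throws NullPointerException:
   Preconditions.checkNotNull(branch, "Branch cannot be null");

   // checkArgument throws IllegalArgumentException and fits the
   // "Invalid x: reason" message convention mentioned above:
   Preconditions.checkArgument(branch != null, "Invalid branch: null");
   ```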


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6627: Docs: Update spark SQL examples for time travel to branches and tags

2023-01-19 Thread GitBox


jackye1995 commented on code in PR #6627:
URL: https://github.com/apache/iceberg/pull/6627#discussion_r1081835641


##
docs/spark-queries.md:
##
@@ -95,21 +95,39 @@ The above list is in order of priority. For example: a 
matching catalog will tak
 
  SQL
 
-Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS 
OF` or `VERSION AS OF` clauses
+Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS 
OF` or `VERSION AS OF` clauses.
+The `VERSION AS OF` clause can be used for time traveling to a specific 
snapshot via the following options:

Review Comment:
   the wording seems a bit overly complex. What about we talk about `VERSION AS 
OF` with the data type? If it is long, then it is snapshot ID. If it is string, 
then it follows what is described below.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6627: Docs: Update spark SQL examples for time travel to branches and tags

2023-01-19 Thread GitBox


jackye1995 commented on code in PR #6627:
URL: https://github.com/apache/iceberg/pull/6627#discussion_r1081834635


##
docs/spark-queries.md:
##
@@ -95,21 +95,39 @@ The above list is in order of priority. For example: a 
matching catalog will tak
 
  SQL
 
-Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS 
OF` or `VERSION AS OF` clauses
+Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS 
OF` or `VERSION AS OF` clauses.
+The `VERSION AS OF` clause can be used for time traveling to a specific 
snapshot via the following options:
+
+1. Snapshot ID
+2. Head of a branch
+3. Tagged snapshot
+
+Note: If the name of a branch or tag is the same as a snapshot ID, then the 
snapshot which is selected for time travel is the snapshot

Review Comment:
   nit: use `{{< hint info >}}` for notes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6570: Hive: Use EnvironmentContext instead of Hive Locks to provide transactional commits after HIVE-26882

2023-01-19 Thread GitBox


szehon-ho commented on code in PR #6570:
URL: https://github.com/apache/iceberg/pull/6570#discussion_r1081787456


##
hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreLock.java:
##
@@ -0,0 +1,533 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.hive;
+
+import com.github.benmanes.caffeine.cache.Cache;
+import com.github.benmanes.caffeine.cache.Caffeine;
+import java.net.InetAddress;
+import java.net.UnknownHostException;
+import java.util.Optional;
+import java.util.UUID;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.ScheduledFuture;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.locks.ReentrantLock;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.api.LockComponent;
+import org.apache.hadoop.hive.metastore.api.LockLevel;
+import org.apache.hadoop.hive.metastore.api.LockRequest;
+import org.apache.hadoop.hive.metastore.api.LockResponse;
+import org.apache.hadoop.hive.metastore.api.LockState;
+import org.apache.hadoop.hive.metastore.api.LockType;
+import org.apache.hadoop.hive.metastore.api.ShowLocksRequest;
+import org.apache.hadoop.hive.metastore.api.ShowLocksResponse;
+import org.apache.hadoop.hive.metastore.api.ShowLocksResponseElement;
+import org.apache.iceberg.ClientPool;
+import org.apache.iceberg.exceptions.CommitFailedException;
+import org.apache.iceberg.relocated.com.google.common.base.MoreObjects;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import 
org.apache.iceberg.relocated.com.google.common.util.concurrent.ThreadFactoryBuilder;
+import org.apache.iceberg.util.Tasks;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class MetastoreLock implements HiveLock {
+  private static final Logger LOG = 
LoggerFactory.getLogger(MetastoreLock.class);
+  private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = 
"iceberg.hive.lock-timeout-ms";
+  private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = 
"iceberg.hive.lock-check-min-wait-ms";
+  private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = 
"iceberg.hive.lock-check-max-wait-ms";
+  private static final String HIVE_LOCK_CREATION_TIMEOUT_MS =
+  "iceberg.hive.lock-creation-timeout-ms";
+  private static final String HIVE_LOCK_CREATION_MIN_WAIT_MS =
+  "iceberg.hive.lock-creation-min-wait-ms";
+  private static final String HIVE_LOCK_CREATION_MAX_WAIT_MS =
+  "iceberg.hive.lock-creation-max-wait-ms";
+  private static final String HIVE_LOCK_HEARTBEAT_INTERVAL_MS =
+  "iceberg.hive.lock-heartbeat-interval-ms";
+  private static final String HIVE_TABLE_LEVEL_LOCK_EVICT_MS =
+  "iceberg.hive.table-level-lock-evict-ms";
+
+  private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 
1000; // 3 minutes
+  private static final long HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT = 50; // 50 
milliseconds
+  private static final long HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 
5 seconds
+  private static final long HIVE_LOCK_CREATION_TIMEOUT_MS_DEFAULT = 3 * 60 * 
1000; // 3 minutes
+  private static final long HIVE_LOCK_CREATION_MIN_WAIT_MS_DEFAULT = 50; // 50 
milliseconds
+  private static final long HIVE_LOCK_CREATION_MAX_WAIT_MS_DEFAULT = 5 * 1000; 
// 5 seconds
+  private static final long HIVE_LOCK_HEARTBEAT_INTERVAL_MS_DEFAULT = 4 * 60 * 
1000; // 4 minutes
+  private static final long HIVE_TABLE_LEVEL_LOCK_EVICT_MS_DEFAULT = 
TimeUnit.MINUTES.toMillis(10);
+
+  private final ClientPool<IMetaStoreClient, TException> metaClients;
+  private final String databaseName;
+  private final String tableName;
+  private final String fullName;
+  private final long lockAcquireTimeout;
+  private final long lockCheckMinWaitTime;
+  private final long lockCheckMaxWaitTime;
+  private final long lockCreationTimeout;
+  private final long lockCreationMinWaitTime;
+  private final long lockCreationMaxWaitTime;
+  private final long 

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6570: Hive: Use EnvironmentContext instead of Hive Locks to provide transactional commits after HIVE-26882

2023-01-19 Thread GitBox


szehon-ho commented on code in PR #6570:
URL: https://github.com/apache/iceberg/pull/6570#discussion_r1081768050


##
hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreUtil.java:
##
@@ -53,9 +55,23 @@ private MetastoreUtil() {}
*/
   public static void alterTable(
   IMetaStoreClient client, String databaseName, String tblName, Table 
table) {
-EnvironmentContext envContext =
-new EnvironmentContext(
-ImmutableMap.of(StatsSetupConst.DO_NOT_UPDATE_STATS, 
StatsSetupConst.TRUE));
-ALTER_TABLE.invoke(client, databaseName, tblName, table, envContext);
+alterTable(client, databaseName, tblName, table, ImmutableMap.of());
+  }
+
+  /**
+   * Calls alter_table method using the metastore client. If possible, an 
environmental context will

Review Comment:
   Nit: Maybe the stats update part should be put as an inline comment instead 
of the method?  Also is it 'if possible'?   It seems its always used.



##
hive-metastore/src/main/java/org/apache/iceberg/hive/MetastoreLock.java:
##
@@ -0,0 +1,533 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.hive;
+
+import com.github.benmanes.caffeine.cache.Cache;
+import com.github.benmanes.caffeine.cache.Caffeine;
+import java.net.InetAddress;
+import java.net.UnknownHostException;
+import java.util.Optional;
+import java.util.UUID;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.ScheduledFuture;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.locks.ReentrantLock;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.api.LockComponent;
+import org.apache.hadoop.hive.metastore.api.LockLevel;
+import org.apache.hadoop.hive.metastore.api.LockRequest;
+import org.apache.hadoop.hive.metastore.api.LockResponse;
+import org.apache.hadoop.hive.metastore.api.LockState;
+import org.apache.hadoop.hive.metastore.api.LockType;
+import org.apache.hadoop.hive.metastore.api.ShowLocksRequest;
+import org.apache.hadoop.hive.metastore.api.ShowLocksResponse;
+import org.apache.hadoop.hive.metastore.api.ShowLocksResponseElement;
+import org.apache.iceberg.ClientPool;
+import org.apache.iceberg.exceptions.CommitFailedException;
+import org.apache.iceberg.relocated.com.google.common.base.MoreObjects;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import 
org.apache.iceberg.relocated.com.google.common.util.concurrent.ThreadFactoryBuilder;
+import org.apache.iceberg.util.Tasks;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class MetastoreLock implements HiveLock {
+  private static final Logger LOG = 
LoggerFactory.getLogger(MetastoreLock.class);
+  private static final String HIVE_ACQUIRE_LOCK_TIMEOUT_MS = 
"iceberg.hive.lock-timeout-ms";
+  private static final String HIVE_LOCK_CHECK_MIN_WAIT_MS = 
"iceberg.hive.lock-check-min-wait-ms";
+  private static final String HIVE_LOCK_CHECK_MAX_WAIT_MS = 
"iceberg.hive.lock-check-max-wait-ms";
+  private static final String HIVE_LOCK_CREATION_TIMEOUT_MS =
+  "iceberg.hive.lock-creation-timeout-ms";
+  private static final String HIVE_LOCK_CREATION_MIN_WAIT_MS =
+  "iceberg.hive.lock-creation-min-wait-ms";
+  private static final String HIVE_LOCK_CREATION_MAX_WAIT_MS =
+  "iceberg.hive.lock-creation-max-wait-ms";
+  private static final String HIVE_LOCK_HEARTBEAT_INTERVAL_MS =
+  "iceberg.hive.lock-heartbeat-interval-ms";
+  private static final String HIVE_TABLE_LEVEL_LOCK_EVICT_MS =
+  "iceberg.hive.table-level-lock-evict-ms";
+
+  private static final long HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT = 3 * 60 * 
1000; // 3 minutes
+  private static final long HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT = 50; // 50 
milliseconds
+  private static final long HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT = 5 * 1000; // 
5 seconds
+  private static final long 

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6621: [HiveCatalog] Support Altering and Dropping Table Ownership

2023-01-19 Thread GitBox


szehon-ho commented on code in PR #6621:
URL: https://github.com/apache/iceberg/pull/6621#discussion_r1081718764


##
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCatalog.java:
##
@@ -328,6 +329,140 @@ public void testCreateTableCustomSortOrder() throws 
Exception {
 }
   }
 
+  @Test
+  public void testAlterTableOwner() throws IOException, TException {
+alterTableAndVerifyOwner(
+DB_NAME,
+"tbl_alter_owner_1",
+ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "some_owner"),
+ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "some_other_owner"),
+"some_owner",
+"some_other_owner");
+alterTableAndVerifyOwner(
+DB_NAME,
+"tbl_alter_owner_2",
+ImmutableMap.of(),
+ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "another_owner"),
+UserGroupInformation.getCurrentUser().getUserName(),
+"another_owner");
+alterTableAndVerifyOwner(
+DB_NAME,
+"tbl_alter_owner_noop_1",
+ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "some_owner"),
+ImmutableMap.of(),
+"some_owner",
+"some_owner");
+alterTableAndVerifyOwner(
+DB_NAME,
+"tbl_alter_owner_noop_2",
+ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "some_owner"),
+ImmutableMap.of("unrelated_prop", "val"),
+"some_owner",
+"some_owner");
+alterTableAndVerifyOwner(
+DB_NAME,
+"tbl_alter_owner_noop_3",
+ImmutableMap.of(),
+ImmutableMap.of(),
+UserGroupInformation.getCurrentUser().getUserName(),
+UserGroupInformation.getCurrentUser().getUserName());
+  }
+
+  private void alterTableAndVerifyOwner(

Review Comment:
   Nit: I think in most tests we put the private methods at the end of the file, so the `@Test` methods are all at the top and easier to find.



##
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##
@@ -494,6 +494,17 @@ private void setHmsTableParameters(
 // remove any props from HMS that are no longer present in Iceberg table props
 obsoleteProps.forEach(parameters::remove);
 
+// altering owner
+if (metadata.properties().get(HiveCatalog.HMS_TABLE_OWNER) != null) {
+  tbl.setOwner(metadata.properties().get(HiveCatalog.HMS_TABLE_OWNER));
+}
+
+// dropping owner: instead of leaving the owner blank/null, the owner will
+// default to whoever is making the current drop operation
+if (obsoleteProps.contains(HiveCatalog.HMS_TABLE_OWNER)) {

Review Comment:
   Also, I don't know of the use case either, but it makes sense to me to set it to null here.
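   
   As a usage sketch of the behavior in this diff (assuming an initialized HiveCatalog named `catalog` and that the `HiveCatalog.HMS_TABLE_OWNER` constant is accessible; the table identifier and owner name are made up):
   
   import org.apache.iceberg.Table;
   import org.apache.iceberg.catalog.TableIdentifier;
   import org.apache.iceberg.hive.HiveCatalog;
   
   Table table = catalog.loadTable(TableIdentifier.of("db", "tbl"));
   
   // altering the owner: setHmsTableParameters copies the value onto tbl.setOwner(...)
   table.updateProperties().set(HiveCatalog.HMS_TABLE_OWNER, "new_owner").commit();
   
   // dropping the owner: the key lands in obsoleteProps and the HMS owner is reset
   table.updateProperties().remove(HiveCatalog.HMS_TABLE_OWNER).commit();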






[GitHub] [iceberg] haizhou-zhao commented on a diff in pull request #6621: [HiveCatalog] Support Altering and Dropping Table Ownership

2023-01-19 Thread GitBox


haizhou-zhao commented on code in PR #6621:
URL: https://github.com/apache/iceberg/pull/6621#discussion_r1081701820


##
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCatalog.java:
##
@@ -328,6 +329,140 @@ public void testCreateTableCustomSortOrder() throws Exception {
 }
   }
 
+  @Test
+  public void testAlterTableOwner() throws IOException, TException {
+alterTableAndVerifyOwner(
+DB_NAME,
+"tbl_alter_owner_1",
+ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "some_owner"),
+ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "some_other_owner"),
+"some_owner",
+"some_other_owner");
+alterTableAndVerifyOwner(
+DB_NAME,
+"tbl_alter_owner_2",
+ImmutableMap.of(),
+ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "another_owner"),
+UserGroupInformation.getCurrentUser().getUserName(),
+"another_owner");
+alterTableAndVerifyOwner(
+DB_NAME,
+"tbl_alter_owner_noop_1",
+ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "some_owner"),
+ImmutableMap.of(),
+"some_owner",
+"some_owner");
+alterTableAndVerifyOwner(
+DB_NAME,
+"tbl_alter_owner_noop_2",
+ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "some_owner"),
+ImmutableMap.of("unrelated_prop", "val"),
+"some_owner",
+"some_owner");
+alterTableAndVerifyOwner(
+DB_NAME,
+"tbl_alter_owner_noop_3",
+ImmutableMap.of(),
+ImmutableMap.of(),
+UserGroupInformation.getCurrentUser().getUserName(),
+UserGroupInformation.getCurrentUser().getUserName());
+  }
+
+  private void alterTableAndVerifyOwner(
+  String db,
+  String tbl,
+  Map<String, String> properties,
+  Map<String, String> updates,
+  String expectedOwnerPostCreate,
+  String expectedOwnerPostAlter)
+  throws IOException, TException {
+Schema schema = getTestSchema();
+PartitionSpec spec = PartitionSpec.builderFor(schema).bucket("data", 16).build();
+TableIdentifier tableIdent = TableIdentifier.of(db, tbl);
+String location = temp.newFolder(tbl).toString();
+try {
+  Table table = catalog.createTable(tableIdent, schema, spec, location, properties);
+  org.apache.hadoop.hive.metastore.api.Table hmsTable = metastoreClient.getTable(db, tbl);
+  Assert.assertEquals(expectedOwnerPostCreate, hmsTable.getOwner());
+  Assert.assertFalse(hmsTable.getParameters().containsKey(HiveCatalog.HMS_TABLE_OWNER));
+  UpdateProperties updateOps = table.updateProperties();
+  updates.forEach(updateOps::set);
+  updateOps.commit();
+  hmsTable = metastoreClient.getTable(db, tbl);
+  Assert.assertEquals(expectedOwnerPostAlter, hmsTable.getOwner());
+  Assert.assertFalse(hmsTable.getParameters().containsKey(HiveCatalog.HMS_TABLE_OWNER));
+} finally {
+  catalog.dropTable(tableIdent);
+}
+  }
+
+  @Test
+  public void testRemoveTableOwner() throws IOException, TException {
+removeTableOwnerAndVerify(
+DB_NAME,
+"tbl_remove_owner_1",
+ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "some_owner"),
+ImmutableSet.of(HiveCatalog.HMS_TABLE_OWNER),
+"some_owner",
+UserGroupInformation.getCurrentUser().getUserName());
+removeTableOwnerAndVerify(
+DB_NAME,
+"tbl_remove_owner_noop_1",
+ImmutableMap.of(),
+ImmutableSet.of(HiveCatalog.HMS_TABLE_OWNER),
+UserGroupInformation.getCurrentUser().getUserName(),
+UserGroupInformation.getCurrentUser().getUserName());
+removeTableOwnerAndVerify(
+DB_NAME,
+"tbl_remove_owner_noop_2",
+ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "some_owner"),
+ImmutableSet.of(),
+"some_owner",
+"some_owner");
+removeTableOwnerAndVerify(
+DB_NAME,
+"tbl_remove_owner_noop_3",
+ImmutableMap.of(HiveCatalog.HMS_TABLE_OWNER, "some_owner"),
+ImmutableSet.of("unrelated_prop"),
+"some_owner",
+"some_owner");
+removeTableOwnerAndVerify(
+DB_NAME,
+"tbl_remove_owner_noop_4",
+ImmutableMap.of(),
+ImmutableSet.of(),
+UserGroupInformation.getCurrentUser().getUserName(),
+UserGroupInformation.getCurrentUser().getUserName());
+  }
+
+  private void removeTableOwnerAndVerify(

Review Comment:
   Ack. thx






[GitHub] [iceberg] haizhou-zhao commented on a diff in pull request #6621: [HiveCatalog] Support Altering and Dropping Table Ownership

2023-01-19 Thread GitBox


haizhou-zhao commented on code in PR #6621:
URL: https://github.com/apache/iceberg/pull/6621#discussion_r1081700457


##
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##
@@ -494,6 +494,17 @@ private void setHmsTableParameters(
 // remove any props from HMS that are no longer present in Iceberg table props
 obsoleteProps.forEach(parameters::remove);
 
+// altering owner
+if (metadata.properties().get(HiveCatalog.HMS_TABLE_OWNER) != null) {
+  tbl.setOwner(metadata.properties().get(HiveCatalog.HMS_TABLE_OWNER));
+}
+
+// dropping owner: instead of leaving the owner blank/null, the owner will
+// default to whoever is making the current drop operation
+if (obsoleteProps.contains(HiveCatalog.HMS_TABLE_OWNER)) {

Review Comment:
   Fair, I think that's a good point: perhaps simply removing the owner and setting it to null is closer to what users want. Trying to do something smart here might confuse users instead of helping them.
   
   As for whether the "drop ownership" use case is needed at all, to be honest I personally do not have one and cannot think of one. However, from the code perspective, there's nothing stopping an Iceberg user from using the Iceberg APIs to remove the `HiveCatalog.HMS_TABLE_OWNER` property. If that does happen, then I feel the user's actual intention is to get rid of the owner rather than to do nothing. Maybe she wants to put the table into a state where only the admin and nobody else can access it for a while, before she knows for sure who the rightful owner should be.






[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6598: Core: View representation core implementation

2023-01-19 Thread GitBox


amogh-jahagirdar commented on code in PR #6598:
URL: https://github.com/apache/iceberg/pull/6598#discussion_r1081678527


##
api/src/main/java/org/apache/iceberg/view/SQLViewRepresentation.java:
##
@@ -36,17 +38,21 @@ default Type type() {
   String dialect();
 
   /** The default catalog when the view is created. */
+  @Nullable
   String defaultCatalog();
 
   /** The default namespace when the view is created. */
+  @Nullable
   Namespace defaultNamespace();
 
-  /** The query output schema at version create time, without aliases. */
-  Schema schema();
+  /** The query output schema id at version create time */
+  int schemaId();

Review Comment:
   Updated to use `Integer schemaId()`. Since the schema id may not be defined at creation time, as discussed in https://github.com/apache/iceberg/pull/6611/files, schemaId() can return null.
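   
   A minimal sketch of the resulting accessor on the immutable interface, assuming the findbugs `@Nullable` annotation from the diff and the org.immutables processor; the interface is abbreviated to the fields discussed here:
   
   import edu.umd.cs.findbugs.annotations.Nullable;
   import org.immutables.value.Value;
   
   @Value.Immutable
   public interface SQLViewRepresentation extends ViewRepresentation {
     /** The view query SQL text. */
     String sql();
   
     /** The view query SQL dialect. */
     String dialect();
   
     /** The query output schema id at version create time, or null when not yet assigned. */
     @Nullable
     Integer schemaId();
   }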






[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6598: Core: View representation core implementation

2023-01-19 Thread GitBox


amogh-jahagirdar commented on code in PR #6598:
URL: https://github.com/apache/iceberg/pull/6598#discussion_r1081676028


##
api/src/main/java/org/apache/iceberg/view/SQLViewRepresentation.java:
##
@@ -18,14 +18,17 @@
  */
 package org.apache.iceberg.view;
 
+import edu.umd.cs.findbugs.annotations.Nullable;

Review Comment:
   It seems like we are already using Nullable in a few places, for example https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/metrics/CommitMetricsResult.java#L54. I'm good to revert this, but that would imply not using Immutables here as well (unless there's another acceptable way to indicate to Immutables that a field can be null).






[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6598: Core: View representation core implementation

2023-01-19 Thread GitBox


amogh-jahagirdar commented on code in PR #6598:
URL: https://github.com/apache/iceberg/pull/6598#discussion_r1081672308


##
api/src/main/java/org/apache/iceberg/view/ViewRepresentation.java:
##
@@ -18,21 +18,16 @@
  */
 package org.apache.iceberg.view;
 
-import java.util.Locale;
+import org.immutables.value.Value;
 
+@Value.Immutable
 public interface ViewRepresentation {
 
-  enum Type {
-SQL;
+  class Type {
+private Type() {}
 
-public static Type fromString(String typeName) {
-  return valueOf(typeName.toUpperCase(Locale.ENGLISH));
-}
-
-public String typeName() {
-  return name().toLowerCase(Locale.ENGLISH);
-}
+public static final String SQL = "sql";

Review Comment:
   Yeah, it seems like an established pattern elsewhere to just use a constant string, and it simplifies the parsing logic a bit, but I'm happy to revert to the enum if there are other advantages. Let me know your thoughts @rdblue @jzhuge
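   
   For illustration, one thing the constant buys is a plain string switch in the parser; a sketch, with the parser method and the SQLViewRepresentationParser delegation hypothetical:
   
   import com.fasterxml.jackson.databind.JsonNode;
   import org.apache.iceberg.util.JsonUtil;
   
   static ViewRepresentation fromJson(JsonNode node) {
     String type = JsonUtil.getString("type", node);
     switch (type) {
       // a static final String is a compile-time constant, so it works as a case label
       case ViewRepresentation.Type.SQL:
         return SQLViewRepresentationParser.fromJson(node);
       default:
         throw new IllegalArgumentException("Cannot parse unknown view representation: " + type);
     }
   }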









[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6598: Core: View representation core implementation

2023-01-19 Thread GitBox


amogh-jahagirdar commented on code in PR #6598:
URL: https://github.com/apache/iceberg/pull/6598#discussion_r1081669003


##
core/src/test/java/org/apache/iceberg/view/TestViewRepresentationParser.java:
##
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.view;
+
+import com.fasterxml.jackson.databind.JsonNode;
+import java.util.List;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.catalog.Namespace;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
+import org.assertj.core.api.Assertions;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class TestViewRepresentationParser {

Review Comment:
   Sure, updated!






[GitHub] [iceberg] nastra commented on pull request #6626: Core: Update error msg

2023-01-19 Thread GitBox


nastra commented on PR #6626:
URL: https://github.com/apache/iceberg/pull/6626#issuecomment-1397400315

   > If we are checking non-null, I think the current error message still makes more sense?
   
   Not sure; it's usually been called out on my own PRs to adjust the error msg to that particular format (hence the reason I mentioned it on the other PR), which is used across other places in Iceberg.





[GitHub] [iceberg] jackye1995 commented on pull request #6586: AWS: make warehouse path optional for read only catalog use cases

2023-01-19 Thread GitBox


jackye1995 commented on PR #6586:
URL: https://github.com/apache/iceberg/pull/6586#issuecomment-1397395046

   @aajisaka can you also take a look?





[GitHub] [iceberg] jackye1995 commented on pull request #6626: Core: Update error msg

2023-01-19 Thread GitBox


jackye1995 commented on PR #6626:
URL: https://github.com/apache/iceberg/pull/6626#issuecomment-1397386343

   If we are checking non-null, I think the current error message still makes more sense?





[GitHub] [iceberg] nastra commented on a diff in pull request #6074: API,Core: SnapshotManager to be created through Transaction

2023-01-19 Thread GitBox


nastra commented on code in PR #6074:
URL: https://github.com/apache/iceberg/pull/6074#discussion_r1081611444


##
api/src/main/java/org/apache/iceberg/Transaction.java:
##
@@ -155,6 +155,13 @@ default UpdateStatistics updateStatistics() {
*/
   ExpireSnapshots expireSnapshots();
 
+  /**
+   * Create a new {@link ManageSnapshots manage snapshot API} to manage snapshots in this table.
+   *
+   * @return a new {@link ManageSnapshots}
+   */
+  ManageSnapshots manageSnapshots();

Review Comment:
   RevAPI is checking API and ABI compatibility, so it's not just tricking 
RevAPI, it's also to make sure that users of this particular `Transaction` API 
don't run into API/ABI breakages when they upgrade the Iceberg version.
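   
   For illustration, the usual way to keep such an interface source- and binary-compatible for existing implementors is a default method that throws, mirroring the `default UpdateStatistics updateStatistics()` visible in the hunk above (a sketch, not necessarily this PR's final shape):
   
   default ManageSnapshots manageSnapshots() {
     throw new UnsupportedOperationException(
         this.getClass().getName() + " doesn't implement manageSnapshots");
   }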





