Re: [PR] Add Pagination To List Apis [iceberg]
sachet commented on code in PR #9782: URL: https://github.com/apache/iceberg/pull/9782#discussion_r1548998658 ## core/src/main/java/org/apache/iceberg/rest/PaginatedList.java: ## @@ -0,0 +1,269 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.rest; + +import java.util.Collection; +import java.util.Collections; +import java.util.Iterator; +import java.util.List; +import java.util.ListIterator; +import java.util.Map; +import java.util.Spliterator; +import java.util.Spliterators; +import java.util.function.Supplier; +import org.apache.iceberg.catalog.Namespace; +import org.apache.iceberg.exceptions.ValidationException; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.relocated.com.google.common.collect.Maps; +import org.apache.iceberg.rest.responses.ListNamespacesResponse; +import org.apache.iceberg.rest.responses.ListTablesResponse; +import org.apache.iceberg.rest.responses.Route; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class PaginatedList implements List { Review Comment: +1, Current implementation of paginated list seems wrong as caller can get unexpected results. Two possible options are: 1. 
Pre-fetch all items (as suggested above), or 2. have methods like size(), contains(), isEmpty(), toArray(), get(), etc. first paginate across all items before performing the operation.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org
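The second option can be illustrated with a minimal sketch (Python for brevity, not the PR's Java implementation; the `fetch_page` callable and its `(items, next_token)` contract are assumptions): every whole-list operation first drains the remaining pages, so the collection contract holds.

```python
class EagerPaginatedList:
    """Sketch: a list-like view over a paged API. Any whole-list
    operation first exhausts the remaining pages, so methods like
    len() and membership tests return correct results."""

    def __init__(self, fetch_page):
        self._fetch_page = fetch_page  # (token) -> (items, next_token)
        self._items = []
        self._next_token = ""          # "" means "start from the first page"
        self._exhausted = False

    def _fetch_all(self):
        # Paginate across all remaining items before the operation runs.
        while not self._exhausted:
            items, self._next_token = self._fetch_page(self._next_token)
            self._items.extend(items)
            self._exhausted = self._next_token is None

    def __len__(self):
        self._fetch_all()
        return len(self._items)

    def __contains__(self, item):
        self._fetch_all()
        return item in self._items

    def __getitem__(self, index):
        self._fetch_all()
        return self._items[index]
```

The trade-off is that a single `size()` call may trigger many requests, which is exactly the cost the review is weighing.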
Re: [PR] Hive: Fix metadata file not found [iceberg]
pvary commented on PR #10069: URL: https://github.com/apache/iceberg/pull/10069#issuecomment-2033649191 @lurnagao-dahua: If you check https://github.com/apache/iceberg/blob/3caa3a28d07a2d08b9a0e4196634126f1e016d6a/hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCommits.java, you can find plenty of examples for commit errors. Maybe we could do something similar, like throwing an exception without a message. It would be nice to have a test. OTOH, if the test is more than 50 lines, it would cost us more in the upkeep of the test in the long run than what we gain by testing a null check. In that case I would skip adding the extra code, following the example of #701.
Re: [I] PyArrow S3FileSystem doesn't honor the AWS profile config [iceberg-python]
HonahX commented on issue #570: URL: https://github.com/apache/iceberg-python/issues/570#issuecomment-2033588669 @geruh, thanks for highlighting this issue. The confusion largely stems from the naming convention used when the `profile_name`, `region_name`, `aws_access_key_id`, etc., were introduced in [#7781](https://github.com/apache/iceberg/pull/7781). Initially, these configurations were intended solely for GlueCatalog, but their generic names suggest they might influence both Glue and S3 operations. To address this, we can consider renaming these configurations with a `glue.` prefix (e.g., `glue.profile_name`) to clarify their scope. However, to maintain API compatibility, we may need to support both the new and old naming conventions temporarily. > But on the other hand it seems reasonable that the AWS profile config should work uniformly across both the catalog and filesystem levels. +1 for unified configurations. I think it may be convenient to introduce other unified configurations with generic names like `aws-access-key-id`. So the overall order of config resolution would be:
1. Client-specific configs: `glue.access-key-id`, `s3.access-key-id`, etc.
2. Unified AWS configurations like `aws-access-key-id`
3. Environment variables and the default AWS config
> However, we're currently utilizing PyArrow's [S3FileSystem](https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html#pyarrow.fs.S3FileSystem), which doesn't inherently support AWS profiles. This means we'd need to bridge that gap manually.

Regarding `profile_name` support for PyArrow's S3FileSystem, it seems there might not be a direct solution from the pyiceberg side. This functionality appears to be more suitably addressed through enhancements to the PyArrow library itself. WDYT?
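The suggested lookup order can be sketched as follows. This is a hypothetical `resolve` helper; the key names follow the suggestion above and are not part of pyiceberg's actual API:

```python
import os

def resolve(key, client_prefix, props):
    """Sketch of the suggested precedence (assumed key names):
    1. client-specific config, e.g. glue.access-key-id
    2. unified AWS config, e.g. aws-access-key-id
    3. environment variables / the default AWS config
    """
    for candidate in (f"{client_prefix}.{key}", f"aws-{key}"):
        if candidate in props:
            return props[candidate]
    # Fall back to the environment, e.g. AWS_ACCESS_KEY_ID
    return os.environ.get("AWS_" + key.replace("-", "_").upper())

props = {"glue.access-key-id": "glue-key", "aws-access-key-id": "shared-key"}
glue_value = resolve("access-key-id", "glue", props)  # client-specific wins
s3_value = resolve("access-key-id", "s3", props)      # falls back to unified key
```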
Re: [PR] Add Pagination To List Apis [iceberg]
danielcweeks commented on code in PR #9782: URL: https://github.com/apache/iceberg/pull/9782#discussion_r1548945175 ## core/src/main/java/org/apache/iceberg/rest/PaginatedList.java: ## @@ -0,0 +1,269 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.iceberg.rest; + +import java.util.Collection; +import java.util.Collections; +import java.util.Iterator; +import java.util.List; +import java.util.ListIterator; +import java.util.Map; +import java.util.Spliterator; +import java.util.Spliterators; +import java.util.function.Supplier; +import org.apache.iceberg.catalog.Namespace; +import org.apache.iceberg.exceptions.ValidationException; +import org.apache.iceberg.relocated.com.google.common.collect.Lists; +import org.apache.iceberg.relocated.com.google.common.collect.Maps; +import org.apache.iceberg.rest.responses.ListNamespacesResponse; +import org.apache.iceberg.rest.responses.ListTablesResponse; +import org.apache.iceberg.rest.responses.Route; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class PaginatedList implements List { Review Comment: To clarify my comment, I think there are really two issues: 1) The current implementation does not properly adhere to the List behaviors. The DynamoDB example that's referenced will eagerly fetch if certain methods are called, which is necessary to get the correct behavior out of the List interface. 2) The second issue is more about the utility of a lazy pagination, since it seems the original motivation is more to address server-side limitations (like Glue's 100 results per request limit) as opposed to a client-side memory limitation. While you might be able to optimize something like the `SHOW TABLES LIKE` command client-side, we haven't seen the client side as the limiting factor.
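For contrast, a lazy, iterator-style pagination (rather than a `List`) sidesteps the interface-contract problem entirely, since an iterator promises nothing about size or random access. A minimal sketch in Python, with an assumed `fetch_page(token) -> (items, next_token)` contract:

```python
def paged_items(fetch_page):
    """Stream items page by page without materializing the full
    listing; callers that need List semantics must collect it."""
    token = ""  # "" requests the first page
    while True:
        items, token = fetch_page(token)
        yield from items
        if token is None:  # server signals there are no more pages
            return
```

This keeps memory bounded but, as noted above, only helps if the client side is actually the bottleneck.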
Re: [I] Support creating tags [iceberg-python]
enkidulan commented on issue #573: URL: https://github.com/apache/iceberg-python/issues/573#issuecomment-2033571970 For reference, I was able to make a tag only by using some private properties of the transaction object:

```py
from pyiceberg.table import SetSnapshotRefUpdate, update_table_metadata

with table.transaction() as txn:
    update = SetSnapshotRefUpdate(
        ref_name=tag,
        type="tag",
        snapshot_id=snapshot_id,
        max_ref_age_ms=None,
        max_snapshot_age_ms=None,
        min_snapshots_to_keep=None,
    )
    txn._updates = [update]
    txn.table_metadata = update_table_metadata(txn.table_metadata, [update])
```

This seems to work fine as a temporary workaround for development purposes, but it would be great to have a public method for creating tags.
[I] Support creating tags [iceberg-python]
enkidulan opened a new issue, #573: URL: https://github.com/apache/iceberg-python/issues/573 ### Feature Request / Improvement Historical tags in the Iceberg docs: https://iceberg.apache.org/docs/1.5.0/branching/#historical-tags Not sure if it was intentional behavior, but pyiceberg `v0.6.0` allowed tagging by using the public `set_ref_snapshot` method:

```py
with table.transaction() as transaction:
    transaction.set_ref_snapshot(
        snapshot_id=snapshot_id,
        parent_snapshot_id=snapshot_id,
        ref_name=revision,
        type="tag",
    )
```

The new dev version (the current main branch) has deprecated the `set_ref_snapshot` method, so I can't find a way to create a tag using public methods on the transaction object.
Re: [PR] feat: Convert predicate to arrow filter and push down to parquet reader [iceberg-rust]
liurenjie1024 commented on code in PR #295: URL: https://github.com/apache/iceberg-rust/pull/295#discussion_r1548900850 ## crates/iceberg/src/arrow.rs: ## @@ -113,6 +143,405 @@ impl ArrowReader { // TODO: full implementation ProjectionMask::all() } + +fn get_row_filter(&self, parquet_schema: &SchemaDescriptor) -> Result> { +if let Some(predicates) = &self.predicates { +let field_id_map = self.build_field_id_map(parquet_schema)?; + +// Collect Parquet column indices from field ids +let column_indices = predicates +.iter() +.map(|predicate| { +let mut collector = CollectFieldIdVisitor { field_ids: vec![] }; +collector.visit_predicate(predicate).unwrap(); +collector +.field_ids +.iter() +.map(|field_id| { +field_id_map.get(field_id).cloned().ok_or_else(|| { +Error::new(ErrorKind::DataInvalid, "Field id not found in schema") +}) +}) +.collect::>>() +}) +.collect::>>()?; + +// Convert BoundPredicates to ArrowPredicates +let mut arrow_predicates = vec![]; +for (predicate, columns) in predicates.iter().zip(column_indices.iter()) { +let mut converter = PredicateConverter { +columns, +projection_mask: ProjectionMask::leaves(parquet_schema, columns.clone()), +parquet_schema, +column_map: &field_id_map, +}; +let arrow_predicate = converter.visit_predicate(predicate)?; +arrow_predicates.push(arrow_predicate); +} +Ok(Some(RowFilter::new(arrow_predicates))) +} else { +Ok(None) +} +} + +/// Build the map of field id to Parquet column index in the schema. +fn build_field_id_map(&self, parquet_schema: &SchemaDescriptor) -> Result> { +let mut column_map = HashMap::new(); +for (idx, field) in parquet_schema.columns().iter().enumerate() { +let field_type = field.self_type(); +match field_type { +ParquetType::PrimitiveType { basic_info, .. } => { +if !basic_info.has_id() { +return Err(Error::new( +ErrorKind::DataInvalid, +format!( +"Leave column {:?} in schema doesn't have field id", +field_type +), +)); +} +column_map.insert(basic_info.id(), idx); +} +ParquetType::GroupType { .. 
} => { +return Err(Error::new( +ErrorKind::DataInvalid, +format!( +"Leave column in schema should be primitive type but got {:?}", +field_type +), +)); +} +}; +} + +Ok(column_map) +} +} + +/// A visitor to collect field ids from bound predicates. +struct CollectFieldIdVisitor { +field_ids: Vec, +} + +impl BoundPredicateVisitor for CollectFieldIdVisitor { +type T = (); +type U = (); + +fn and(&mut self, _predicates: Vec) -> Result { +Ok(()) +} + +fn or(&mut self, _predicates: Vec) -> Result { +Ok(()) +} + +fn not(&mut self, _predicate: Self::T) -> Result { +Ok(()) +} + +fn visit_always_true(&mut self) -> Result { +Ok(()) +} + +fn visit_always_false(&mut self) -> Result { +Ok(()) +} + +fn visit_unary(&mut self, predicate: &UnaryExpression) -> Result { +self.bound_reference(predicate.term())?; +Ok(()) +} + +fn visit_binary(&mut self, predicate: &BinaryExpression) -> Result { +self.bound_reference(predicate.term())?; +Ok(()) +} + +fn visit_set(&mut self, predicate: &SetExpression) -> Result { +self.bound_reference(predicate.term())?; +Ok(()) +} + +fn bound_reference(&mut self, reference: &BoundReference) -> Result { +self.field_ids.push(reference.field().id); +Ok(()) +} +} + +struct PredicateConverter<'a> { +pub columns: &'a Vec, +pub projection_mask: ProjectionMask, +pub parquet_schema: &'a SchemaDescriptor, +pub column_map: &'a HashMap, +} + +fn get_arrow_datum(datum: &Datum) -> Box { +match datum.literal() { +PrimitiveLiteral::Boolean(value) => Box::new(BooleanArray::new_scalar(*value)), +PrimitiveLiteral::Int(value)
Re: [PR] refine: seperate parquet reader and arrow convert [iceberg-rust]
viirya commented on code in PR #313: URL: https://github.com/apache/iceberg-rust/pull/313#discussion_r1548890936 ## crates/iceberg/src/reader.rs: ## Review Comment: +1
Re: [PR] refine: seperate parquet reader and arrow convert [iceberg-rust]
liurenjie1024 commented on PR #313: URL: https://github.com/apache/iceberg-rust/pull/313#issuecomment-2033470850 cc @viirya Would you also take a look?
Re: [PR] refine: seperate parquet reader and arrow convert [iceberg-rust]
liurenjie1024 commented on code in PR #313: URL: https://github.com/apache/iceberg-rust/pull/313#discussion_r1548868232 ## crates/iceberg/src/arrow/from.rs: ## Review Comment: How about just naming it `schema.rs`, so we can put all schema-related code there? ## crates/iceberg/src/reader.rs: ## Review Comment: Move this to the `arrow` module?
Re: [I] int64() is converted to float for nullable field when None is provided + nan returned when None is expected [iceberg-python]
Co0olCat commented on issue #572: URL: https://github.com/apache/iceberg-python/issues/572#issuecomment-2033459923 It looks like an issue in PyArrow: scan() returns the correct data and types, while pyarrow.project_table() does the conversion...
Re: [I] int64() is converted to float for nullable field when None is provided + nan returned when None is expected [iceberg-python]
Co0olCat closed issue #572: int64() is converted to float for nullable field when None is provided + nan returned when None is expected URL: https://github.com/apache/iceberg-python/issues/572
[I] int64() is converted to float for nullable field when None is provided + nan returned when None is expected [iceberg-python]
Co0olCat opened a new issue, #572: URL: https://github.com/apache/iceberg-python/issues/572 ### Apache Iceberg version 0.6.0 (latest release) ### Please describe the bug 🐞 For reproduction using https://github.com/apache/iceberg-python/blob/main/tests/catalog/test_glue.py Here is the failing test:

```python
@mock_aws
def test_create_table_with_pyarrow_schema(
    _bucket_initialize: None,
    moto_endpoint_url: str,
    database_name: str,
    table_name: str,
) -> None:
    catalog_name = "glue"
    identifier = (database_name, table_name)
    test_catalog = GlueCatalog(catalog_name, **{"s3.endpoint": moto_endpoint_url})
    test_catalog.create_namespace(namespace=database_name)
    pa_schema = pa.schema([
        pa.field('year', pa.int64(), nullable=False),
        pa.field('n_legs', pa.int64(), nullable=True),
        pa.field('animals', pa.string(), nullable=True),
    ])
    table = test_catalog.create_table(
        identifier=identifier,
        schema=pa_schema,
        location=f"s3://{BUCKET_NAME}/{database_name}.db/{table_name}",
    )
    assert table.identifier == (catalog_name,) + identifier
    assert TABLE_METADATA_LOCATION_REGEX.match(table.metadata_location)
    assert test_catalog._parse_metadata_version(table.metadata_location) == 0
    table.append(
        pa.Table.from_pylist(
            [
                {"year": 2001, "n_legs": 2, "animals": None},
                {"year": 2002, "n_legs": None, "animals": "Horse"},
            ],
            schema=pa_schema,
        )
    )
    assert len(table.scan().to_arrow()) == 2
    table.append(
        pa.Table.from_pylist(
            [
                {"year": 2003, "n_legs": 6, "animals": "Cicada"},
                {"year": 2004, "n_legs": 8, "animals": "Spider"},
            ],
            schema=pa_schema,
        )
    )
    assert len(table.scan().to_arrow()) == 4
    assert table.scan().to_pandas().to_dict("records") == [
        {"animals": "Cicada", "n_legs": 6, "year": 2003},
        {"animals": "Spider", "n_legs": 8, "year": 2004},
        {"animals": None, "n_legs": 2, "year": 2001},
        {"animals": "Horse", "n_legs": None, "year": 2002},
    ]
```

Error part:

```
E Full diff:
E   [
E -  {'animals': 'Cicada', 'n_legs': 6, 'year': 2003},
E +  {'animals': 'Cicada', 'n_legs': 6.0, 'year': 2003},
E ?                                   ++
E -  {'animals': 'Spider', 'n_legs': 8, 'year': 2004},
E +  {'animals': 'Spider', 'n_legs': 8.0, 'year': 2004},
E ?                                   ++
E -  {'animals': None, 'n_legs': 2, 'year': 2001},
E +  {'animals': None, 'n_legs': 2.0, 'year': 2001},
E ?                               ++
E -  {'animals': 'Horse', 'n_legs': None, 'year': 2002},
E ?                                 -- ^
E +  {'animals': 'Horse', 'n_legs': nan, 'year': 2002},
E ?                                  ^^
E   ]
```
[PR] [WIP] Migrate TableTestBase related classes to JUnit5 [iceberg]
tomtongue opened a new pull request, #10080: URL: https://github.com/apache/iceberg/pull/10080 Migrate the following test classes to delete `TableTestBase` for https://github.com/apache/iceberg/issues/9085. This PR is for the migration of `TableTestBase`-related classes in https://github.com/apache/iceberg/pull/10063.

## Current Progress

Core:
- [x] `TestContentFileParser` (skipped)
- [x] `TestFileScanTaskParser` (skipped)
- [x] `util/TestTableScanUtil`
- [ ] `WriterTestBase`
- [x] `TestFileWriterFactory`
- [x] `TestGenericFileWriterFactory`
- [ ] `TestSparkFileWriterFactory` for the versions: v3.3, v3.4, v3.5
- [ ] `TestFlinkFileWriterFactory` for the versions: v1.15, v1.16, v1.17, v1.18
- [x] `TestPartitioningWriters`
- [ ] `TestSparkPartitioningWriters` for the versions: v3.3, v3.4, v3.5
- [ ] `TestFlinkPartitioningWriters` for the versions: v1.15, v1.16, v1.17, v1.18
- [x] `TestPositionDeltaWriters`
- [ ] `TestSparkPositionDeltaWriters` for the versions: v3.3, v3.4, v3.5
- [ ] `TestFlinkPositionDeltaWriters` for the versions: v1.15, v1.16, v1.17, v1.18
- [x] `TestRollingFileWriters`
- [ ] `TestSparkRollingFileWriters` for the versions: v3.3, v3.4, v3.5
- [ ] `TestFlinkRollingFileWriters` for the versions: v1.15, v1.16, v1.17, v1.18

`iceberg-flink` for the versions: v1.15, v1.16, v1.17, v1.18
- [ ] `TestStreamingReaderOperator` latest ok
- [ ] `TestStreamingMonitorFunction` latest ok
- [ ] `TestIcebergFilesCommitter` latest ok
- [ ] `TestDeltaTaskWriter` latest ok
Re: [PR] Hive: Fix metadata file not found [iceberg]
lurnagao-dahua commented on PR #10069: URL: https://github.com/apache/iceberg/pull/10069#issuecomment-2033424821 Thank you for your response! In my case, Flink was streaming writes to Iceberg:
1. The Hive metastore had been in continuous full GC, so it threw SocketTimeoutException: Read timed out (`hive.metastore.client.socket.timeout` defaults to 600s).
2. The HiveTableOperations commit thread called `Thread.sleep(retryDelaySeconds * 1000)` to retry.
3. The Flink checkpoint timeout was less than 600s and interrupted it, which threw an InterruptedException with no message.

I have been thinking about this for a while and have some doubts about the UT. Can you give me some advice?
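The failure mode described above, an exception raised without a message later breaking string handling, can be mimicked in a few lines. This is an illustrative Python analog of the null check being discussed, not the actual Java patch:

```python
def failure_reason(exc: BaseException) -> str:
    """Return a safe, non-empty description of an exception,
    guarding against exceptions constructed with no message."""
    message = str(exc)
    # An exception raised with no arguments stringifies to "",
    # so fall back to the exception type name instead.
    return message if message else type(exc).__name__
```

An interrupted sleep can surface exactly this way: the exception carries a type but no message, so any code that assumes a non-null message breaks.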
Re: [PR] feat: Convert predicate to arrow filter and push down to parquet reader [iceberg-rust]
viirya commented on code in PR #295: URL: https://github.com/apache/iceberg-rust/pull/295#discussion_r1548817290 ## crates/iceberg/src/arrow.rs: ## @@ -113,6 +143,405 @@ impl ArrowReader { // TODO: full implementation ProjectionMask::all() } + +fn get_row_filter(&self, parquet_schema: &SchemaDescriptor) -> Result> { +if let Some(predicates) = &self.predicates { +let field_id_map = self.build_field_id_map(parquet_schema)?; + +// Collect Parquet column indices from field ids +let column_indices = predicates +.iter() +.map(|predicate| { +let mut collector = CollectFieldIdVisitor { field_ids: vec![] }; +collector.visit_predicate(predicate).unwrap(); +collector +.field_ids +.iter() +.map(|field_id| { +field_id_map.get(field_id).cloned().ok_or_else(|| { +Error::new(ErrorKind::DataInvalid, "Field id not found in schema") +}) +}) +.collect::>>() +}) +.collect::>>()?; + +// Convert BoundPredicates to ArrowPredicates +let mut arrow_predicates = vec![]; +for (predicate, columns) in predicates.iter().zip(column_indices.iter()) { +let mut converter = PredicateConverter { +columns, +projection_mask: ProjectionMask::leaves(parquet_schema, columns.clone()), +parquet_schema, +column_map: &field_id_map, +}; +let arrow_predicate = converter.visit_predicate(predicate)?; +arrow_predicates.push(arrow_predicate); +} +Ok(Some(RowFilter::new(arrow_predicates))) +} else { +Ok(None) +} +} + +/// Build the map of field id to Parquet column index in the schema. +fn build_field_id_map(&self, parquet_schema: &SchemaDescriptor) -> Result> { +let mut column_map = HashMap::new(); +for (idx, field) in parquet_schema.columns().iter().enumerate() { +let field_type = field.self_type(); +match field_type { +ParquetType::PrimitiveType { basic_info, .. } => { +if !basic_info.has_id() { +return Err(Error::new( +ErrorKind::DataInvalid, +format!( +"Leave column {:?} in schema doesn't have field id", +field_type +), +)); +} +column_map.insert(basic_info.id(), idx); +} +ParquetType::GroupType { .. 
} => { +return Err(Error::new( +ErrorKind::DataInvalid, +format!( +"Leave column in schema should be primitive type but got {:?}", +field_type +), +)); +} +}; +} + +Ok(column_map) +} +} + +/// A visitor to collect field ids from bound predicates. +struct CollectFieldIdVisitor { +field_ids: Vec, +} + +impl BoundPredicateVisitor for CollectFieldIdVisitor { +type T = (); +type U = (); + +fn and(&mut self, _predicates: Vec) -> Result { +Ok(()) +} + +fn or(&mut self, _predicates: Vec) -> Result { +Ok(()) +} + +fn not(&mut self, _predicate: Self::T) -> Result { +Ok(()) +} + +fn visit_always_true(&mut self) -> Result { +Ok(()) +} + +fn visit_always_false(&mut self) -> Result { +Ok(()) +} + +fn visit_unary(&mut self, predicate: &UnaryExpression) -> Result { +self.bound_reference(predicate.term())?; +Ok(()) +} + +fn visit_binary(&mut self, predicate: &BinaryExpression) -> Result { +self.bound_reference(predicate.term())?; +Ok(()) +} + +fn visit_set(&mut self, predicate: &SetExpression) -> Result { +self.bound_reference(predicate.term())?; +Ok(()) +} + +fn bound_reference(&mut self, reference: &BoundReference) -> Result { +self.field_ids.push(reference.field().id); +Ok(()) +} +} + +struct PredicateConverter<'a> { +pub columns: &'a Vec, +pub projection_mask: ProjectionMask, +pub parquet_schema: &'a SchemaDescriptor, +pub column_map: &'a HashMap, +} + +fn get_arrow_datum(datum: &Datum) -> Box { +match datum.literal() { +PrimitiveLiteral::Boolean(value) => Box::new(BooleanArray::new_scalar(*value)), +PrimitiveLiteral::Int(value) => Bo
Re: [PR] Open-api: update prefix param description [iceberg]
ajantha-bhat commented on code in PR #9870: URL: https://github.com/apache/iceberg/pull/9870#discussion_r1548815679 ## open-api/rest-catalog-open-api.yaml: ## @@ -1444,7 +1444,7 @@ components: schema: type: string required: true - description: An optional prefix in the path + description: Prefix in the path Review Comment: The validator doesn't allow setting `required` to `false` for a path parameter. What we may need is two paths defined, one with the prefix and one without.
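One way to express that suggestion is to declare the route twice, once without the prefix and once with a required prefix segment. This is only a sketch of the idea; the path and operation names below are assumptions, not the actual `rest-catalog-open-api.yaml` change:

```yaml
paths:
  # Route without a prefix
  /v1/namespaces:
    get:
      operationId: listNamespaces
  # Same route with a prefix; path parameters must be required per OpenAPI
  /v1/{prefix}/namespaces:
    get:
      operationId: listNamespacesWithPrefix
      parameters:
        - name: prefix
          in: path
          required: true
          schema:
            type: string
```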
[PR] Move writes to Transaction [iceberg-python]
syun64 opened a new pull request, #571: URL: https://github.com/apache/iceberg-python/pull/571 As a follow-up from @HonahX's suggestion on https://github.com/apache/iceberg-python/pull/498
Re: [PR] feat: Convert predicate to arrow filter and push down to parquet reader [iceberg-rust]
viirya commented on code in PR #295: URL: https://github.com/apache/iceberg-rust/pull/295#discussion_r1548813765 ## crates/iceberg/src/arrow.rs: ## @@ -113,6 +143,405 @@ impl ArrowReader {

```rust
        // TODO: full implementation
        ProjectionMask::all()
    }

    fn get_row_filter(&self, parquet_schema: &SchemaDescriptor) -> Result<Option<RowFilter>> {
        if let Some(predicates) = &self.predicates {
            let field_id_map = self.build_field_id_map(parquet_schema)?;

            // Collect Parquet column indices from field ids
            let column_indices = predicates
                .iter()
                .map(|predicate| {
                    let mut collector = CollectFieldIdVisitor { field_ids: vec![] };
                    collector.visit_predicate(predicate).unwrap();
                    collector
                        .field_ids
                        .iter()
                        .map(|field_id| {
                            field_id_map.get(field_id).cloned().ok_or_else(|| {
                                Error::new(ErrorKind::DataInvalid, "Field id not found in schema")
                            })
                        })
                        .collect::<Result<Vec<usize>>>()
                })
                .collect::<Result<Vec<Vec<usize>>>>()?;

            // Convert BoundPredicates to ArrowPredicates
            let mut arrow_predicates = vec![];
            for (predicate, columns) in predicates.iter().zip(column_indices.iter()) {
                let mut converter = PredicateConverter {
                    columns,
                    projection_mask: ProjectionMask::leaves(parquet_schema, columns.clone()),
                    parquet_schema,
                    column_map: &field_id_map,
                };
                let arrow_predicate = converter.visit_predicate(predicate)?;
                arrow_predicates.push(arrow_predicate);
            }
            Ok(Some(RowFilter::new(arrow_predicates)))
        } else {
            Ok(None)
        }
    }

    /// Build the map of field id to Parquet column index in the schema.
    fn build_field_id_map(&self, parquet_schema: &SchemaDescriptor) -> Result<HashMap<i32, usize>> {
        let mut column_map = HashMap::new();
        for (idx, field) in parquet_schema.columns().iter().enumerate() {
            let field_type = field.self_type();
            match field_type {
                ParquetType::PrimitiveType { basic_info, .. } => {
                    if !basic_info.has_id() {
                        return Err(Error::new(
                            ErrorKind::DataInvalid,
                            format!(
                                "Leave column {:?} in schema doesn't have field id",
                                field_type
                            ),
                        ));
                    }
                    column_map.insert(basic_info.id(), idx);
                }
                ParquetType::GroupType { .. } => {
                    return Err(Error::new(
                        ErrorKind::DataInvalid,
                        format!(
                            "Leave column in schema should be primitive type but got {:?}",
                            field_type
                        ),
                    ));
                }
            };
        }

        Ok(column_map)
    }
}

/// A visitor to collect field ids from bound predicates.
struct CollectFieldIdVisitor {
    field_ids: Vec<i32>,
}

impl BoundPredicateVisitor for CollectFieldIdVisitor {
    type T = ();
    type U = ();

    fn and(&mut self, _predicates: Vec<Self::T>) -> Result<Self::T> {
        Ok(())
    }

    fn or(&mut self, _predicates: Vec<Self::T>) -> Result<Self::T> {
        Ok(())
    }

    fn not(&mut self, _predicate: Self::T) -> Result<Self::T> {
        Ok(())
    }

    fn visit_always_true(&mut self) -> Result<Self::T> {
        Ok(())
    }

    fn visit_always_false(&mut self) -> Result<Self::T> {
        Ok(())
    }

    fn visit_unary(&mut self, predicate: &UnaryExpression) -> Result<Self::T> {
        self.bound_reference(predicate.term())?;
        Ok(())
    }

    fn visit_binary(&mut self, predicate: &BinaryExpression) -> Result<Self::T> {
        self.bound_reference(predicate.term())?;
        Ok(())
    }

    fn visit_set(&mut self, predicate: &SetExpression) -> Result<Self::T> {
        self.bound_reference(predicate.term())?;
        Ok(())
    }

    fn bound_reference(&mut self, reference: &BoundReference) -> Result<Self::T> {
        self.field_ids.push(reference.field().id);
        Ok(())
    }
}

struct PredicateConverter<'a> {
    pub columns: &'a Vec<usize>,
    pub projection_mask: ProjectionMask,
    pub parquet_schema: &'a SchemaDescriptor,
    pub column_map: &'a HashMap<i32, usize>,
}

fn get_arrow_datum(datum: &Datum) -> Box<dyn ArrowDatum> {
    match datum.literal() {
        PrimitiveLiteral::Boolean(value) => Box::new(BooleanArray::new_scalar(*value)),
        PrimitiveLiteral::Int(value) => Bo
```
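The `build_field_id_map` step quoted above — walk the Parquet leaf columns, record field id → column index, and fail on any leaf without an id — can be sketched in a few lines of Python. This is an illustrative stand-in only: the function and the `(field_id, name)` tuples are hypothetical, not the iceberg-rust or Parquet API.

```python
def build_field_id_map(columns):
    """Map field id -> column index; error if a leaf column has no id.

    `columns` is a list of (field_id_or_None, name) tuples standing in
    for Parquet leaf columns (a hypothetical stand-in for SchemaDescriptor).
    """
    column_map = {}
    for idx, (field_id, name) in enumerate(columns):
        if field_id is None:
            # Mirrors the error path for leaf columns without a field id
            raise ValueError(f"Leaf column {name!r} in schema doesn't have a field id")
        column_map[field_id] = idx
    return column_map

print(build_field_id_map([(1, "id"), (2, "name")]))  # {1: 0, 2: 1}
```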
Re: [PR] feat: Convert predicate to arrow filter and push down to parquet reader [iceberg-rust]
viirya commented on code in PR #295: URL: https://github.com/apache/iceberg-rust/pull/295#discussion_r1548809949 ## crates/iceberg/src/arrow.rs: ## @@ -20,24 +20,38 @@

```
 use async_stream::try_stream;
 use futures::stream::StreamExt;
 use parquet::arrow::{ParquetRecordBatchStreamBuilder, ProjectionMask};
+use std::collections::HashMap;

 use crate::io::FileIO;
 use crate::scan::{ArrowRecordBatchStream, FileScanTask, FileScanTaskStream};
-use crate::spec::SchemaRef;
+use crate::spec::{Datum, PrimitiveLiteral, SchemaRef};

 use crate::error::Result;
+use crate::expr::{
+    BinaryExpression, BoundPredicate, BoundReference, PredicateOperator, SetExpression,
+    UnaryExpression,
+};
 use crate::spec::{
     ListType, MapType, NestedField, NestedFieldRef, PrimitiveType, Schema, StructType, Type,
 };
 use crate::{Error, ErrorKind};
+use arrow_arith::boolean::{and, is_not_null, is_null, not, or};
+use arrow_array::{
+    BooleanArray, Datum as ArrowDatum, Float32Array, Float64Array, Int32Array, Int64Array,
+};
+use arrow_ord::cmp::{eq, gt, gt_eq, lt, lt_eq, neq};
 use arrow_schema::{DataType, Field, Fields, Schema as ArrowSchema, TimeUnit};
+use bitvec::macros::internal::funty::Fundamental;
+use parquet::arrow::arrow_reader::{ArrowPredicate, ArrowPredicateFn, RowFilter};
+use parquet::schema::types::{SchemaDescriptor, Type as ParquetType};
 use std::sync::Arc;

 /// Builder to create ArrowReader
 pub struct ArrowReaderBuilder {
     batch_size: Option<usize>,
     file_io: FileIO,
     schema: SchemaRef,
+    predicates: Option<Vec<BoundPredicate>>,
```

Review Comment: This is because the Parquet API `RowFilter` takes `predicates: Vec<Box<dyn ArrowPredicate>>`. This is `RowFilter`'s doc:

```
/// A [`RowFilter`] allows pushing down a filter predicate to skip IO and decode
///
/// This consists of a list of [`ArrowPredicate`] where only the rows that satisfy all
/// of the predicates will be returned.
```

So I think it is a conjunction relationship between these predicates.

> Since we already have And/Or as part of BoundPredicate variant, how about just BoundPredicate?
Yea, I can use just one `BoundPredicate`. So users can define conjunctions in one single predicate using `And`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
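The conjunction semantics described in the `RowFilter` doc above can be illustrated with a small Python sketch (the names here are hypothetical, not the parquet crate API): a row is returned only if every predicate in the list accepts it.

```python
def apply_row_filter(rows, predicates):
    """Keep only rows that satisfy ALL predicates (conjunction), mirroring
    the documented behavior of a list of ArrowPredicates in a RowFilter."""
    return [row for row in rows if all(p(row) for p in predicates)]

rows = [{"x": 1}, {"x": 5}, {"x": 9}]
kept = apply_row_filter(rows, [lambda r: r["x"] > 1, lambda r: r["x"] < 9])
print(kept)  # [{'x': 5}]
```

This is why expressing the filter as one `BoundPredicate` with explicit `And`/`Or` nodes is equivalent in expressive power to a list of implicitly ANDed predicates.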
Re: [PR] feat: Convert predicate to arrow filter and push down to parquet reader [iceberg-rust]
viirya commented on code in PR #295: URL: https://github.com/apache/iceberg-rust/pull/295#discussion_r1548810072 ## crates/iceberg/src/arrow.rs: ## @@ -113,6 +143,405 @@ impl ArrowReader {

```rust
        // TODO: full implementation
        ProjectionMask::all()
    }

    fn get_row_filter(&self, parquet_schema: &SchemaDescriptor) -> Result<Option<RowFilter>> {
        if let Some(predicates) = &self.predicates {
            let field_id_map = self.build_field_id_map(parquet_schema)?;

            // Collect Parquet column indices from field ids
            let column_indices = predicates
                .iter()
                .map(|predicate| {
                    let mut collector = CollectFieldIdVisitor { field_ids: vec![] };
                    collector.visit_predicate(predicate).unwrap();
                    collector
                        .field_ids
                        .iter()
                        .map(|field_id| {
                            field_id_map.get(field_id).cloned().ok_or_else(|| {
                                Error::new(ErrorKind::DataInvalid, "Field id not found in schema")
                            })
                        })
                        .collect::<Result<Vec<usize>>>()
                })
                .collect::<Result<Vec<Vec<usize>>>>()?;

            // Convert BoundPredicates to ArrowPredicates
            let mut arrow_predicates = vec![];
            for (predicate, columns) in predicates.iter().zip(column_indices.iter()) {
                let mut converter = PredicateConverter {
                    columns,
                    projection_mask: ProjectionMask::leaves(parquet_schema, columns.clone()),
                    parquet_schema,
                    column_map: &field_id_map,
                };
                let arrow_predicate = converter.visit_predicate(predicate)?;
                arrow_predicates.push(arrow_predicate);
            }
            Ok(Some(RowFilter::new(arrow_predicates)))
        } else {
            Ok(None)
        }
    }

    /// Build the map of field id to Parquet column index in the schema.
    fn build_field_id_map(&self, parquet_schema: &SchemaDescriptor) -> Result<HashMap<i32, usize>> {
        let mut column_map = HashMap::new();
        for (idx, field) in parquet_schema.columns().iter().enumerate() {
            let field_type = field.self_type();
            match field_type {
                ParquetType::PrimitiveType { basic_info, .. } => {
                    if !basic_info.has_id() {
                        return Err(Error::new(
                            ErrorKind::DataInvalid,
                            format!(
                                "Leave column {:?} in schema doesn't have field id",
                                field_type
                            ),
                        ));
                    }
                    column_map.insert(basic_info.id(), idx);
                }
                ParquetType::GroupType { .. } => {
                    return Err(Error::new(
                        ErrorKind::DataInvalid,
                        format!(
                            "Leave column in schema should be primitive type but got {:?}",
                            field_type
                        ),
                    ));
                }
            };
        }

        Ok(column_map)
    }
}

/// A visitor to collect field ids from bound predicates.
struct CollectFieldIdVisitor {
```

Review Comment: Okay

## crates/iceberg/src/arrow.rs: ## @@ -113,6 +143,405 @@ impl ArrowReader {

```rust
        // TODO: full implementation
        ProjectionMask::all()
    }

    fn get_row_filter(&self, parquet_schema: &SchemaDescriptor) -> Result<Option<RowFilter>> {
        if let Some(predicates) = &self.predicates {
            let field_id_map = self.build_field_id_map(parquet_schema)?;

            // Collect Parquet column indices from field ids
            let column_indices = predicates
                .iter()
                .map(|predicate| {
                    let mut collector = CollectFieldIdVisitor { field_ids: vec![] };
                    collector.visit_predicate(predicate).unwrap();
                    collector
                        .field_ids
                        .iter()
                        .map(|field_id| {
                            field_id_map.get(field_id).cloned().ok_or_else(|| {
                                Error::new(ErrorKind::DataInvalid, "Field id not found in schema")
                            })
                        })
                        .collect::<Result<Vec<usize>>>()
                })
                .collect::<Result<Vec<Vec<usize>>>>()?;

            // Convert BoundPredicates to ArrowPredicates
            let mut arrow_predicates = vec![];
            for (predicate, columns) in predicates.iter().zip(column_indices.iter()) {
                let mut converter = PredicateConverter {
                    columns,
```
Re: [PR] refine: seperate parquet reader and arrow convert [iceberg-rust]
ZENOTME commented on PR #313: URL: https://github.com/apache/iceberg-rust/pull/313#issuecomment-2033373684 cc @liurenjie1024 @Xuanwo @Fokko
[PR] refine: seperate parquet reader and arrow convert [iceberg-rust]
ZENOTME opened a new pull request, #313: URL: https://github.com/apache/iceberg-rust/pull/313 This PR separates out the parquet reader from the arrow module, and makes the arrow module a directory so that we can separate `from_arrow` and `to_arrow`.
Re: [PR] [WIP] Add `ManifestEvaluator` to allow filtering of files in a table scan (Issue #152) [iceberg-rust]
liurenjie1024 commented on PR #241: URL: https://github.com/apache/iceberg-rust/pull/241#issuecomment-2033371130

> > which is somehow motivated by @viirya 's [pr](https://github.com/apache/iceberg-rust/pull/295/files#diff-a59622727cd67153abdf02031475bf8a1b1921738df4ca9903a685ff6970b7aaR472), but moves the traversing flow out of the trait body.
>
> @liurenjie1024 ...so the traversal flow could then be implemented on e.g. the `ManifestEvaluator` itself. For example, `eval()` could call the corresponding 'visit_xx' on the visitor that implements `BoundPredicateVisitor`. Is this what you mean?

I was thinking about the following structure:

```rust
pub trait BoundPredicateVisitor {
    type T;
    fn visit_and(&mut self, values: [Self::T; 2]) -> Result<Self::T>;
    fn visit_or(&mut self, values: [Self::T; 2]) -> Result<Self::T>;
    ...
}

pub fn visit_bound_predicate<V: BoundPredicateVisitor>(
    visitor: &mut V,
    predicate: &BoundPredicate,
) -> Result<V::T> {
    match predicate {
        BoundPredicate::And(children) => {
            let ret = [
                visit_bound_predicate(visitor, children[0]),
                visit_bound_predicate(visitor, children[1]),
            ];
            visitor.visit_and(ret)
        }
        ...
    }
}

pub struct ManifestEvaluator {}

impl BoundPredicateVisitor for ManifestEvaluator {}

impl ManifestEvaluator {
    pub fn eval(&mut self, predicate: &BoundPredicate) -> bool {
        visit_bound_predicate(self, predicate)?
    }
}
```
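The traversal-outside-the-trait shape proposed above can be illustrated with a small, self-contained Python analogue (all class and method names here are illustrative, not the iceberg-rust API): a free `visit_bound_predicate` function owns the tree walk, and a visitor such as the evaluator stand-in only implements the per-node hooks.

```python
class And:
    """Minimal stand-in for BoundPredicate::And with exactly two children."""
    def __init__(self, left, right):
        self.left, self.right = left, right

class Literal:
    """Minimal stand-in for a leaf predicate with a fixed truth value."""
    def __init__(self, value):
        self.value = value

def visit_bound_predicate(visitor, predicate):
    """Free traversal function: recurses into children, then calls the hook."""
    if isinstance(predicate, And):
        ret = (visit_bound_predicate(visitor, predicate.left),
               visit_bound_predicate(visitor, predicate.right))
        return visitor.visit_and(ret)
    return visitor.visit_literal(predicate)

class ManifestEvaluator:
    """Visitor implementing only the per-node hooks; no traversal logic."""
    def visit_and(self, values):
        return all(values)
    def visit_literal(self, lit):
        return lit.value
    def eval(self, predicate):
        return visit_bound_predicate(self, predicate)

print(ManifestEvaluator().eval(And(Literal(True), Literal(False))))  # False
```

The design point is the same as in the Rust sketch: new visitors (evaluators, field-id collectors, converters) reuse one traversal instead of each re-implementing the recursion.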
Re: [PR] feat: support uri redirect in rest client [iceberg-rust]
liurenjie1024 merged PR #310: URL: https://github.com/apache/iceberg-rust/pull/310
Re: [PR] Support CreateTableTransaction in Glue and Rest [iceberg-python]
syun64 commented on PR #498: URL: https://github.com/apache/iceberg-python/pull/498#issuecomment-2033344103 > Shall we move "append", "overwrite", and "add_files" to `Transaction` class? This change would enable us to seamlessly chain these operations with other table updates in a single commit. This adjustment could be particularly beneficial in the context of `CreateTableTransaction`, as it would enable users to not only create a table but also populate it with initial data in one go. I think this is a great question. I think we have two options here: 1. We move these actions into the Transaction class, and remove them from Table class 2. We move them into the Transaction class, and also keep an implementation in the Table class I'm not sure which of the above two are better, but I keep asking myself whether there's a 'good' reason why we have two separate APIs that achieve similar results. For example, we have **update_spec**, **update_schema** that can be created from the **Transaction** or the **Table**, and I feel like we might be creating work for ourselves by duplicating the feature in both classes. What if we consolidated all of our actions into the Transaction class, and removed them from the Table class? I think the upside of that would be that API would convey a very clear message to the developer that a _transaction is committed to a table_, and that a series of _actions_ can be chained onto the _same transaction_, as a single commit. In addition, we can avoid [issues like this](https://github.com/apache/iceberg-python/pull/508) where we roll out a feature to one API implementation, but not the other. 
```
with given_table.update_schema() as tx:
    tx.add_column(path="new_column1", field_type=IntegerType())
```

```
with given_table.transaction() as tx:
    with tx.update_schema() as update:
        update.add_column(path="new_column1", field_type=IntegerType())
```

To me, the bottom pattern feels more explicit than the above option, and I'm curious to hear others' opinions on this topic
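The "all actions hang off a transaction" design being discussed can be sketched minimally in Python (class and method names are hypothetical simplifications, not the pyiceberg API): actions accumulate on the transaction and are applied as one commit when the context exits.

```python
class Transaction:
    """Accumulates updates; commits them to the table in one shot."""
    def __init__(self, table):
        self.table = table
        self.updates = []

    def append(self, rows):
        self.updates.append(("append", rows))
        return self  # allow chaining

    def add_column(self, path):
        self.updates.append(("add-column", path))
        return self

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.table.commit(self.updates)  # single commit for all updates

class Table:
    def __init__(self):
        self.commits = []
    def commit(self, updates):
        self.commits.append(list(updates))
    def transaction(self):
        return Transaction(self)

table = Table()
with table.transaction() as tx:
    tx.add_column("new_column1").append([{"new_column1": 1}])
print(table.commits)  # [[('add-column', 'new_column1'), ('append', [{'new_column1': 1}])]]
```

This mirrors the appeal of the second pattern quoted above: the API makes it explicit that a series of actions is chained onto the same transaction and lands as a single commit.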
Re: [I] Please remove old releases [iceberg]
github-actions[bot] commented on issue #2414: URL: https://github.com/apache/iceberg/issues/2414#issuecomment-2033310864 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] Flink cdc events with update or delete doesn't work in 0.11.0 branch [iceberg]
github-actions[bot] commented on issue #2409: URL: https://github.com/apache/iceberg/issues/2409#issuecomment-2033310842 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] how to fix org.apache.spark.shuffle.FetchFailedException: [iceberg]
github-actions[bot] closed issue #2211: how to fix org.apache.spark.shuffle.FetchFailedException: URL: https://github.com/apache/iceberg/issues/2211
Re: [I] how to fix org.apache.spark.shuffle.FetchFailedException: [iceberg]
github-actions[bot] commented on issue #2211: URL: https://github.com/apache/iceberg/issues/2211#issuecomment-2033310616 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'
Re: [I] Need help inserting data into hadoop table with flink sql in java [iceberg]
github-actions[bot] closed issue #2209: Need help inserting data into hadoop table with flink sql in java URL: https://github.com/apache/iceberg/issues/2209
Re: [I] Need help inserting data into hadoop table with flink sql in java [iceberg]
github-actions[bot] commented on issue #2209: URL: https://github.com/apache/iceberg/issues/2209#issuecomment-2033310589 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'
Re: [I] Hive: got error while joining iceberg table and hive table [iceberg]
github-actions[bot] commented on issue #2198: URL: https://github.com/apache/iceberg/issues/2198#issuecomment-2033310562 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'
Re: [I] Hive: got error while joining iceberg table and hive table [iceberg]
github-actions[bot] closed issue #2198: Hive: got error while joining iceberg table and hive table URL: https://github.com/apache/iceberg/issues/2198
Re: [I] Cannot write incompatible dataset to table with schema error for list types [iceberg]
github-actions[bot] commented on issue #2192: URL: https://github.com/apache/iceberg/issues/2192#issuecomment-2033310543 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'
Re: [I] Cannot write incompatible dataset to table with schema error for list types [iceberg]
github-actions[bot] closed issue #2192: Cannot write incompatible dataset to table with schema error for list types URL: https://github.com/apache/iceberg/issues/2192
Re: [PR] Iceberg/Comet integration POC [iceberg]
huaxingao commented on code in PR #9841: URL: https://github.com/apache/iceberg/pull/9841#discussion_r1548708375 ## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/comet/CometIcebergColumnReader.java: ## @@ -0,0 +1,164 @@

```
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.data.vectorized.comet;
+
+import java.io.IOException;
+import java.util.Map;
+import org.apache.comet.parquet.AbstractColumnReader;
+import org.apache.comet.parquet.ColumnReader;
+import org.apache.comet.parquet.TypeUtil;
+import org.apache.comet.parquet.Utils;
+import org.apache.comet.vector.CometVector;
+import org.apache.iceberg.parquet.VectorizedReader;
+import org.apache.iceberg.spark.SparkSchemaUtil;
+import org.apache.iceberg.types.Types;
+import org.apache.parquet.column.ColumnDescriptor;
+import org.apache.parquet.column.page.PageReadStore;
+import org.apache.parquet.column.page.PageReader;
+import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
+import org.apache.parquet.hadoop.metadata.ColumnPath;
+import org.apache.spark.sql.types.DataType;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+
+/**
+ * A Iceberg Parquet column reader backed by a Boson {@link ColumnReader}. This class should be used
```

Review Comment: Oops. Will change.
Re: [PR] 5 dremio blog march 2024 [iceberg]
AlexMercedCoder commented on code in PR #10067: URL: https://github.com/apache/iceberg/pull/10067#discussion_r1548619662 ## site/docs/blogs.md: ## @@ -23,6 +23,37 @@ title: "Blogs" Here is a list of company blogs that talk about Iceberg. The blogs are ordered from most recent to oldest. + +### [End-to-End Basic Data Engineering Tutorial (Apache Spark, Apache Iceberg Dremio, Apache Superset, Nessie)](https://medium.com/data-engineering-with-dremio/end-to-end-basic-data-engineering-tutorial-apache-spark-apache-iceberg-dremio-apache-superset-a896ecab46f6) Review Comment: also fixed on medium
Re: [PR] 5 dremio blog march 2024 [iceberg]
AlexMercedCoder commented on code in PR #10067: URL: https://github.com/apache/iceberg/pull/10067#discussion_r1548618808 ## site/docs/blogs.md: ## @@ -23,6 +23,37 @@ title: "Blogs" Here is a list of company blogs that talk about Iceberg. The blogs are ordered from most recent to oldest. + +### [End-to-End Basic Data Engineering Tutorial (Apache Spark, Apache Iceberg Dremio, Apache Superset, Nessie)](https://medium.com/data-engineering-with-dremio/end-to-end-basic-data-engineering-tutorial-apache-spark-apache-iceberg-dremio-apache-superset-a896ecab46f6) Review Comment: no, that's a typo, missing comma, just added a new commit adding the comma.
Re: [PR] Iceberg/Comet integration POC [iceberg]
RussellSpitzer commented on code in PR #9841: URL: https://github.com/apache/iceberg/pull/9841#discussion_r1548617271 ## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/comet/CometIcebergColumnReader.java: ## @@ -0,0 +1,164 @@

```
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.data.vectorized.comet;
+
+import java.io.IOException;
+import java.util.Map;
+import org.apache.comet.parquet.AbstractColumnReader;
+import org.apache.comet.parquet.ColumnReader;
+import org.apache.comet.parquet.TypeUtil;
+import org.apache.comet.parquet.Utils;
+import org.apache.comet.vector.CometVector;
+import org.apache.iceberg.parquet.VectorizedReader;
+import org.apache.iceberg.spark.SparkSchemaUtil;
+import org.apache.iceberg.types.Types;
+import org.apache.parquet.column.ColumnDescriptor;
+import org.apache.parquet.column.page.PageReadStore;
+import org.apache.parquet.column.page.PageReader;
+import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
+import org.apache.parquet.hadoop.metadata.ColumnPath;
+import org.apache.spark.sql.types.DataType;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+
+/**
+ * A Iceberg Parquet column reader backed by a Boson {@link ColumnReader}. This class should be used
```

Review Comment: What is boson :P
[I] PyArrow S3FileSystem doesn't honor the AWS profile config [iceberg-python]
geruh opened a new issue, #570: URL: https://github.com/apache/iceberg-python/issues/570

### Apache Iceberg version

main (development)

### Please describe the bug 🐞

When initializing the GlueCatalog with a specific AWS profile, everything works as it should with catalog operations. But we've hit an issue when it comes to working with S3 via the PyArrow S3FileSystem. Users can specify a profile for initiating a boto connection; however, this preference doesn't carry over to the S3FileSystem. Instead of using the specified AWS profile, it will check the catalog configs for the s3 configs like `s3.access-key-id`, `s3.region`, ... If those aren't passed in, PyArrow's S3FileSystem has its own strategy of inferring credentials, such as:

1. the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN environment variables.
2. the default profile credentials in your ~/.aws/credentials and ~/.aws/config.

This workflow leads to some inconsistencies. For example, while Glue operations might be using a user-specified profile, S3 operations could end up using a different set of credentials or even a different region from what's set in the environment variables or the AWS config files. This is seen in issue #515, where one region (like us-west-2) unexpectedly switches to another (like us-east-1), causing a 301 exception.

For example:

1. Set up an AWS profile in ~/.aws/config with an incorrect region:

```
[default]
region = us-east-1

[test]
region = us-west-2
```

2. Initialize the GlueCatalog with the correct region you want to use:

```
catalog = pyiceberg.catalog.load_catalog(
    catalog_name,
    **{"type": "glue", "profile_name": "test", "region_name": "us-west-2"}
)
```

3.
Load a table:

```
catalog.load_table("default.test")

File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
OSError: When reading information for key 'test/metadata/0-c0fc4e45-d79d-41a1-ba92-a4122c09171c.metadata.json' in bucket 'test_bucket': AWS Error UNKNOWN (HTTP status 301) during HeadObject operation: No response body.
```

On one hand, we could argue that this profile configuration should only work at the catalog level, and for filesystems the user must specify the aforementioned configs like `s3.region`. But on the other hand, it seems reasonable that the AWS profile config should work uniformly across both the catalog and filesystem levels. This unified approach would certainly simplify configuration management for users. I'm leaning towards this perspective. However, we're currently utilizing PyArrow's S3FileSystem, which doesn't inherently support AWS profiles. This means we'd need to bridge that gap manually.

cc: @HonahX @Fokko @kevinjqliu
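As a rough illustration of "bridging that gap manually": the region for a named profile can be resolved from the AWS config file with the standard library alone, and the result could then be passed explicitly to PyArrow's S3FileSystem. The parsing below is a simplified sketch over an inline config string; note that in a real ~/.aws/config, non-default profiles are conventionally written as `[profile <name>]`.

```python
import configparser

# Example config text standing in for ~/.aws/config
AWS_CONFIG = """
[default]
region = us-east-1

[profile test]
region = us-west-2
"""

def region_for_profile(config_text, profile):
    """Resolve the region for a named profile from AWS-config-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # In ~/.aws/config, non-default profiles live under "profile <name>"
    section = "default" if profile == "default" else f"profile {profile}"
    return parser.get(section, "region")

print(region_for_profile(AWS_CONFIG, "test"))  # us-west-2
```

A resolved region like this could be fed to the filesystem configuration (e.g. as the `s3.region` property mentioned above) so that catalog and filesystem agree on the profile's settings.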
Re: [PR] Add PrePlanTable and PlanTable Endpoints to open api spec [iceberg]
rahil-c commented on PR #9695: URL: https://github.com/apache/iceberg/pull/9695#issuecomment-2033006567 @nastra @rdblue @danielcweeks @jackye1995 @amogh-jahagirdar When looking again at the `capabilities` pr: https://github.com/apache/iceberg/pull/9940, are we sure we want to add scan-planning as part of the `capabilities` list in the `ConfigResponse`? One question I want to raise: if the server returns `scan-planning` in the `capabilities`, would this mean that all tables under `RestCatalog` support scan planning? I believe we need a way to tell clients that REST scan planning is supported, but I think this is something that should be done at the table level, in the `LoadTableResponse` as a new property? Let me know what you all think
Re: [I] Snowflake Iceberg Partitioned data read issue [iceberg]
findinpath commented on issue #9404: URL: https://github.com/apache/iceberg/issues/9404#issuecomment-2032964716 @sfc-gh-rortloff I went through the Snowflake documentation https://docs.snowflake.com/en/sql-reference/sql/create-iceberg-table and don't see any reference related to partitioning. Could you please sketch here how to create a partitioned Iceberg table via Snowflake SQL syntax?
Re: [I] Unable to load an iceberg table from aws glue catalog [iceberg-python]
geruh commented on issue #515: URL: https://github.com/apache/iceberg-python/issues/515#issuecomment-2032913100 No problem! This could potentially be a bug if we assume that the catalog and FileIO (S3) share the same AWS profile configs. On one side, having a single profile configuration is convenient for the user's boto client, as it allows initializing all AWS clients with the correct credentials. On the other hand, we could argue that this configuration should only work at the catalog level, and for filesystems, separate configurations might be required. I'm inclined towards the first option. However, we are using pyarrow's S3FileSystem implementation, which has no concept of an AWS profile. Therefore, we will need to initialize these values through boto's session.get_credentials() and pass them to the filesystem. I'll raise an issue for this
Re: [PR] Partitioned Append on Identity Transform [iceberg-python]
jqin61 commented on code in PR #555: URL: https://github.com/apache/iceberg-python/pull/555#discussion_r1548140319 ## pyiceberg/table/__init__.py: ## @@ -2526,25 +2537,44 @@ def _dataframe_to_data_files( """ from pyiceberg.io.pyarrow import bin_pack_arrow_table, write_file -if len([spec for spec in table_metadata.partition_specs if spec.spec_id != 0]) > 0: -raise ValueError("Cannot write to partitioned tables") - counter = itertools.count(0) write_uuid = write_uuid or uuid.uuid4() - target_file_size = PropertyUtil.property_as_int( properties=table_metadata.properties, property_name=TableProperties.WRITE_TARGET_FILE_SIZE_BYTES, default=TableProperties.WRITE_TARGET_FILE_SIZE_BYTES_DEFAULT, ) +if target_file_size is None: +raise ValueError( +"Fail to get neither TableProperties.WRITE_TARGET_FILE_SIZE_BYTES nor WRITE_TARGET_FILE_SIZE_BYTES_DEFAULT for writing target data file." Review Comment: I just found that the default value itself could be None: ```PARQUET_COMPRESSION_LEVEL_DEFAULT = None``` so this None check is not unnecessary? The original code for this target_file_size check just `type: ignore`s it
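The concern can be illustrated with a minimal stand-in for `PropertyUtil.property_as_int` (a sketch, not the pyiceberg implementation): when the default itself is `None`, the helper can return `None`, so the explicit check in the diff above is indeed not redundant.

```python
from typing import Dict, Optional


def property_as_int(properties: Dict[str, str], property_name: str, default: Optional[int]) -> Optional[int]:
    # Mirrors the fallback logic under discussion: an explicitly set table
    # property wins; otherwise the (possibly None) default is returned.
    value = properties.get(property_name)
    if value is not None:
        return int(value)
    return default


# With a None default (cf. PARQUET_COMPRESSION_LEVEL_DEFAULT = None),
# the result can be None and callers must handle it.
print(property_as_int({}, "write.target-file-size-bytes", None))  # None
print(property_as_int({"write.target-file-size-bytes": "512"}, "write.target-file-size-bytes", None))  # 512
```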
Re: [PR] Improve CLI Text by Adding Verbose Text for Commands [iceberg-go]
zeroshade commented on code in PR #68: URL: https://github.com/apache/iceberg-go/pull/68#discussion_r1548127844 ## cmd/iceberg/main.go: ## @@ -34,16 +34,21 @@ import ( const usage = `iceberg. Usage: - iceberg list [options] [PARENT] - iceberg describe [options] [namespace | table] IDENTIFIER - iceberg (schema | spec | uuid | location) [options] TABLE_ID - iceberg drop [options] (namespace | table) IDENTIFIER - iceberg files [options] TABLE_ID [--history] - iceberg rename [options] - iceberg properties [options] get (namespace | table) IDENTIFIER [PROPNAME] - iceberg properties [options] set (namespace | table) IDENTIFIER PROPNAME VALUE - iceberg properties [options] remove (namespace | table) IDENTIFIER PROPNAME - iceberg -h | --help | --version + iceberg [command] [options] [arguments] Review Comment: does this actually work? The library being used, `docopt`, actually uses the `usage` string to perform the parsing and processing. `[options]` is specially handled by docopt, but I don't think that `[command]` and `[arguments]` are. So I think this might break the CLI here
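For reference, docopt derives the parser from the usage text itself, so subcommands generally have to be spelled out as literal words on their own usage lines rather than through a generic placeholder; a sketch of what a docopt-parsable usage block looks like (the command set here is abbreviated from the diff above):

```
Usage:
  iceberg list [options] [PARENT]
  iceberg describe [options] (namespace | table) IDENTIFIER
  iceberg drop [options] (namespace | table) IDENTIFIER
  iceberg -h | --help | --version
```

Words like `list` and `describe` are matched literally, `UPPERCASE` names become positional arguments, and only bracketed elements defined in the usage/options sections (such as `[options]`) get special handling.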
[I] Spark configuration for amazon access key and secret key with glue catalog for apache Iceberg is not honoring [iceberg]
AwasthiSomesh opened a new issue, #10078: URL: https://github.com/apache/iceberg/issues/10078 Hi Team, We are using the code below to access an Iceberg table from the Glue catalog, with data stored in S3: var spark = SparkSession.builder().master("local[*]") .config("spark.sql.defaultCatalog", "AwsDataCatalog") .config("spark.sql.catalog.AwsDataCatalog", "org.apache.iceberg.spark.SparkCatalog") .config("spark.sql.catalog.AwsDataCatalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog") .config("spark.sql.catalog.AwsDataCatalog.io-imp", "org.apache.iceberg.aws.s3.S3FileIO") .config("spark.hadoop.fs.s3a.access.key", "XXxxx") .config("spark.hadoop.fs.s3a.secret.key", "XXXx") .config("spark.hadoop.fs.s3a.aws.region", "us-west-2") .getOrCreate(); val df1 = spark.sql("select * from default.iceberg_table_exercise1"); Error- Exception in thread "main" software.amazon.awssdk.core.exception.SdkClientException: Unable to load credentials from any of the providers in the chain AwsCredentialsProviderChain(credentialsProviders=[SystemPropertyCredentialsProvider(), EnvironmentVariableCredentialsProvider(), WebIdentityTokenCredentialsProvider(), ProfileCredentialsProvider(profileName=default, profileFile=ProfileFile(profilesAndSectionsMap=[])), ContainerCredentialsProvider(), InstanceProfileCredentialsProvider()]) : [SystemPropertyCredentialsProvider(): Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId)., EnvironmentVariableCredentialsProvider(): Unable to load credentials from system settings.
Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId)., WebIdentityTokenCredentialsProvider(): Either the environment variable AWS_WEB_IDENTITY_TOKEN_FILE or the java property aws.webIdentityTokenFile must be set., ProfileCredentialsProvider(profileName=default, profileFile=ProfileFile(profilesAndSectionsMap=[])): Profile file contained no credentials for profile 'default': ProfileFile(profilesAndSectionsMap=[]), ContainerCredentialsProvider(): Cannot fetch credentials from container - neither AWS_CONTAINER_CREDENTIALS_FULL_URI or AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variables are set., InstanceProfileCredentialsProvider(): Failed to load credentials from IMDS.] This code fails with "unable to load credentials", but when we pass the same information with System.setProperty it works. However, our requirement is to set it at the Spark level, not the system level. Jars we used: iceberg-spark-runtime-3.5_2.12-1.5.0, iceberg-aws-bundle-1.5.0 Please can anyone help? Thanks,
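One likely factor (a sketch, based on Iceberg's documented AWS properties, not a confirmed diagnosis of this report): `S3FileIO` does not read the Hadoop `fs.s3a.*` keys at all; its credentials and region are supplied as catalog-level properties. In spark-defaults form, for the catalog name used above (note also that the report's `io-imp` key differs from the documented `io-impl`):

```properties
# Credentials for Iceberg's S3FileIO are catalog properties, not fs.s3a.* keys
spark.sql.catalog.AwsDataCatalog.io-impl              org.apache.iceberg.aws.s3.S3FileIO
spark.sql.catalog.AwsDataCatalog.s3.access-key-id     XXxxx
spark.sql.catalog.AwsDataCatalog.s3.secret-access-key XXXx
spark.sql.catalog.AwsDataCatalog.client.region        us-west-2
```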
Re: [PR] Partitioned Append on Identity Transform [iceberg-python]
jqin61 commented on code in PR #555: URL: https://github.com/apache/iceberg-python/pull/555#discussion_r1548068306 ## pyiceberg/typedef.py: ## @@ -199,3 +199,7 @@ def __repr__(self) -> str: def record_fields(self) -> List[str]: """Return values of all the fields of the Record class except those specified in skip_fields.""" return [self.__getattribute__(v) if hasattr(self, v) else None for v in self._position_to_field_name] + +def __hash__(self) -> int: +"""Return hash value of the Record class.""" +return hash(str(self)) Review Comment: I think since `__repr__` is defined, the str() might still work? I tested: ``` r1 = Record(1,2) r2 = Record(x=1, y="string value") print("") print(str(r1), hash(str(r1))) print(str(r2), hash(str(r2))) ``` prints: ``` Record[field1=1, field2=2] -7504199255027864703 Record[x=1, y='string value'] -4897332691101137012 ```
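A minimal, self-contained stand-in for the behaviour being tested (a sketch; the real `Record` lives in `pyiceberg.typedef` and also supports keyword fields): because `__hash__` delegates to `hash(str(self))`, any instance with a working `__repr__` is hashable, and instances with equal reprs hash equally.

```python
class Record:
    """Simplified stand-in for pyiceberg's Record: positional fields only."""

    def __init__(self, *values):
        self._values = values

    def __repr__(self):
        fields = ", ".join(f"field{i + 1}={v!r}" for i, v in enumerate(self._values))
        return f"Record[{fields}]"

    def __hash__(self):
        # The pattern under review: derive the hash from the string form.
        return hash(str(self))


r1 = Record(1, 2)
r2 = Record(1, 2)
print(repr(r1))  # Record[field1=1, field2=2]
print(hash(r1) == hash(r2))  # True: equal reprs give equal hashes
```

The trade-off is that hashing becomes only as reliable as `__repr__`: two records with distinct field values but identical reprs would collide by construction.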
[PR] Add option to delete datafiles [iceberg-python]
Fokko opened a new pull request, #569: URL: https://github.com/apache/iceberg-python/pull/569 This is done through the Iceberg metadata, resulting in efficient deletes if the data is partitioned correctly
Re: [PR] [WIP] Add `ManifestEvaluator` to allow filtering of files in a table scan (Issue #152) [iceberg-rust]
marvinlanhenke commented on PR #241: URL: https://github.com/apache/iceberg-rust/pull/241#issuecomment-2032233828 > which is somehow motivated by @viirya 's [pr](https://github.com/apache/iceberg-rust/pull/295/files#diff-a59622727cd67153abdf02031475bf8a1b1921738df4ca9903a685ff6970b7aaR472), but moves the traversing flow out of the trait body. @liurenjie1024 ...so the traversal flow could then be implemented on e.g. the `ManifestEvaluator` itself. For example, `eval()` could call the corresponding `visit`s on the visitor that implements `BoundPredicateVisitor`. Is this what you mean?
Re: [PR] OpenAPI: Express server capabilities via /config endpoint [iceberg]
snazy commented on PR #9940: URL: https://github.com/apache/iceberg/pull/9940#issuecomment-2032202321 > > I've got strong concerns about using `enum` here - special handling here and there, I think, that complicates things for adopters of any OpenAPI spec. > > @snazy we use `enum` in the OpenAPI spec to list possible values of a string, which is also what's being documented in https://swagger.io/docs/specification/data-models/enums/. The underlying type of a capability is still a `string`. Do you have an alternative in mind for the issue you're seeing? Also I believe when code is generated from the OpenAPI spec, there should be an option to generate enums as literals The issue at hand is that an enum cannot (by default) handle unknown values - which is generally fine. But here we have to expect that endpoints return values that are _not_ known by a client. (Not only generated) clients that do not handle this rather special case will fail to parse the result, which is a problem. I'm still in favor of just using a string.
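The two options under discussion can be written out side by side (a sketch of the spec fragment; the schema names and the capability values are illustrative, not from the PR): an `enum` pins the value set at spec time, while a plain `string` lets clients pass through capabilities they do not yet recognize.

```yaml
components:
  schemas:
    # Option 1: closed set - a client generated from this spec may fail
    # validation when the server returns a capability added later
    CapabilityEnum:
      type: string
      enum: ["scan-planning"]
    # Option 2: open set - unknown capabilities parse fine and can be ignored
    CapabilityString:
      type: string
```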
Re: [PR] Hive: Fix metadata file not found [iceberg]
lurnagao-dahua commented on PR #10069: URL: https://github.com/apache/iceberg/pull/10069#issuecomment-2032193393 > @lurnagao-dahua please check styles. > > > The reason is that in some cases e.getMessage() returns null and it will throw a NullPointerException, then skip checkCommitStatus; it may delete metadataLocation when, actually, the metadata commit succeeded. > > Is it possible to add a UT for this case? Hi, I found the original pr similar to this one [701](https://github.com/apache/iceberg/pull/701), then I will try to write a UT
Re: [PR] Hive: Fix metadata file not found [iceberg]
lurnagao-dahua commented on PR #10069: URL: https://github.com/apache/iceberg/pull/10069#issuecomment-2032181736 > @lurnagao-dahua please check styles. > > > The reason is that in some cases e.getMessage() returns null and throws a NullPointerException, then skips checkCommitStatus; it may delete metadataLocation when, actually, the metadata commit succeeded. > > Is it possible to add a UT for this case?
Re: [PR] Hive: Fix metadata file not found [iceberg]
lurnagao-dahua closed pull request #10069: Hive: Fix metadata file not found URL: https://github.com/apache/iceberg/pull/10069
Re: [PR] OpenAPI: Express server capabilities via /config endpoint [iceberg]
nastra commented on PR #9940: URL: https://github.com/apache/iceberg/pull/9940#issuecomment-2032092212 > I've got strong concerns about using `enum` here - special handling here and there, I think, that complicates things for adopters of any OpenAPI spec. @snazy we use `enum` in the OpenAPI spec to list possible values of a string, which is also what's being documented in https://swagger.io/docs/specification/data-models/enums/. The underlying type of a capability is still a `string`. Do you have an alternative in mind for the issue you're seeing?
Re: [PR] Open-api: update prefix param description [iceberg]
jbonofre commented on code in PR #9870: URL: https://github.com/apache/iceberg/pull/9870#discussion_r1547920214 ## open-api/rest-catalog-open-api.yaml: ## @@ -1444,7 +1444,7 @@ components: schema: type: string required: true - description: An optional prefix in the path + description: Prefix in the path Review Comment: I think it makes sense to have `prefix` optional. Specifically for the `prefix`, I would set `required: false` and check in the validator.
Re: [PR] Open-api: update prefix param description [iceberg]
ajantha-bhat commented on code in PR #9870: URL: https://github.com/apache/iceberg/pull/9870#discussion_r1547852220 ## open-api/rest-catalog-open-api.yaml: ## @@ -1444,7 +1444,7 @@ components: schema: type: string required: true - description: An optional prefix in the path + description: Prefix in the path Review Comment: I think if we need optional params in the path, we need to define a separate path without the prefix.
Re: [PR] Partitioned Append on Identity Transform [iceberg-python]
Fokko commented on code in PR #555: URL: https://github.com/apache/iceberg-python/pull/555#discussion_r1547810906 ## pyiceberg/manifest.py: ## @@ -289,10 +286,7 @@ def partition_field_to_data_file_partition_field(partition_field_type: IcebergTy @partition_field_to_data_file_partition_field.register(LongType) -@partition_field_to_data_file_partition_field.register(DateType) Review Comment: This single-dispatch is there only for the `TimeType` it seems. Probably we should also convert those into a native type. ## tests/conftest.py: ## @@ -2000,7 +2000,11 @@ def spark() -> "SparkSession": 'float': [0.0, None, 0.9], 'double': [0.0, None, 0.9], 'timestamp': [datetime(2023, 1, 1, 19, 25, 00), None, datetime(2023, 3, 1, 19, 25, 00)], -'timestamptz': [datetime(2023, 1, 1, 19, 25, 00), None, datetime(2023, 3, 1, 19, 25, 00)], +'timestamptz': [ Review Comment: Nice one! ## pyiceberg/table/__init__.py: ## @@ -3111,3 +3147,112 @@ def snapshots(self) -> "pa.Table": snapshots, schema=snapshots_schema, ) + + +@dataclass(frozen=True) +class TablePartition: +partition_key: PartitionKey +arrow_table_partition: pa.Table + + +def _get_partition_sort_order(partition_columns: list[str], reverse: bool = False) -> dict[str, Any]: +order = 'ascending' if not reverse else 'descending' +null_placement = 'at_start' if reverse else 'at_end' +return {'sort_keys': [(column_name, order) for column_name in partition_columns], 'null_placement': null_placement} + + +def group_by_partition_scheme(arrow_table: pa.Table, partition_columns: list[str]) -> pa.Table: +"""Given a table, sort it by current partition scheme.""" +# only works for identity for now +sort_options = _get_partition_sort_order(partition_columns, reverse=False) +sorted_arrow_table = arrow_table.sort_by(sorting=sort_options['sort_keys'], null_placement=sort_options['null_placement']) +return sorted_arrow_table + + +def get_partition_columns( +spec: PartitionSpec, +schema: Schema, +) -> list[str]: +partition_cols = [] +for
partition_field in spec.fields: +column_name = schema.find_column_name(partition_field.source_id) +if not column_name: +raise ValueError(f"{partition_field=} could not be found in {schema}.") +partition_cols.append(column_name) +return partition_cols + + +def _get_table_partitions( +arrow_table: pa.Table, +partition_spec: PartitionSpec, +schema: Schema, +slice_instructions: list[dict[str, Any]], +) -> list[TablePartition]: +sorted_slice_instructions = sorted(slice_instructions, key=lambda x: x['offset']) + +partition_fields = partition_spec.fields + +offsets = [inst["offset"] for inst in sorted_slice_instructions] +projected_and_filtered = { +partition_field.source_id: arrow_table[schema.find_field(name_or_id=partition_field.source_id).name] +.take(offsets) +.to_pylist() +for partition_field in partition_fields +} + +table_partitions = [] +for idx, inst in enumerate(sorted_slice_instructions): +partition_slice = arrow_table.slice(**inst) +fieldvalues = [ +PartitionFieldValue(partition_field, projected_and_filtered[partition_field.source_id][idx]) +for partition_field in partition_fields +] +partition_key = PartitionKey(raw_partition_field_values=fieldvalues, partition_spec=partition_spec, schema=schema) +table_partitions.append(TablePartition(partition_key=partition_key, arrow_table_partition=partition_slice)) +return table_partitions + + +def partition(spec: PartitionSpec, schema: Schema, arrow_table: pa.Table) -> Iterable[TablePartition]: Review Comment: It would be good to have somewhat more descriptive function names. I also think we should hide this from the outside user. ```suggestion def _determine_partitions(spec: PartitionSpec, schema: Schema, arrow_table: pa.Table) -> List[TablePartition]: ``` I think we can also return a list, so folks know that it is already materialized.
## tests/conftest.py: ## @@ -2045,3 +2049,19 @@ def arrow_table_with_null(pa_schema: "pa.Schema") -> "pa.Table": """PyArrow table with all kinds of columns""" return pa.Table.from_pydict(TEST_DATA_WITH_NULL, schema=pa_schema) + + +@pytest.fixture(scope="session") +def arrow_table_without_data(pa_schema: "pa.Schema") -> "pa.Table": +import pyarrow as pa + +"""PyArrow table with all kinds of columns.""" Review Comment: ```suggestion """PyArrow table with all kinds of columns.""" import pyarrow as pa ``` ## pyiceberg/table/__init__.py: ## @@ -1131,8 +1133,11 @@ def append(self, df: pa.Table, snapshot_properties: Dict[str, str] = EMPTY_DICT) if not isinstance(df, pa.Table): raise Valu
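The sort-then-slice flow being reviewed in the diff above can be sketched with stdlib-only stand-ins (an illustration, assuming identity transforms only; `rows` is a list of dicts standing in for the Arrow table, and `determine_partitions` is a hypothetical analogue of the PR's helper): sort by the partition columns, then group adjacent rows with equal partition values into one slice per partition key.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, List, Tuple


def determine_partitions(
    rows: List[Dict[str, Any]], partition_columns: List[str]
) -> Dict[Tuple, List[Dict[str, Any]]]:
    """Sort rows by the partition columns, then slice into per-partition groups."""
    key = itemgetter(*partition_columns)
    # Mirrors pa.Table.sort_by on the partition scheme; after sorting,
    # rows with equal partition values are adjacent, so groupby suffices.
    ordered = sorted(rows, key=key)
    return {
        k if isinstance(k, tuple) else (k,): list(group)
        for k, group in groupby(ordered, key=key)
    }


rows = [
    {"region": "eu", "n": 1},
    {"region": "us", "n": 2},
    {"region": "eu", "n": 3},
]
parts = determine_partitions(rows, ["region"])
print(sorted(parts))  # [('eu',), ('us',)]
```

Returning a materialized dict (rather than a lazy iterable) matches the review suggestion that callers should know the partitions are already computed.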
Re: [PR] feat: Project transform [iceberg-rust]
liurenjie1024 commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1547819858 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,300 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(&self, name: String, predicate: &BoundPredicate) -> Result> { +let func = create_transform_function(self)?; + +let projection = match predicate { +BoundPredicate::Unary(expr) => match self { Review Comment: Yeah, it looks much better now, thanks! I'll take a careful review later.
Re: [PR] [WIP] Add `ManifestEvaluator` to allow filtering of files in a table scan (Issue #152) [iceberg-rust]
liurenjie1024 commented on PR #241: URL: https://github.com/apache/iceberg-rust/pull/241#issuecomment-2031946776 > @liurenjie1024 @ZENOTME @sdd @Xuanwo I'd really appreciate your thoughts on this: > > I took a closer look at the work @sdd has already done - and I think in order to proceed it would make sense to split the `ManifestEvaluator`, `InclusiveProjection` and `PartitionEvaluator` into separate modules. > > I am thinking about putting each visitor into its own file within `/iceberg/src/expr` (perhaps even another subfolder /visitors) - for example: `/iceberg/src/expr/manifest_evaluator.rs.` > > This way, we could create individual issues on each implementation - and work better in parallel. > > Perhaps, it also makes sense to provide a trait to make sure each visitor adheres to the same interface (although I'm not sure this is necessary right now...) > > ```rust > // Pseudo-Example > pub trait BooleanVisitor: > fn eval() -> boolean > ``` +1 for splitting these into separate modules. Instead of a boolean visitor, I'm thinking about a post-order predicate visitor: ```rust pub trait BoundPredicateVisitor { } ``` which is somehow motivated by @viirya 's [pr](https://github.com/apache/iceberg-rust/pull/295/files#diff-a59622727cd67153abdf02031475bf8a1b1921738df4ca9903a685ff6970b7aaR472), but moves the traversing flow out of the trait body.
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1547785953 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,300 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(&self, name: String, predicate: &BoundPredicate) -> Result> { +let func = create_transform_function(self)?; + +let projection = match predicate { +BoundPredicate::Unary(expr) => match self { Review Comment: @liurenjie1024 I did a refactor changing the structure (matching order). I also extracted common functionality, renamed those helpers and updated the docs. I hope not only the structure but the overall design is more readable and understandable with those changes applied?
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1547681855 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,300 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(&self, name: String, predicate: &BoundPredicate) -> Result<Option<Predicate>> { +let func = create_transform_function(self)?; + +let projection = match predicate { +BoundPredicate::Unary(expr) => match self { Review Comment: Sure, I'm already implementing it, since I wanted to compare for myself.
Re: [PR] feat: Project transform [iceberg-rust]
liurenjie1024 commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1547676828 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,300 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(&self, name: String, predicate: &BoundPredicate) -> Result<Option<Predicate>> { +let func = create_transform_function(self)?; + +let projection = match predicate { +BoundPredicate::Unary(expr) => match self { Review Comment: I think some code duplication is worth it so that we can have better readability?
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1547667177 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,300 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(&self, name: String, predicate: &BoundPredicate) -> Result<Option<Predicate>> { +let func = create_transform_function(self)?; + +let projection = match predicate { +BoundPredicate::Unary(expr) => match self { Review Comment: I had the structure you suggested in an earlier version. I changed it the other way around since `predicate` has the smaller cardinality, which allows me to group more transforms into a single predicate match arm. I can change it back; however, this would introduce more match arms and some code duplication?
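The trade-off being discussed can be shown with toy enums (hypothetical names standing in for `Transform` and `BoundPredicate`): nesting on the predicate kind first keeps the outer match small and lets transforms that behave identically for a given predicate kind collapse into one `|` pattern, whereas transform-first nesting would repeat those arms once per transform.

```rust
// Toy stand-ins for iceberg-rust's Transform and BoundPredicate kinds
// (hypothetical, for illustration only).
#[derive(Clone, Copy)]
enum Transform {
    Identity,
    Bucket,
    Truncate,
    Void,
}

#[derive(Clone, Copy)]
enum Pred {
    Unary,
    Binary,
}

// Predicate-first nesting (the structure chosen in the PR): the outer
// match has the smaller cardinality, so several transforms that behave
// identically for a given predicate kind share a single `|` arm.
fn project(transform: Transform, predicate: Pred) -> Option<&'static str> {
    match predicate {
        Pred::Unary => match transform {
            // All projectable transforms produce the same unary predicate,
            // so one grouped arm covers them.
            Transform::Identity | Transform::Bucket | Transform::Truncate => Some("unary"),
            Transform::Void => None,
        },
        Pred::Binary => match transform {
            // Binary predicates need per-transform handling, so these arms
            // cannot be grouped; transform-first nesting would instead
            // duplicate the unary arm across every one of these transforms.
            Transform::Identity => Some("binary: identity"),
            Transform::Bucket => Some("binary: bucket"),
            Transform::Truncate => Some("binary: truncate"),
            Transform::Void => None,
        },
    }
}
```

The duplication only moves, it never disappears: whichever enum is matched first, the other's per-case logic must still be written out, so grouping on the smaller-cardinality enum minimizes the total number of arms.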
Re: [PR] Spark 3.3: drop_namespace with CASCADE support [iceberg]
supsupsap commented on PR #7275: URL: https://github.com/apache/iceberg/pull/7275#issuecomment-2031741342 @abmo-x do you plan to merge this PR?
Re: [PR] feat: Project transform [iceberg-rust]
liurenjie1024 commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1547649993 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,300 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(&self, name: String, predicate: &BoundPredicate) -> Result> { +let func = create_transform_function(self)?; + +let projection = match predicate { +BoundPredicate::Unary(expr) => match self { +Transform::Identity +| Transform::Bucket(_) +| Transform::Truncate(_) +| Transform::Year +| Transform::Month +| Transform::Day +| Transform::Hour => Some(Predicate::Unary(UnaryExpression::new( +expr.op(), +Reference::new(name), +))), +_ => None, +}, +BoundPredicate::Binary(expr) => match self { +Transform::Identity => Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +expr.literal().to_owned(), +))), +Transform::Bucket(_) => { +if expr.op() != PredicateOperator::Eq || !self.can_transform(expr.literal()) { +return Ok(None); +} + +Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +func.transform_literal_result(expr.literal())?, +))) +} +Transform::Truncate(width) => { +if !self.can_transform(expr.literal()) { +return Ok(None); +} + +self.transform_projected_boundary( +name, +expr.literal(), +&expr.op(), +&func, +Some(*width), +)? 
+} +Transform::Year | Transform::Month | Transform::Day | Transform::Hour => { +if !self.can_transform(expr.literal()) { +return Ok(None); +} + +self.transform_projected_boundary( +name, +expr.literal(), +&expr.op(), +&func, +None, +)? +} +_ => None, +}, +BoundPredicate::Set(expr) => match self { +Transform::Identity => Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +expr.literals().to_owned(), +))), +Transform::Bucket(_) +| Transform::Truncate(_) +| Transform::Year +| Transform::Month +| Transform::Day +| Transform::Hour => { +if expr.op() != PredicateOperator::In +|| expr.literals().iter().any(|d| !self.can_transform(d)) +{ +return Ok(None); +} + +Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +self.transform_set(expr.literals(), &func)?, +))) +} +_ => None, +}, +_ => None, +}; + +Ok(projection) +} + +/// Check if `Transform` is applicable on datum's `PrimitiveType` +fn can_transform(&self, datum: &Datum) -> bool { +let input_type = datum.data_type().clone(); +self.result_type(&Type::Primitive(input_type)).is_ok() +} + +/// Transform each literal value of `FnvHashSet` +fn transform_set( +&self, +literals: &FnvHashSet, +func: &BoxedTransformFunction, +) -> Result> { +let mut new_set = FnvHashSet::default(); + +for lit in literals { +let datum = fu
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1547653811 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,300 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(&self, name: String, predicate: &BoundPredicate) -> Result> { +let func = create_transform_function(self)?; + +let projection = match predicate { +BoundPredicate::Unary(expr) => match self { +Transform::Identity +| Transform::Bucket(_) +| Transform::Truncate(_) +| Transform::Year +| Transform::Month +| Transform::Day +| Transform::Hour => Some(Predicate::Unary(UnaryExpression::new( +expr.op(), +Reference::new(name), +))), +_ => None, +}, +BoundPredicate::Binary(expr) => match self { +Transform::Identity => Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +expr.literal().to_owned(), +))), +Transform::Bucket(_) => { +if expr.op() != PredicateOperator::Eq || !self.can_transform(expr.literal()) { +return Ok(None); +} + +Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +func.transform_literal_result(expr.literal())?, +))) +} +Transform::Truncate(width) => { +if !self.can_transform(expr.literal()) { +return Ok(None); +} + +self.transform_projected_boundary( +name, +expr.literal(), +&expr.op(), +&func, +Some(*width), +)? 
+} +Transform::Year | Transform::Month | Transform::Day | Transform::Hour => { +if !self.can_transform(expr.literal()) { +return Ok(None); +} + +self.transform_projected_boundary( +name, +expr.literal(), +&expr.op(), +&func, +None, +)? +} +_ => None, +}, +BoundPredicate::Set(expr) => match self { +Transform::Identity => Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +expr.literals().to_owned(), +))), +Transform::Bucket(_) +| Transform::Truncate(_) +| Transform::Year +| Transform::Month +| Transform::Day +| Transform::Hour => { +if expr.op() != PredicateOperator::In +|| expr.literals().iter().any(|d| !self.can_transform(d)) +{ +return Ok(None); +} + +Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +self.transform_set(expr.literals(), &func)?, +))) +} +_ => None, +}, +_ => None, +}; + +Ok(projection) +} + +/// Check if `Transform` is applicable on datum's `PrimitiveType` +fn can_transform(&self, datum: &Datum) -> bool { +let input_type = datum.data_type().clone(); +self.result_type(&Type::Primitive(input_type)).is_ok() +} + +/// Transform each literal value of `FnvHashSet` +fn transform_set( +&self, +literals: &FnvHashSet, +func: &BoxedTransformFunction, +) -> Result> { +let mut new_set = FnvHashSet::default(); + +for lit in literals { +let datum = f
Re: [PR] Migrate TableTestBase related classes to JUnit5 and delete TableTestBase [iceberg]
nastra merged PR #10063: URL: https://github.com/apache/iceberg/pull/10063
Re: [PR] Migrate TableTestBase related classes to JUnit5 and delete TableTestBase [iceberg]
tomtongue commented on code in PR #10063: URL: https://github.com/apache/iceberg/pull/10063#discussion_r1547650695 ## data/src/test/java/org/apache/iceberg/io/TestPositionDeltaWriters.java: ## @@ -20,43 +20,42 @@ import java.io.File; import java.io.IOException; +import java.nio.file.Files; +import java.util.Arrays; import java.util.List; import org.apache.iceberg.DataFile; import org.apache.iceberg.DeleteFile; import org.apache.iceberg.FileFormat; +import org.apache.iceberg.Parameter; +import org.apache.iceberg.ParameterizedTestExtension; +import org.apache.iceberg.Parameters; import org.apache.iceberg.PartitionSpec; import org.apache.iceberg.RowDelta; import org.apache.iceberg.expressions.Expressions; import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; import org.apache.iceberg.util.StructLikeSet; import org.junit.Assert; Review Comment: Thank you. I'll migrate them in a separate PR. At least through this one, we can delete `TableTestBase`.
Re: [PR] Migrate TableTestBase related classes to JUnit5 and delete TableTestBase [iceberg]
nastra commented on code in PR #10063: URL: https://github.com/apache/iceberg/pull/10063#discussion_r1547646211 ## data/src/test/java/org/apache/iceberg/io/TestPositionDeltaWriters.java: ## @@ -20,43 +20,42 @@ import java.io.File; import java.io.IOException; +import java.nio.file.Files; +import java.util.Arrays; import java.util.List; import org.apache.iceberg.DataFile; import org.apache.iceberg.DeleteFile; import org.apache.iceberg.FileFormat; +import org.apache.iceberg.Parameter; +import org.apache.iceberg.ParameterizedTestExtension; +import org.apache.iceberg.Parameters; import org.apache.iceberg.PartitionSpec; import org.apache.iceberg.RowDelta; import org.apache.iceberg.expressions.Expressions; import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; import org.apache.iceberg.util.StructLikeSet; import org.junit.Assert; Review Comment: Yeah, indeed, that would lead to many changes. I'm ok doing the conversion from JUnit4 asserts to AssertJ on these files in a separate PR then.
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1547639860 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,300 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(&self, name: String, predicate: &BoundPredicate) -> Result> { +let func = create_transform_function(self)?; + +let projection = match predicate { +BoundPredicate::Unary(expr) => match self { +Transform::Identity +| Transform::Bucket(_) +| Transform::Truncate(_) +| Transform::Year +| Transform::Month +| Transform::Day +| Transform::Hour => Some(Predicate::Unary(UnaryExpression::new( +expr.op(), +Reference::new(name), +))), +_ => None, +}, +BoundPredicate::Binary(expr) => match self { +Transform::Identity => Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +expr.literal().to_owned(), +))), +Transform::Bucket(_) => { +if expr.op() != PredicateOperator::Eq || !self.can_transform(expr.literal()) { +return Ok(None); +} + +Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +func.transform_literal_result(expr.literal())?, +))) +} +Transform::Truncate(width) => { +if !self.can_transform(expr.literal()) { +return Ok(None); +} + +self.transform_projected_boundary( +name, +expr.literal(), +&expr.op(), +&func, +Some(*width), +)? 
+} +Transform::Year | Transform::Month | Transform::Day | Transform::Hour => { +if !self.can_transform(expr.literal()) { +return Ok(None); +} + +self.transform_projected_boundary( +name, +expr.literal(), +&expr.op(), +&func, +None, +)? +} +_ => None, +}, +BoundPredicate::Set(expr) => match self { +Transform::Identity => Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +expr.literals().to_owned(), +))), +Transform::Bucket(_) +| Transform::Truncate(_) +| Transform::Year +| Transform::Month +| Transform::Day +| Transform::Hour => { +if expr.op() != PredicateOperator::In +|| expr.literals().iter().any(|d| !self.can_transform(d)) +{ +return Ok(None); +} + +Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +self.transform_set(expr.literals(), &func)?, +))) +} +_ => None, +}, +_ => None, +}; + +Ok(projection) +} + +/// Check if `Transform` is applicable on datum's `PrimitiveType` +fn can_transform(&self, datum: &Datum) -> bool { +let input_type = datum.data_type().clone(); +self.result_type(&Type::Primitive(input_type)).is_ok() +} + +/// Transform each literal value of `FnvHashSet` +fn transform_set( +&self, +literals: &FnvHashSet, +func: &BoxedTransformFunction, +) -> Result> { +let mut new_set = FnvHashSet::default(); + +for lit in literals { +let datum = f
Re: [PR] Migrate TableTestBase related classes to JUnit5 and delete TableTestBase [iceberg]
tomtongue commented on code in PR #10063: URL: https://github.com/apache/iceberg/pull/10063#discussion_r1547639555 ## data/src/test/java/org/apache/iceberg/io/TestPositionDeltaWriters.java: ## @@ -20,43 +20,42 @@ import java.io.File; import java.io.IOException; +import java.nio.file.Files; +import java.util.Arrays; import java.util.List; import org.apache.iceberg.DataFile; import org.apache.iceberg.DeleteFile; import org.apache.iceberg.FileFormat; +import org.apache.iceberg.Parameter; +import org.apache.iceberg.ParameterizedTestExtension; +import org.apache.iceberg.Parameters; import org.apache.iceberg.PartitionSpec; import org.apache.iceberg.RowDelta; import org.apache.iceberg.expressions.Expressions; import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; import org.apache.iceberg.util.StructLikeSet; import org.junit.Assert; Review Comment: @nastra Thanks for the review. This time I only migrated the classes related to `TableTestBase` so that the class can be deleted, and I kept these classes on JUnit4 because updating each class to JUnit5 would mean a lot of changes. I'm thinking that after deleting `TableTestBase`, I'll migrate those classes to JUnit 5. Should I update all classes in this PR to JUnit 5, or only partially?
Re: [PR] Migrate TableTestBase related classes to JUnit5 and delete TableTestBase [iceberg]
nastra commented on code in PR #10063: URL: https://github.com/apache/iceberg/pull/10063#discussion_r1547636721 ## flink/v1.16/flink/src/test/java/org/apache/iceberg/flink/sink/TestIcebergFilesCommitter.java: ## @@ -73,44 +77,39 @@ import org.apache.iceberg.util.ThreadPools; import org.junit.Assert; import org.junit.Assume; Review Comment: same as in the other files
Re: [PR] Migrate TableTestBase related classes to JUnit5 and delete TableTestBase [iceberg]
nastra commented on code in PR #10063: URL: https://github.com/apache/iceberg/pull/10063#discussion_r1547634234 ## flink/v1.16/flink/src/test/java/org/apache/iceberg/flink/sink/TestDeltaTaskWriter.java: ## @@ -65,31 +68,28 @@ import org.apache.iceberg.util.StructLikeSet; import org.assertj.core.api.Assertions; import org.junit.Assert; Review Comment: still uses JUnit4-Assert
Re: [PR] Migrate TableTestBase related classes to JUnit5 and delete TableTestBase [iceberg]
nastra commented on code in PR #10063: URL: https://github.com/apache/iceberg/pull/10063#discussion_r1547631706 ## data/src/test/java/org/apache/iceberg/io/TestPositionDeltaWriters.java: ## @@ -20,43 +20,42 @@ import java.io.File; import java.io.IOException; +import java.nio.file.Files; +import java.util.Arrays; import java.util.List; import org.apache.iceberg.DataFile; import org.apache.iceberg.DeleteFile; import org.apache.iceberg.FileFormat; +import org.apache.iceberg.Parameter; +import org.apache.iceberg.ParameterizedTestExtension; +import org.apache.iceberg.Parameters; import org.apache.iceberg.PartitionSpec; import org.apache.iceberg.RowDelta; import org.apache.iceberg.expressions.Expressions; import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; import org.apache.iceberg.util.StructLikeSet; import org.junit.Assert; Review Comment: still a JUnit4-Assert left here ## data/src/test/java/org/apache/iceberg/io/TestPartitioningWriters.java: ## @@ -35,33 +40,27 @@ import org.apache.iceberg.util.StructLikeSet; import org.assertj.core.api.Assertions; import org.junit.Assert; Review Comment: should be converted to AssertJ
Re: [PR] Migrate TableTestBase related classes to JUnit5 and delete TableTestBase [iceberg]
nastra commented on code in PR #10063: URL: https://github.com/apache/iceberg/pull/10063#discussion_r1547632925 ## data/src/test/java/org/apache/iceberg/io/TestRollingFileWriters.java: ## @@ -20,60 +20,60 @@ import java.io.File; import java.io.IOException; +import java.nio.file.Files; +import java.util.Arrays; import java.util.List; import org.apache.iceberg.FileFormat; +import org.apache.iceberg.Parameter; +import org.apache.iceberg.ParameterizedTestExtension; +import org.apache.iceberg.Parameters; import org.apache.iceberg.PartitionSpec; import org.apache.iceberg.Schema; import org.apache.iceberg.StructLike; import org.apache.iceberg.deletes.PositionDelete; import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList; import org.apache.iceberg.relocated.com.google.common.collect.Lists; import org.junit.Assert; Review Comment: same as in the other tests
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#issuecomment-2031700958 > Hi, @marvinlanhenke Thanks for the PR, it looks great! I have some small suggestions to restructure the code to make it easier for review. Really grateful for these tests! Thanks for the review, I'll get to your suggestions - those should be easy to fix.
Re: [PR] feat: Convert predicate to arrow filter and push down to parquet reader [iceberg-rust]
liurenjie1024 commented on code in PR #295: URL: https://github.com/apache/iceberg-rust/pull/295#discussion_r1547582654 ## crates/iceberg/src/arrow.rs: ## @@ -113,6 +143,405 @@ impl ArrowReader { // TODO: full implementation ProjectionMask::all() } + +fn get_row_filter(&self, parquet_schema: &SchemaDescriptor) -> Result> { +if let Some(predicates) = &self.predicates { +let field_id_map = self.build_field_id_map(parquet_schema)?; + +// Collect Parquet column indices from field ids +let column_indices = predicates +.iter() +.map(|predicate| { +let mut collector = CollectFieldIdVisitor { field_ids: vec![] }; +collector.visit_predicate(predicate).unwrap(); +collector +.field_ids +.iter() +.map(|field_id| { +field_id_map.get(field_id).cloned().ok_or_else(|| { +Error::new(ErrorKind::DataInvalid, "Field id not found in schema") +}) +}) +.collect::>>() +}) +.collect::>>()?; + +// Convert BoundPredicates to ArrowPredicates +let mut arrow_predicates = vec![]; +for (predicate, columns) in predicates.iter().zip(column_indices.iter()) { +let mut converter = PredicateConverter { +columns, +projection_mask: ProjectionMask::leaves(parquet_schema, columns.clone()), +parquet_schema, +column_map: &field_id_map, +}; +let arrow_predicate = converter.visit_predicate(predicate)?; +arrow_predicates.push(arrow_predicate); +} +Ok(Some(RowFilter::new(arrow_predicates))) +} else { +Ok(None) +} +} + +/// Build the map of field id to Parquet column index in the schema. +fn build_field_id_map(&self, parquet_schema: &SchemaDescriptor) -> Result> { +let mut column_map = HashMap::new(); +for (idx, field) in parquet_schema.columns().iter().enumerate() { +let field_type = field.self_type(); +match field_type { +ParquetType::PrimitiveType { basic_info, .. } => { +if !basic_info.has_id() { +return Err(Error::new( +ErrorKind::DataInvalid, +format!( +"Leave column {:?} in schema doesn't have field id", +field_type +), +)); +} +column_map.insert(basic_info.id(), idx); +} +ParquetType::GroupType { .. 
} => { +return Err(Error::new( +ErrorKind::DataInvalid, +format!( +"Leave column in schema should be primitive type but got {:?}", +field_type +), +)); +} +}; +} + +Ok(column_map) +} +} + +/// A visitor to collect field ids from bound predicates. +struct CollectFieldIdVisitor { +field_ids: Vec, +} + +impl BoundPredicateVisitor for CollectFieldIdVisitor { +type T = (); +type U = (); + +fn and(&mut self, _predicates: Vec) -> Result { +Ok(()) +} + +fn or(&mut self, _predicates: Vec) -> Result { +Ok(()) +} + +fn not(&mut self, _predicate: Self::T) -> Result { +Ok(()) +} + +fn visit_always_true(&mut self) -> Result { +Ok(()) +} + +fn visit_always_false(&mut self) -> Result { +Ok(()) +} + +fn visit_unary(&mut self, predicate: &UnaryExpression) -> Result { +self.bound_reference(predicate.term())?; +Ok(()) +} + +fn visit_binary(&mut self, predicate: &BinaryExpression) -> Result { +self.bound_reference(predicate.term())?; +Ok(()) +} + +fn visit_set(&mut self, predicate: &SetExpression) -> Result { +self.bound_reference(predicate.term())?; +Ok(()) +} + +fn bound_reference(&mut self, reference: &BoundReference) -> Result { +self.field_ids.push(reference.field().id); +Ok(()) +} +} + +struct PredicateConverter<'a> { +pub columns: &'a Vec, +pub projection_mask: ProjectionMask, +pub parquet_schema: &'a SchemaDescriptor, +pub column_map: &'a HashMap, +} + +fn get_arrow_datum(datum: &Datum) -> Box { +match datum.literal() { +PrimitiveLiteral::Boolean(value) => Box::new(BooleanArray::new_scalar(*value)), +PrimitiveLiteral::Int(value)
Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]
manuzhang commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1547578684 ## data/src/main/java/org/apache/iceberg/data/TableMigrationUtil.java: ## @@ -171,6 +176,10 @@ public static List listPartition( } } + public static boolean isFileReadingParallelized() { Review Comment: @nastra do you have an idea how to test whether the parallelism setting is working?
Re: [PR] feat: Project transform [iceberg-rust]
liurenjie1024 commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1547488931 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,300 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(&self, name: String, predicate: &BoundPredicate) -> Result> { +let func = create_transform_function(self)?; + +let projection = match predicate { +BoundPredicate::Unary(expr) => match self { +Transform::Identity +| Transform::Bucket(_) +| Transform::Truncate(_) +| Transform::Year +| Transform::Month +| Transform::Day +| Transform::Hour => Some(Predicate::Unary(UnaryExpression::new( +expr.op(), +Reference::new(name), +))), +_ => None, +}, +BoundPredicate::Binary(expr) => match self { +Transform::Identity => Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +expr.literal().to_owned(), +))), +Transform::Bucket(_) => { +if expr.op() != PredicateOperator::Eq || !self.can_transform(expr.literal()) { +return Ok(None); +} + +Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +func.transform_literal_result(expr.literal())?, +))) +} +Transform::Truncate(width) => { +if !self.can_transform(expr.literal()) { +return Ok(None); +} + +self.transform_projected_boundary( +name, +expr.literal(), +&expr.op(), +&func, +Some(*width), +)? 
+} +Transform::Year | Transform::Month | Transform::Day | Transform::Hour => { +if !self.can_transform(expr.literal()) { +return Ok(None); +} + +self.transform_projected_boundary( +name, +expr.literal(), +&expr.op(), +&func, +None, +)? +} +_ => None, +}, +BoundPredicate::Set(expr) => match self { +Transform::Identity => Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +expr.literals().to_owned(), +))), +Transform::Bucket(_) +| Transform::Truncate(_) +| Transform::Year +| Transform::Month +| Transform::Day +| Transform::Hour => { +if expr.op() != PredicateOperator::In +|| expr.literals().iter().any(|d| !self.can_transform(d)) +{ +return Ok(None); +} + +Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +self.transform_set(expr.literals(), &func)?, +))) +} +_ => None, +}, +_ => None, +}; + +Ok(projection) +} + +/// Check if `Transform` is applicable on datum's `PrimitiveType` +fn can_transform(&self, datum: &Datum) -> bool { +let input_type = datum.data_type().clone(); +self.result_type(&Type::Primitive(input_type)).is_ok() +} + +/// Transform each literal value of `FnvHashSet` +fn transform_set( +&self, +literals: &FnvHashSet, +func: &BoxedTransformFunction, +) -> Result> { +let mut new_set = FnvHashSet::default(); + +for lit in literals { +let datum = fu
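The projection idea in the doc comment above (rewrite `a = 10` under `bucket(a, 37)` to `bs = bucket(10, 37)`) can be sketched roughly as follows. This is an illustrative sketch only: `BucketProjectionSketch` is a hypothetical name, and the hash used here is a stand-in for Iceberg's murmur3-based bucket function.

```java
// Sketch: projecting an equality predicate through a bucket partition transform.
// NOTE: Math.floorMod(Integer.hashCode(v), n) is a stand-in hash; Iceberg's real
// bucket transform hashes values with 32-bit murmur3 before taking the modulus.
public class BucketProjectionSketch {

    // bucket(v, n): map a value into one of n buckets
    public static int bucket(int value, int numBuckets) {
        return Math.floorMod(Integer.hashCode(value), numBuckets);
    }

    // Project "col = literal" under "bucket(col, n) as partitionCol"
    // to the partition-level predicate "partitionCol = bucket(literal, n)".
    public static String projectEq(String partitionCol, int literal, int numBuckets) {
        return partitionCol + " = " + bucket(literal, numBuckets);
    }

    public static void main(String[] args) {
        // row filter a = 10, partition spec bucket(a, 37) as bs
        System.out.println(projectEq("bs", 10, 37));
    }
}
```

Note that, as in the PR, only equality (and `In`) predicates can be projected through a bucket transform: buckets do not preserve ordering, so range predicates cannot be rewritten this way.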
Re: [PR] #9073 Junit 4 tests switched to JUnit 5 [iceberg]
nastra commented on code in PR #9793: URL: https://github.com/apache/iceberg/pull/9793#discussion_r1547543252 ## data/src/test/java/org/apache/iceberg/data/DataTestHelpers.java: ## @@ -84,16 +86,16 @@ private static void assertEquals(Type type, Object expected, Object actual) { case UUID: case BINARY: case DECIMAL: -Assert.assertEquals( -"Primitive value should be equal to expected for type " + type, expected, actual); +org.junit.jupiter.api.Assertions.assertEquals( +expected, actual, "Primitive value should be equal to expected for type " + type); break; case FIXED: Assertions.assertThat(expected) .as("Expected should be a byte[]") .isInstanceOf(byte[].class); Assertions.assertThat(expected).as("Actual should be a byte[]").isInstanceOf(byte[].class); -Assert.assertArrayEquals( -"Array contents should be equal", (byte[]) expected, (byte[]) actual); +org.junit.jupiter.api.Assertions.assertArrayEquals( Review Comment: this should use AssertJ's `assertThat()`
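For illustration, the AssertJ form nastra suggests would look roughly like the commented line below. Since assertj-core is a third-party dependency, the runnable part of this sketch checks the same array-content semantics with `java.util.Arrays`; `ByteArrayAssertSketch` is a hypothetical name.

```java
import java.util.Arrays;

public class ByteArrayAssertSketch {
    public static void check(byte[] expected, byte[] actual) {
        // AssertJ equivalent of Assert.assertArrayEquals (sketch, requires assertj-core):
        // assertThat(actual).as("Array contents should be equal").isEqualTo(expected);
        if (!Arrays.equals(expected, actual)) {
            throw new AssertionError("Array contents should be equal");
        }
    }

    public static void main(String[] args) {
        check(new byte[] {1, 2, 3}, new byte[] {1, 2, 3}); // same contents: passes
    }
}
```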
Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]
nastra commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1547529529 ## data/src/main/java/org/apache/iceberg/data/TableMigrationUtil.java: ## @@ -171,6 +176,10 @@ public static List listPartition( } } + public static boolean isFileReadingParallelized() { Review Comment: I don't think it's a good idea to introduce this, since this reflects whatever was set after calling `listPartition(...)` and depending on the order of calls, it could return something unexpected
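The hazard nastra describes, a static flag that only reflects the most recent call, can be seen in a minimal sketch (names here are hypothetical, not the PR's actual code):

```java
// Sketch of why a static "was parallelized" flag is order-dependent:
// the accessor reports whatever the *last* caller used, not what any
// particular caller requested.
public class MigrationUtilSketch {
    private static boolean lastCallParallelized = false;

    public static void listPartition(int parallelism) {
        lastCallParallelized = parallelism > 1;
        // ... actual file listing would happen here ...
    }

    public static boolean isFileReadingParallelized() {
        return lastCallParallelized;
    }

    public static void main(String[] args) {
        listPartition(4); // caller A: parallel
        listPartition(1); // caller B: sequential
        // Caller A now observes "false" even though its own call was parallel.
        System.out.println(isFileReadingParallelized());
    }
}
```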
Re: [PR] Support identifier warehouses [iceberg-rust]
liurenjie1024 commented on code in PR #308: URL: https://github.com/apache/iceberg-rust/pull/308#discussion_r1547526634 ## crates/catalog/rest/src/catalog.rs: ## @@ -617,7 +617,13 @@ impl RestCatalog { props.extend(config); } -let file_io = match self.config.warehouse.as_deref().or(metadata_location) { +let warehouse_path = match self.config.warehouse.as_deref() { +Some(url) if url.contains("://") => Some(url), Review Comment: Yes, we don't need to raise an exception here, just match `if Url::parse().is_ok()` would be enough.
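The difference between the substring check in the diff and a parse-based check (the `Url::parse(..).is_ok()` liurenjie1024 suggests) can be illustrated with `java.net.URI`; `WarehouseSchemeCheck` is a hypothetical name for this sketch:

```java
import java.net.URI;

public class WarehouseSchemeCheck {
    // Substring check from the PR under review: anything containing "://" looks like a URL.
    public static boolean naiveIsUrl(String s) {
        return s.contains("://");
    }

    // Parse-based check: accept only strings that parse with an explicit scheme,
    // analogous to matching on Url::parse(s).is_ok() in Rust.
    public static boolean parsesAsUrl(String s) {
        try {
            return new URI(s).getScheme() != null;
        } catch (Exception e) {
            return false; // malformed input fails to parse
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesAsUrl("s3://bucket/warehouse"));   // real URL: accepted
        System.out.println(parsesAsUrl("my_warehouse_identifier")); // bare identifier: rejected
        System.out.println(parsesAsUrl("not a url ://"));           // malformed: rejected
        System.out.println(naiveIsUrl("not a url ://"));            // substring check accepts it
    }
}
```

A bare warehouse identifier (no scheme) falls through to the catalog's configured warehouse, which is exactly the behavior the PR wants.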
Re: [PR] Support identifier warehouses [iceberg-rust]
Fokko commented on code in PR #308: URL: https://github.com/apache/iceberg-rust/pull/308#discussion_r1547503192 ## crates/catalog/rest/src/catalog.rs: ## @@ -617,7 +617,13 @@ impl RestCatalog { props.extend(config); } -let file_io = match self.config.warehouse.as_deref().or(metadata_location) { +let warehouse_path = match self.config.warehouse.as_deref() { +Some(url) if url.contains("://") => Some(url), Review Comment: This is also happening [below in the `FileIO::from_path`](https://github.com/apache/iceberg-rust/blob/d57d91b9a72c516c6665d5faef349f52ebe59585/crates/iceberg/src/io.rs#L155), but it raises an exception we want to avoid.
Re: [PR] Support identifier warehouses [iceberg-rust]
liurenjie1024 commented on PR #308: URL: https://github.com/apache/iceberg-rust/pull/308#issuecomment-2031513654 > Hi, @Fokko Thanks for this fix. It also reminds me that should we append the warehouse parameter to `getConfig` call? Seems we already have that.
Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]
manuzhang commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1547447584 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java: ## @@ -232,4 +232,25 @@ public void testMigrateEmptyTable() throws Exception { Object result = scalarSql("CALL %s.system.migrate('%s')", catalogName, tableName); assertThat(result).isEqualTo(0L); } + + @TestTemplate + public void testMigrateWithParallelism() throws IOException { +assumeThat(catalogName).isEqualToIgnoringCase("spark_catalog"); +String location = Files.createTempDirectory(temp, "junit").toFile().toString(); +for (int p = -1; p <= 2; p++) { Review Comment: Thanks @nastra , I've updated as @RussellSpitzer suggests.
Re: [I] Disable checking links for Blogs section [iceberg]
manuzhang commented on issue #10060: URL: https://github.com/apache/iceberg/issues/10060#issuecomment-2031439818 I also see errors when checking maven repo. ``` ERROR: 21 dead links found! [✖] https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime/0.13.2/iceberg-spark-runtime-0.13.2.jar → Status: 0 [✖] https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-flink-runtime-1.14/0.13.2/iceberg-flink-runtime-1.14-0.13.2.jar → Status: 0 [✖] https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-flink-runtime-1.13/0.13.2/iceberg-flink-runtime-1.13-0.13.2.jar → Status: 0 [✖] https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-flink-runtime-1.12/0.13.2/iceberg-flink-runtime-1.12-0.13.2.jar → Status: 0 [✖] https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-hive-runtime/0.13.2/iceberg-hive-runtime-0.13.2.jar → Status: 0 [✖] https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark3-runtime/0.13.1/iceberg-spark3-runtime-0.13.1.jar → Status: 0 [✖] https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime/0.13.1/iceberg-spark-runtime-0.13.1.jar → Status: 0 [✖] https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-flink-runtime-1.13/0.13.1/iceberg-flink-runtime-1.13-0.13.1.jar → Status: 0 [✖] https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-flink-runtime-1.12/0.13.1/iceberg-flink-runtime-1.12-0.13.1.jar → Status: 0 [✖] https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime/0.13.0/iceberg-spark-runtime-0.13.0.jar → Status: 0 [✖] https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-flink-runtime-1.12/0.13.0/iceberg-flink-runtime-1.12-0.13.0.jar → Status: 0 [✖] https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark3-runtime/0.12.1/iceberg-spark3-runtime-0.12.1.jar → Status: 0 [✖] 
https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime/0.12.1/iceberg-spark-runtime-0.12.1.jar → Status: 0 [✖] https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark3-runtime/0.12.0/iceberg-spark3-runtime-0.12.0.jar → Status: 0 [✖] https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-hive-runtime/0.12.0/iceberg-hive-runtime-0.12.0.jar → Status: 0 [✖] https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-hive-runtime/0.11.1/iceberg-hive-runtime-0.11.1.jar → Status: 0 [✖] https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-flink-runtime/0.11.0/iceberg-flink-runtime-0.11.0.jar → Status: 0 [✖] https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark3-runtime/0.10.0/iceberg-spark3-runtime-0.10.0.jar → Status: 0 [✖] https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark3-runtime/0.9.0/iceberg-spark3-runtime-0.9.0.jar → Status: 0 [✖] https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime/0.9.0/iceberg-spark-runtime-0.9.0.jar → Status: 0 [✖] https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime/0.8.0-incubating/iceberg-spark-runtime-0.8.0-incubating.jar → Status: 0 ```
Re: [PR] Open-api: update prefix param description [iceberg]
ajantha-bhat commented on code in PR #9870: URL: https://github.com/apache/iceberg/pull/9870#discussion_r1547408417 ## open-api/rest-catalog-open-api.yaml: ## @@ -1444,7 +1444,7 @@ components: schema: type: string required: true - description: An optional prefix in the path + description: Prefix in the path Review Comment: If I change to `required: false` for `prefix` ``` Spec is invalid. Issues: components.parameters.For path parameter prefix the required value should be true Execution failed for task ':iceberg-open-api:validateRESTCatalogSpec'. > Validation failed. ```
Re: [PR] Spark 3.5: Parallelize reading files in snapshot and migrate procedures [iceberg]
nastra commented on code in PR #10037: URL: https://github.com/apache/iceberg/pull/10037#discussion_r1547402202 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java: ## @@ -232,4 +232,25 @@ public void testMigrateEmptyTable() throws Exception { Object result = scalarSql("CALL %s.system.migrate('%s')", catalogName, tableName); assertThat(result).isEqualTo(0L); } + + @TestTemplate + public void testMigrateWithParallelism() throws IOException { +assumeThat(catalogName).isEqualToIgnoringCase("spark_catalog"); +String location = Files.createTempDirectory(temp, "junit").toFile().toString(); +for (int p = -1; p <= 2; p++) { Review Comment: > I was trying that initially, but there's no way to add parameters for just one test. This is because the test is already parameterized at the class level.
Re: [PR] Open-api: update prefix param description [iceberg]
Fokko commented on code in PR #9870: URL: https://github.com/apache/iceberg/pull/9870#discussion_r1547395200 ## open-api/rest-catalog-open-api.yaml: ## @@ -1444,7 +1444,7 @@ components: schema: type: string required: true - description: An optional prefix in the path + description: Prefix in the path Review Comment: What's the error you're seeing? The prefix is optional, but it depends on the definition. We can also update the default to be an empty string.
Re: [PR] Open-api: update prefix param description [iceberg]
ajantha-bhat commented on PR #9870: URL: https://github.com/apache/iceberg/pull/9870#issuecomment-2031310056 cc: @jbonofre
Re: [PR] Introduce two properties for reading the connection timeout and socke… [iceberg]
nastra commented on code in PR #10053: URL: https://github.com/apache/iceberg/pull/10053#discussion_r1547300740 ## core/src/test/java/org/apache/iceberg/rest/TestHTTPClient.java: ## @@ -133,6 +136,59 @@ public void testDynamicHttpRequestInterceptorLoading() { assertThat(((TestHttpRequestInterceptor) interceptor).properties).isEqualTo(properties); } + @Test + public void testHttpClientGetConnectionConfig() { +long connectionTimeoutMs = 10L; +int socketTimeoutMs = 10; +Map properties = +ImmutableMap.of( +HTTPClient.REST_CONNECTION_TIMEOUT_MS, String.valueOf(connectionTimeoutMs), +HTTPClient.REST_SOCKET_TIMEOUT_MS, String.valueOf(socketTimeoutMs)); + +ConnectionConfig connectionConfig = HTTPClient.configureConnectionConfig(properties); +assertThat(connectionConfig).isNotNull(); + assertThat(connectionConfig.getConnectTimeout().getDuration()).isEqualTo(connectionTimeoutMs); + assertThat(connectionConfig.getSocketTimeout().getDuration()).isEqualTo(socketTimeoutMs); + } + + @Test + public void testHttpClientWithSocketTimeout() throws IOException { +long socketTimeoutMs = 2000L; +Map properties = +ImmutableMap.of(HTTPClient.REST_SOCKET_TIMEOUT_MS, String.valueOf(socketTimeoutMs)); +String path = "socket/timeout/path"; + +try (HTTPClient client = HTTPClient.builder(properties).uri(URI).build()) { + HttpRequest mockRequest = + request() + .withPath("/" + path) + .withMethod(HttpMethod.HEAD.name().toUpperCase(Locale.ROOT)); + // Setting a response delay of 5 seconds to simulate hitting the configured socket timeout of + // 2 seconds + HttpResponse mockResponse = + response() + .withStatusCode(200) + .withBody("Delayed response") + .withDelay(TimeUnit.MILLISECONDS, 5000); + mockServer.when(mockRequest).respond(mockResponse); + + Assertions.assertThatThrownBy(() -> client.head(path, ImmutableMap.of(), (unused) -> {})) + .cause() + .isInstanceOf(SocketTimeoutException.class); +} + } + + @Test + public void testHttpClientInvalidConnectionTimeout() { Review Comment: please also 
add a separate test method where the socket timeout is invalid
Re: [PR] Introduce two properties for reading the connection timeout and socke… [iceberg]
nastra commented on code in PR #10053: URL: https://github.com/apache/iceberg/pull/10053#discussion_r1547305188 ## core/src/test/java/org/apache/iceberg/rest/TestHTTPClient.java: ## @@ -133,6 +136,59 @@ public void testDynamicHttpRequestInterceptorLoading() { assertThat(((TestHttpRequestInterceptor) interceptor).properties).isEqualTo(properties); } + @Test + public void testHttpClientGetConnectionConfig() { +long connectionTimeoutMs = 10L; +int socketTimeoutMs = 10; +Map properties = +ImmutableMap.of( +HTTPClient.REST_CONNECTION_TIMEOUT_MS, String.valueOf(connectionTimeoutMs), +HTTPClient.REST_SOCKET_TIMEOUT_MS, String.valueOf(socketTimeoutMs)); + +ConnectionConfig connectionConfig = HTTPClient.configureConnectionConfig(properties); +assertThat(connectionConfig).isNotNull(); + assertThat(connectionConfig.getConnectTimeout().getDuration()).isEqualTo(connectionTimeoutMs); + assertThat(connectionConfig.getSocketTimeout().getDuration()).isEqualTo(socketTimeoutMs); + } + + @Test + public void testHttpClientWithSocketTimeout() throws IOException { Review Comment: should there also be a test verifying the connection timeout?
Re: [PR] Introduce two properties for reading the connection timeout and socke… [iceberg]
nastra commented on code in PR #10053: URL: https://github.com/apache/iceberg/pull/10053#discussion_r1547304325 ## core/src/test/java/org/apache/iceberg/rest/TestHTTPClient.java: ## @@ -133,6 +136,59 @@ public void testDynamicHttpRequestInterceptorLoading() { assertThat(((TestHttpRequestInterceptor) interceptor).properties).isEqualTo(properties); } + @Test + public void testHttpClientGetConnectionConfig() { +long connectionTimeoutMs = 10L; +int socketTimeoutMs = 10; +Map properties = +ImmutableMap.of( +HTTPClient.REST_CONNECTION_TIMEOUT_MS, String.valueOf(connectionTimeoutMs), +HTTPClient.REST_SOCKET_TIMEOUT_MS, String.valueOf(socketTimeoutMs)); + +ConnectionConfig connectionConfig = HTTPClient.configureConnectionConfig(properties); +assertThat(connectionConfig).isNotNull(); + assertThat(connectionConfig.getConnectTimeout().getDuration()).isEqualTo(connectionTimeoutMs); + assertThat(connectionConfig.getSocketTimeout().getDuration()).isEqualTo(socketTimeoutMs); + } + + @Test + public void testHttpClientWithSocketTimeout() throws IOException { +long socketTimeoutMs = 2000L; +Map properties = +ImmutableMap.of(HTTPClient.REST_SOCKET_TIMEOUT_MS, String.valueOf(socketTimeoutMs)); +String path = "socket/timeout/path"; + +try (HTTPClient client = HTTPClient.builder(properties).uri(URI).build()) { + HttpRequest mockRequest = + request() + .withPath("/" + path) + .withMethod(HttpMethod.HEAD.name().toUpperCase(Locale.ROOT)); + // Setting a response delay of 5 seconds to simulate hitting the configured socket timeout of + // 2 seconds + HttpResponse mockResponse = + response() + .withStatusCode(200) + .withBody("Delayed response") + .withDelay(TimeUnit.MILLISECONDS, 5000); + mockServer.when(mockRequest).respond(mockResponse); + + Assertions.assertThatThrownBy(() -> client.head(path, ImmutableMap.of(), (unused) -> {})) + .cause() + .isInstanceOf(SocketTimeoutException.class); +} + } + + @Test + public void testHttpClientInvalidConnectionTimeout() { Review Comment: also no need 
for the `HttpClient` prefix in the test names as this is already obvious, because the test class itself is called `TestHTTPClient`
Re: [PR] Introduce two properties for reading the connection timeout and socke… [iceberg]
nastra commented on code in PR #10053: URL: https://github.com/apache/iceberg/pull/10053#discussion_r1547301639 ## core/src/test/java/org/apache/iceberg/rest/TestHTTPClient.java: ## @@ -133,6 +136,59 @@ public void testDynamicHttpRequestInterceptorLoading() { assertThat(((TestHttpRequestInterceptor) interceptor).properties).isEqualTo(properties); } + @Test + public void testHttpClientGetConnectionConfig() { +long connectionTimeoutMs = 10L; +int socketTimeoutMs = 10; +Map properties = +ImmutableMap.of( +HTTPClient.REST_CONNECTION_TIMEOUT_MS, String.valueOf(connectionTimeoutMs), +HTTPClient.REST_SOCKET_TIMEOUT_MS, String.valueOf(socketTimeoutMs)); + +ConnectionConfig connectionConfig = HTTPClient.configureConnectionConfig(properties); +assertThat(connectionConfig).isNotNull(); + assertThat(connectionConfig.getConnectTimeout().getDuration()).isEqualTo(connectionTimeoutMs); + assertThat(connectionConfig.getSocketTimeout().getDuration()).isEqualTo(socketTimeoutMs); + } + + @Test + public void testHttpClientWithSocketTimeout() throws IOException { +long socketTimeoutMs = 2000L; +Map properties = +ImmutableMap.of(HTTPClient.REST_SOCKET_TIMEOUT_MS, String.valueOf(socketTimeoutMs)); +String path = "socket/timeout/path"; + +try (HTTPClient client = HTTPClient.builder(properties).uri(URI).build()) { + HttpRequest mockRequest = + request() + .withPath("/" + path) + .withMethod(HttpMethod.HEAD.name().toUpperCase(Locale.ROOT)); + // Setting a response delay of 5 seconds to simulate hitting the configured socket timeout of + // 2 seconds + HttpResponse mockResponse = + response() + .withStatusCode(200) + .withBody("Delayed response") + .withDelay(TimeUnit.MILLISECONDS, 5000); + mockServer.when(mockRequest).respond(mockResponse); + + Assertions.assertThatThrownBy(() -> client.head(path, ImmutableMap.of(), (unused) -> {})) + .cause() + .isInstanceOf(SocketTimeoutException.class); +} + } + + @Test + public void testHttpClientInvalidConnectionTimeout() { +// We expect a Runtime 
exception Review Comment: this is obvious from the check further above, so no need for the comment
Re: [PR] Introduce two properties for reading the connection timeout and socke… [iceberg]
nastra commented on code in PR #10053: URL: https://github.com/apache/iceberg/pull/10053#discussion_r1547301269 ## core/src/test/java/org/apache/iceberg/rest/TestHTTPClient.java: ## @@ -133,6 +136,59 @@ public void testDynamicHttpRequestInterceptorLoading() { assertThat(((TestHttpRequestInterceptor) interceptor).properties).isEqualTo(properties); } + @Test + public void testHttpClientGetConnectionConfig() { +long connectionTimeoutMs = 10L; +int socketTimeoutMs = 10; +Map properties = +ImmutableMap.of( +HTTPClient.REST_CONNECTION_TIMEOUT_MS, String.valueOf(connectionTimeoutMs), +HTTPClient.REST_SOCKET_TIMEOUT_MS, String.valueOf(socketTimeoutMs)); + +ConnectionConfig connectionConfig = HTTPClient.configureConnectionConfig(properties); +assertThat(connectionConfig).isNotNull(); + assertThat(connectionConfig.getConnectTimeout().getDuration()).isEqualTo(connectionTimeoutMs); + assertThat(connectionConfig.getSocketTimeout().getDuration()).isEqualTo(socketTimeoutMs); + } + + @Test + public void testHttpClientWithSocketTimeout() throws IOException { +long socketTimeoutMs = 2000L; +Map properties = +ImmutableMap.of(HTTPClient.REST_SOCKET_TIMEOUT_MS, String.valueOf(socketTimeoutMs)); +String path = "socket/timeout/path"; + +try (HTTPClient client = HTTPClient.builder(properties).uri(URI).build()) { + HttpRequest mockRequest = + request() + .withPath("/" + path) + .withMethod(HttpMethod.HEAD.name().toUpperCase(Locale.ROOT)); + // Setting a response delay of 5 seconds to simulate hitting the configured socket timeout of + // 2 seconds + HttpResponse mockResponse = + response() + .withStatusCode(200) + .withBody("Delayed response") + .withDelay(TimeUnit.MILLISECONDS, 5000); + mockServer.when(mockRequest).respond(mockResponse); + + Assertions.assertThatThrownBy(() -> client.head(path, ImmutableMap.of(), (unused) -> {})) + .cause() + .isInstanceOf(SocketTimeoutException.class); +} + } + + @Test + public void testHttpClientInvalidConnectionTimeout() { +// We expect a Runtime 
exception +String connectionTimeoutMsStr = "invalidMs"; +Map properties = +ImmutableMap.of(HTTPClient.REST_CONNECTION_TIMEOUT_MS, connectionTimeoutMsStr); + +Assertions.assertThatThrownBy(() -> HTTPClient.builder(properties).uri(URI).build()) +.isInstanceOf(RuntimeException.class); Review Comment: this should have a `.hasMessage(...)` check
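The `.hasMessage(...)` check requested above asserts on the exception's message, not only its type, so the test still fails if a *different* failure happens to throw the same exception class. A plain-Java sketch of the idea (no AssertJ; `TimeoutParseSketch` and its `Long.parseLong` failure are hypothetical stand-ins for the `HTTPClient.builder(...)` call):

```java
public class TimeoutParseSketch {
    // Stand-in for the config parsing that rejects "invalidMs".
    public static long parseTimeout(String value) {
        return Long.parseLong(value); // throws NumberFormatException for non-numeric input
    }

    public static void main(String[] args) {
        try {
            parseTimeout("invalidMs");
            throw new AssertionError("expected a NumberFormatException");
        } catch (NumberFormatException e) {
            // The .hasMessage(...)-style check: verify the message mentions the bad value.
            if (!e.getMessage().contains("invalidMs")) {
                throw new AssertionError("unexpected message: " + e.getMessage());
            }
        }
    }
}
```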
Re: [PR] Introduce two properties for reading the connection timeout and socke… [iceberg]
nastra commented on code in PR #10053: URL: https://github.com/apache/iceberg/pull/10053#discussion_r1547300740 ## core/src/test/java/org/apache/iceberg/rest/TestHTTPClient.java: ## @@ -133,6 +136,59 @@ public void testDynamicHttpRequestInterceptorLoading() { assertThat(((TestHttpRequestInterceptor) interceptor).properties).isEqualTo(properties); } + @Test + public void testHttpClientGetConnectionConfig() { +long connectionTimeoutMs = 10L; +int socketTimeoutMs = 10; +Map properties = +ImmutableMap.of( +HTTPClient.REST_CONNECTION_TIMEOUT_MS, String.valueOf(connectionTimeoutMs), +HTTPClient.REST_SOCKET_TIMEOUT_MS, String.valueOf(socketTimeoutMs)); + +ConnectionConfig connectionConfig = HTTPClient.configureConnectionConfig(properties); +assertThat(connectionConfig).isNotNull(); + assertThat(connectionConfig.getConnectTimeout().getDuration()).isEqualTo(connectionTimeoutMs); + assertThat(connectionConfig.getSocketTimeout().getDuration()).isEqualTo(socketTimeoutMs); + } + + @Test + public void testHttpClientWithSocketTimeout() throws IOException { +long socketTimeoutMs = 2000L; +Map properties = +ImmutableMap.of(HTTPClient.REST_SOCKET_TIMEOUT_MS, String.valueOf(socketTimeoutMs)); +String path = "socket/timeout/path"; + +try (HTTPClient client = HTTPClient.builder(properties).uri(URI).build()) { + HttpRequest mockRequest = + request() + .withPath("/" + path) + .withMethod(HttpMethod.HEAD.name().toUpperCase(Locale.ROOT)); + // Setting a response delay of 5 seconds to simulate hitting the configured socket timeout of + // 2 seconds + HttpResponse mockResponse = + response() + .withStatusCode(200) + .withBody("Delayed response") + .withDelay(TimeUnit.MILLISECONDS, 5000); + mockServer.when(mockRequest).respond(mockResponse); + + Assertions.assertThatThrownBy(() -> client.head(path, ImmutableMap.of(), (unused) -> {})) + .cause() + .isInstanceOf(SocketTimeoutException.class); +} + } + + @Test + public void testHttpClientInvalidConnectionTimeout() { Review Comment: what about an 
invalid socket timeout?