[I] Concern about possible consistency issue in HiveCatalog's _commit_table [iceberg-python]
HonahX opened a new issue, #588: URL: https://github.com/apache/iceberg-python/issues/588

### Question

Currently, the HiveCatalog's `_commit_table` workflow looks like:

1. load the current table metadata via `load_table`
2. construct the updated metadata
3. lock the hive table
4. alter the hive table
5. unlock the hive table

Suppose two processes, A and B, try to commit changes to the same Iceberg table. It is possible that the code execution happens in the following order:

1. process A loads the current table metadata
2. process A constructs the updated metadata
3. process B starts and finishes the **whole** `_commit_table`
4. process A locks the hive table
5. process A alters the hive table
6. process A unlocks the hive table

In this specific scenario, both processes successfully commit their changes, because process B releases the lock before A tries to acquire it. But if `alter_table` does not support a [transactional check](https://issues.apache.org/jira/browse/HIVE-26882), the changes made by process B will be overridden. Since in Python we do not know which Hive version we are connecting to, I wonder if we need to update the code to lock the table before loading the current table metadata, like the [Java implementation](https://github.com/apache/iceberg/blob/main/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L184) does.

BTW, it seems there are some consistency issues with https://issues.apache.org/jira/browse/HIVE-26882 as well, and there is an open fix for that: https://github.com/apache/hive/pull/5129

Please correct me if I misunderstand something here. Thanks! cc: @Fokko

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
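The lock-before-load ordering proposed in the issue above can be sketched as follows. This is a minimal, hypothetical simulation: threads stand in for the two processes, and `FakeMetastore` stands in for the real Hive client; none of these names come from pyiceberg.

```python
import threading

class FakeMetastore:
    """Illustrative stand-in for a Hive metastore client holding one table."""

    def __init__(self):
        self._lock = threading.Lock()
        self.metadata = {"snapshots": []}

    def lock(self):
        self._lock.acquire()

    def unlock(self):
        self._lock.release()

def commit(metastore, change):
    # 1. Lock FIRST, so no other writer can commit between our read and
    #    our write (the race described in the issue).
    metastore.lock()
    try:
        # 2. Load the *current* metadata under the lock.
        current = dict(metastore.metadata)
        # 3. Construct the updated metadata.
        current["snapshots"] = current["snapshots"] + [change]
        # 4. Alter the table.
        metastore.metadata = current
    finally:
        # 5. Always release the lock.
        metastore.unlock()

ms = FakeMetastore()
threads = [threading.Thread(target=commit, args=(ms, i)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(ms.metadata["snapshots"]))  # 10: no commit was lost
```

With the original ordering (load outside the lock), two concurrent commits could both read the same base metadata and one would silently overwrite the other; holding the lock across the read makes each commit see its predecessor's result.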
[I] Suppress duplicate OAuth token fetching in rest catalog client [iceberg-python]
TennyZhuang opened a new issue, #587: URL: https://github.com/apache/iceberg-python/issues/587

### Feature Request / Improvement

In the rest catalog client, we implemented the OAuth token refresh based on a retry mechanism. https://github.com/apache/iceberg-python/blob/4148edb5e28ae88024a55e0b112238e65b873957/pyiceberg/catalog/rest.py#L126-L128

If there are concurrent calls to the rest client API, they may all fail at the same time, and then thousands of `_refresh_token` calls may be issued. The behavior is likely acceptable and raises no exceptions, but it is wasteful and not as expected.

A potential solution is introducing a `threading.Lock` to protect the fetching process. Every call to `_refresh_token` should acquire the `Lock` first, then check whether the current token is still the same as the expired token.
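The proposed lock-plus-staleness-check can be sketched like this. All names here are hypothetical (`TokenManager`, `fetch_new_token` stand in for the real OAuth request logic in `rest.py`); the point is only to show that the check under the lock collapses a burst of concurrent refreshes into a single fetch.

```python
import threading

class TokenManager:
    """Hypothetical sketch: serialize token refreshes behind a lock."""

    def __init__(self, fetch_new_token):
        self._fetch = fetch_new_token
        self._lock = threading.Lock()
        self.token = self._fetch()

    def refresh(self, expired_token):
        with self._lock:
            # If another thread already swapped the token while we waited
            # on the lock, reuse it instead of fetching again.
            if self.token != expired_token:
                return self.token
            self.token = self._fetch()
            return self.token

calls = 0

def fetch():
    # Stand-in for the actual OAuth token request; counts invocations.
    global calls
    calls += 1
    return f"token-{calls}"

mgr = TokenManager(fetch)
old = mgr.token
# Many threads all observe the same expired token and try to refresh at once.
threads = [threading.Thread(target=mgr.refresh, args=(old,)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(calls)  # 2: one initial fetch plus a single refresh for all 20 threads
```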
Re: [I] SparkSessionCatalog with JDBC catalog: SHOW TABLES IN ... returns error but table exists in JDBC catalog [iceberg]
matepek commented on issue #10003: URL: https://github.com/apache/iceberg/issues/10003#issuecomment-2040974202

Thank you for the answers.

> fundamentally that issue is the same

Yes, I think I understand that. What surprises me is why I don't see the SparkSessionCatalog anywhere in that table-creation call stack. How could the SSC handle this case when it does not have control over it? I feel I'm missing an important piece of the puzzle here.

> Namespace has to be created explicitly in Nessie as described

Could the SparkSessionCatalog create it on demand? If not, then I don't see how it could be used with Nessie properly.
Re: [PR] [Bug Fix] Allow HiveCatalog to create table with TimestamptzType [iceberg-python]
HonahX commented on code in PR #585: URL: https://github.com/apache/iceberg-python/pull/585#discussion_r1554526443

## pyiceberg/catalog/hive.py:

```diff
@@ -103,22 +104,6 @@
 import pyarrow as pa

-# Replace by visitor
-hive_types = {
```

Review Comment: This is not used, so I removed it. :blush:
[PR] [0.6.x] Backport #585 [iceberg-python]
HonahX opened a new pull request, #586: URL: https://github.com/apache/iceberg-python/pull/586 Backport #585
Re: [PR] [Bug Fix] Allow HiveCatalog to create table with TimestamptzType [iceberg-python]
HonahX commented on code in PR #585: URL: https://github.com/apache/iceberg-python/pull/585#discussion_r1554524560

## pyiceberg/catalog/hive.py:

```diff
@@ -199,6 +184,7 @@ def _annotate_namespace(database: HiveDatabase, properties: Properties) -> HiveD
 DateType: "date",
 TimeType: "string",
 TimestampType: "timestamp",
+TimestamptzType: "timestamp",
```

Review Comment: In Java, we map `TimestamptzType` to `"timestamp with local time zone"` for Hive version >= 3:

```java
case TIMESTAMP:
  Types.TimestampType timestampType = (Types.TimestampType) type;
  if (HiveVersion.min(HiveVersion.HIVE_3) && timestampType.shouldAdjustToUTC()) {
    return "timestamp with local time zone";
  }
  return "timestamp";
```

But in Python it seems we do not have a way to check the version, so I put `"timestamp"` here to ensure compatibility. Please correct me if I am wrong here. Thanks!

ref: https://github.com/apache/iceberg/blob/main/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveSchemaUtil.java#L143
Re: [PR] [WIP] Integration with Datafusion [iceberg-rust]
liurenjie1024 commented on PR #324: URL: https://github.com/apache/iceberg-rust/pull/324#issuecomment-2040961957 Thanks @marvinlanhenke This is amazing, I'll take a review later!
Re: [PR] [WIP] Integration with Datafusion [iceberg-rust]
marvinlanhenke commented on PR #324: URL: https://github.com/apache/iceberg-rust/pull/324#issuecomment-2040961635 @liurenjie1024 @ZENOTME @Fokko PTAL and let me know what you think.
Re: [PR] Add Struct Accessors to BoundReferences [iceberg-rust]
liurenjie1024 commented on code in PR #317: URL: https://github.com/apache/iceberg-rust/pull/317#discussion_r1554520971

## crates/iceberg/src/expr/accessor.rs:

```rust
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

use crate::spec::{Datum, Literal, PrimitiveType, Struct};
use serde_derive::{Deserialize, Serialize};
use std::sync::Arc;

#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)]
enum InnerOrType {
    Inner(Box<StructAccessor>),
    Type(PrimitiveType),
}

#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)]
pub struct StructAccessor {
    position: usize,
    r#type: PrimitiveType,
    inner: Option<Box<StructAccessor>>,
}

pub(crate) type StructAccessorRef = Arc<StructAccessor>;

impl StructAccessor {
    pub(crate) fn new(position: usize, r#type: PrimitiveType) -> Self {
        StructAccessor {
            position,
            r#type,
            inner: None,
        }
    }

    pub(crate) fn wrap(position: usize, inner: Box<StructAccessor>) -> Self {
        StructAccessor {
            position,
            r#type: inner.r#type().clone(),
            inner: Some(inner),
        }
    }

    pub(crate) fn position(&self) -> usize {
        self.position
    }

    pub(crate) fn r#type(&self) -> &PrimitiveType {
        &self.r#type
    }

    pub(crate) fn get<'a>(&'a self, container: &'a Struct) -> Datum {
        match &self.inner {
            None => {
                let Literal::Primitive(literal) = &container[self.position] else {
                    panic!("Expected Literal to be Primitive");
                };

                Datum::new(self.r#type().clone(), literal.clone())
            }
            Some(inner) => {
                let Literal::Struct(wrapped) = &container[self.position] else {
                    panic!("Nested accessor should only be wrapping a Struct");
```

Review Comment: Ditto.

## crates/iceberg/src/spec/schema.rs:

```rust
@@ -137,9 +142,55 @@ impl SchemaBuilder {
            name_to_id,
            lowercase_name_to_id,
            id_to_name,

            field_id_to_accessor,
        })
    }

    fn build_accessors(&self) -> HashMap<i32, StructAccessorRef> {
```

Review Comment: > Will do, re SchemaVisitor. Sorry Renjie, not sure what you mean with the second comment though?

Sorry, I mean adding some unit tests and example code in the api to demonstrate its usage, like what Python does here: https://github.com/apache/iceberg-python/blob/d3db8401cd006fcbacf1d9a2502248ce4228ff06/pyiceberg/schema.py#L1162

## crates/iceberg/src/expr/accessor.rs:

```rust
#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)]
enum InnerOrType {
```

Review Comment: Should we remove this?
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
liurenjie1024 commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1554519498

## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs:

```rust
use crate::expr::{BoundPredicate, BoundReference, PredicateOperator};
use crate::spec::Datum;
use crate::Result;
use fnv::FnvHashSet;

pub trait BoundPredicateVisitor {
    type T;

    fn always_true(&mut self) -> Result<Self::T>;
    fn always_false(&mut self) -> Result<Self::T>;

    fn and(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
    fn or(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
    fn not(&mut self, inner: Self::T) -> Result<Self::T>;

    fn is_null(&mut self, reference: &BoundReference) -> Result<Self::T>;
```

Review Comment: > I also like the idea of going further and having an Expression enum that contains the logical operators plus Predicate above, with Reference from the above being an enum over bound or unbound. This allows visitors to potentially work over both bound and unbound expressions.

Hi @sdd, would you mind opening an issue to raise a discussion about this idea? It looks promising to me.
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
liurenjie1024 commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1554519334

## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs:

```rust
pub trait BoundPredicateVisitor {
    type T;
    // ...
    fn is_null(&mut self, reference: &BoundReference) -> Result<Self::T>;
```

Review Comment: > The proposed function signature fn visitor_op(&mut self, op: &PredicateOperator, reference: &BoundReference) would work for a unary operator.

I think the following signature would apply to all operators:

```rust
fn visitor_op(&mut self, op: &PredicateOperator, reference: &BoundReference, literals: &[Datum])
```
Re: [I] [BUG] Valid column characters fail on to_arrow() or to_pandas() ArrowInvalid: No match for FieldRef.Name [iceberg-python]
gwindes commented on issue #584: URL: https://github.com/apache/iceberg-python/issues/584#issuecomment-2040825581

Further research shows that when I use [daft](https://www.getdaft.io/projects/docs/en/latest/user_guide/integrations/iceberg.html#reading-a-table), I'm able to read and use the `to_arrow()` functionality just fine.

Basic script:

```
from pyiceberg.catalog import load_catalog
import daft

print(f"Daft version: {daft.__version__}")

warehouse_path = "./warehouse"
catalog = load_catalog(**{
    "uri": f"sqlite:///{warehouse_path}/pyiceberg-catalog.db",
    "warehouse": f"file://{warehouse_path}"
})

table = catalog.load_table("A1B2.A1-301")
df = daft.read_iceberg(table)
print(df)
print(df.to_arrow())
```

Resulting successful output:

```
Daft version: 0.2.20
╭──────────────────────────┬──────────────────────────┬──────────────────────────┬───────╮
│ TEST:A1B2.RAW.ABC-GG-1-A ┆ TEST:A1B2.RAW.ABC-GG-1-B ┆ TEST:A1B2.RAW.ABC-GG-1-C ┆ time  │
│ ---                      ┆ ---                      ┆ ---                      ┆ ---   │
│ Float64                  ┆ Float64                  ┆ Float64                  ┆ Int64 │
╰──────────────────────────┴──────────────────────────┴──────────────────────────┴───────╯

pyarrow.Table
TEST:A1B2.RAW.ABC-GG-1-A: double
TEST:A1B2.RAW.ABC-GG-1-B: double
TEST:A1B2.RAW.ABC-GG-1-C: double
time: int64
TEST:A1B2.RAW.ABC-GG-1-A: [[0,1,2,3,4,5,6,7,8,9]]
TEST:A1B2.RAW.ABC-GG-1-B: [[0,1.1,2.2,3.3,4.4,5.5,6.6,7.7,8.8,9.9]]
TEST:A1B2.RAW.ABC-GG-1-C: [[0,1.1,2.2,3.3,4.4,5.5,6.6,7.7,8.8,9.9]]
time: [[1702090722998897808,1702090722998947809,1702090722998997809,1702090722999047809,1702090722999097809,1702090722999147809,1702090722999197809,1702090722999247809,1702090722999297809,1702090722999347809]]
```
Re: [I] Class not found error when use sink [iceberg]
github-actions[bot] commented on issue #2455: URL: https://github.com/apache/iceberg/issues/2455#issuecomment-2040804841 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] Build fails due to inaccessible palantir dependencies [iceberg]
github-actions[bot] commented on issue #2462: URL: https://github.com/apache/iceberg/issues/2462#issuecomment-2040804859 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] Move site-related things to a separate repo [iceberg]
github-actions[bot] commented on issue #2446: URL: https://github.com/apache/iceberg/issues/2446#issuecomment-2040804831 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
sdd commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1554441701

## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs:

```rust
pub trait BoundPredicateVisitor {
    type T;
    // ...
    fn is_null(&mut self, reference: &BoundReference) -> Result<Self::T>;
```

Review Comment: I also like the idea of going further and having an `Expression` enum that contains the logical operators plus `Predicate` above, with `Reference` from the above being an enum over bound or unbound. This allows visitors to potentially work over both bound and unbound expressions.

```rust
pub enum Expression {
    // The top level would be Expression rather than Predicate.
    // The logical operators would live at this level.
    // Since logical operators such as And / Or / Not
    // are common between bound and unbound, some visitors
    // such as rewrite_not can be applied to both bound and
    // unbound expressions, preventing duplication and increasing
    // flexibility

    /// An expression always evaluates to true.
    AlwaysTrue,
    /// An expression always evaluates to false.
    AlwaysFalse,

    // Here we use a struct variant to eliminate the need
    // for a generic `LogicalExpression`, which did not
    // add much, also allowing us to refer to lhs and rhs
    // by name

    /// An expression combined by `AND`, for example, `a > 10 AND b < 20`.
    And { lhs: Box<Expression>, rhs: Box<Expression> },
    /// An expression combined by `OR`, for example, `a > 10 OR b < 20`.
    Or { lhs: Box<Expression>, rhs: Box<Expression> },
    /// An expression combined by `NOT`, for example, `NOT (a > 10)`.
    Not(Box<Expression>),
    /// A Predicate
    Predicate(Predicate),
}

pub enum Reference {
    // Only at the leaf level do we distinguish between
    // bound and unbound. This allows things like bind
    // and unbind to be implemented as a transforming visitor,
    // and avoids duplication of logic compared to having
    // Bound / Unbound being distinguished at a higher level
    // in the type hierarchy
    Bound(BoundReference),
    Unbound(UnboundReference),
}

pub struct UnboundReference {
    name: String,
}

pub struct BoundReference {
    column_name: String,
    field: NestedFieldRef,
    accessor: StructAccessor,
}
```

We could use type aliases such as `type BoundExpression = Expression` and have `bind` return a `BoundExpression` to prevent bound expressions being used where unbound ones are required and vice versa. But I'm digressing very far from the original comment now, sorry! 😆
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
sdd commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1554438555

## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs:

```rust
pub trait BoundPredicateVisitor {
    type T;
    // ...
    fn is_null(&mut self, reference: &BoundReference) -> Result<Self::T>;
```

Review Comment: I don't see how this will work with just `visitor_op`. The proposed function signature `fn visitor_op(&mut self, op: &PredicateOperator, reference: &BoundReference)` would work for a unary operator. A binary operator would need `fn visitor_op(&mut self, op: &PredicateOperator, reference: &BoundReference, literal: &Datum)`, and a set operator would need `fn visitor_op(&mut self, op: &PredicateOperator, reference: &BoundReference, literals: &FnvHashSet<Datum>)`.

To resolve this, you'd need the visitor to have `fn unary_op(...)`, `fn binary_op(...)`, and `fn set_op(...)`. Is that what we want? Personally, I think this is another example of where the current type hierarchy feels awkward. There are other places where we bump into these type-hierarchy mismatches, such as needing a default leg in a match on operator to handle operators that will never show up, such as binary ops in a unary expression. Something more like this feels better to me:

```rust
pub enum Predicate {
    // By not having Unary / Binary / Set here, and removing
    // that layer in favour of having the ops themselves
    // as part of this enum, there is no need for code that
    // raises errors when the wrong type of `Op` is used, since those
    // issues are prevented by the type system at compile time

    // Unary ops now only contain one param, the term, and so can be
    // implemented more simply
    IsNull(Reference),
    // ... other Unary ops here

    // Binary / Set ops ditch the `BinaryExpression`, with the literal
    // and reference being available directly
    GreaterThan {
        literal: Datum,
        reference: Reference,
    },
    // ... other Binary ops here

    In {
        literals: HashSet<Datum>,
        reference: Reference,
    },
    // ... other Set ops here
}
```
Re: [PR] Add Struct Accessors to BoundReferences [iceberg-rust]
sdd commented on code in PR #317: URL: https://github.com/apache/iceberg-rust/pull/317#discussion_r1554425354 ## crates/iceberg/src/spec/schema.rs: ## @@ -137,9 +142,55 @@ impl SchemaBuilder { name_to_id, lowercase_name_to_id, id_to_name, + +field_id_to_accessor, }) } +fn build_accessors(&self) -> HashMap<i32, Arc<StructAccessor>> { Review Comment: `SchemaVisitor` won't work for this without some modifications. We need to know the index within `Schema.fields` for each field to create its accessor, and `visit_struct` does not provide this at the moment. `visit_struct` and `SchemaVisitor` could be modified to pass the field index to the visitor's methods. Can we keep this PR as-is and have a TODO so that we can move on with this working as-is and I or someone else can pick up this refactor in another PR?
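The accessor-building idea under discussion (knowing each field's index within its parent struct so a chain of positions can later retrieve the value) can be sketched as follows. The schema shape here is a deliberately simplified stand-in, not the iceberg-rust or pyiceberg types:

```python
# Walk a simplified schema, recording each field's positional index within its
# parent struct. A field is (name, None) for a primitive, or (name, [subfields])
# for a nested struct; the result maps each leaf name to its index chain.
def build_accessors(fields, prefix=()):
    """Map field name -> tuple of positional indices down the struct tree."""
    accessors = {}
    for index, (name, subfields) in enumerate(fields):
        path = prefix + (index,)
        if subfields is None:        # primitive field: record its path
            accessors[name] = path
        else:                        # nested struct: recurse with extended prefix
            accessors.update(build_accessors(subfields, path))
    return accessors


schema = [
    ("id", None),
    ("location", [("lat", None), ("long", None)]),
]
print(build_accessors(schema))
# {'id': (0,), 'lat': (1, 0), 'long': (1, 1)}
```

This is exactly the information `visit_struct` would need to pass through: without the `index` from `enumerate`, the visitor cannot construct the path.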
Re: [PR] Introduce two properties for reading the connection timeout and socke… [iceberg]
harishch1998 commented on code in PR #10053: URL: https://github.com/apache/iceberg/pull/10053#discussion_r1544095514 ## core/src/main/java/org/apache/iceberg/rest/HTTPClient.java: ## @@ -78,6 +80,8 @@ public class HTTPClient implements RESTClient { private static final int REST_MAX_CONNECTIONS_DEFAULT = 100; private static final String REST_MAX_CONNECTIONS_PER_ROUTE = "rest.client.connections-per-route"; private static final int REST_MAX_CONNECTIONS_PER_ROUTE_DEFAULT = 100; + @VisibleForTesting static final String REST_CONNECTION_TIMEOUT = "rest.client.connection-timeout"; + @VisibleForTesting static final String REST_SOCKET_TIMEOUT = "rest.client.socket-timeout"; Review Comment: Yeah, these are milliseconds. I'll fix the naming.
Re: [PR] Introduce two properties for reading the connection timeout and socke… [iceberg]
harishch1998 commented on code in PR #10053: URL: https://github.com/apache/iceberg/pull/10053#discussion_r1554368145 ## core/src/test/java/org/apache/iceberg/rest/TestHTTPClient.java: ## @@ -133,6 +136,59 @@ public void testDynamicHttpRequestInterceptorLoading() { assertThat(((TestHttpRequestInterceptor) interceptor).properties).isEqualTo(properties); } + @Test + public void testHttpClientGetConnectionConfig() { +long connectionTimeoutMs = 10L; +int socketTimeoutMs = 10; +Map<String, String> properties = +ImmutableMap.of( +HTTPClient.REST_CONNECTION_TIMEOUT_MS, String.valueOf(connectionTimeoutMs), +HTTPClient.REST_SOCKET_TIMEOUT_MS, String.valueOf(socketTimeoutMs)); + +ConnectionConfig connectionConfig = HTTPClient.configureConnectionConfig(properties); +assertThat(connectionConfig).isNotNull(); + assertThat(connectionConfig.getConnectTimeout().getDuration()).isEqualTo(connectionTimeoutMs); + assertThat(connectionConfig.getSocketTimeout().getDuration()).isEqualTo(socketTimeoutMs); + } + + @Test + public void testHttpClientWithSocketTimeout() throws IOException { Review Comment: I tried multiple solutions but couldn't specify a custom connection timeout on the mockServer. It seems like a network-level configuration and is non-trivial to unit test with the existing MockServer that we use in testing.
[I] [BUG] Valid column characters fail on to_arrow() or to_pandas() ArrowInvalid: No match for FieldRef.Name [iceberg-python]
gwindes opened a new issue, #584: URL: https://github.com/apache/iceberg-python/issues/584 ### Apache Iceberg version 0.6.0 (latest release) ### Please describe the bug 🐞 Related to: https://github.com/apache/iceberg-python/issues/81 Python `v3.12` ``` pyiceberg 0.6.0 pyarrow 15.0.2 pandas 2.2.1 ``` I believe this is a bug, but I may also be misunderstanding how pyiceberg and pyarrow are working with iceberg tables and thus I may be doing something wrong. However, when I sanitize the column name before writing the data to remove `: . - /` I'm able to query just fine. My understanding is that the following iceberg column name is a valid name `TEST:A1B2.RAW.ABC-GG-1-A`. With the caveat that it is NOT a nested field (which I don't need). I'm able to write the data to the iceberg table and it shows the metadata with the fully qualified name of `TEST:A1B2.RAW.ABC-GG-1-A` in the metadata json. It appears to only fail when I read the data. I'm following the basic [getting started in the pyiceberg](https://py.iceberg.apache.org/#write-a-pyarrow-dataframe) ``` table = catalog.load_table("A1B2.A1-301") # neither table scan works (throws the same error): df_pyarrow = table.scan().to_arrow() df_panda = table.scan().to_pandas() ``` I created a [sample project](https://github.com/gwindes/pyiceberg-arrow-bug/blob/main/column_name_test.py) that reproduces my problem with the `pyarrow.lib.ArrowInvalid: No match for FieldRef.Name` error. Image showing metadata is storing channel name as expected. https://github.com/apache/iceberg-python/assets/843453/9d0f656a-5bf3-45f8-a4c3-5c39fbc8afa0
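The workaround the reporter describes (sanitizing column names before writing) can be sketched like this. The replacement character and the set of characters to strip (`: . - /`) come from the report above; this is an illustrative helper, not a pyiceberg API:

```python
import re

def sanitize_column_name(name: str, replacement: str = "_") -> str:
    """Replace the characters reported to break to_arrow()/to_pandas()."""
    # Character class covers the four characters from the report: : . - /
    return re.sub(r"[:.\-/]", replacement, name)


print(sanitize_column_name("TEST:A1B2.RAW.ABC-GG-1-A"))
# TEST_A1B2_RAW_ABC_GG_1_A
```

Note this is lossy: two distinct source names can collide after sanitizing, so a production version would want a collision check before writing.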
Re: [PR] Extend HTTPClient Builder to allow setting a proxy server [iceberg]
harishch1998 commented on code in PR #10052: URL: https://github.com/apache/iceberg/pull/10052#discussion_r1554343246 ## core/src/main/java/org/apache/iceberg/rest/HTTPClient.java: ## @@ -468,6 +483,19 @@ public Builder uri(String path) { return this; } +public Builder withProxy(String hostname, int port) { + Preconditions.checkNotNull(hostname, "Invalid hostname for http client proxy: null"); + this.proxy = new HttpHost(hostname, port); + return this; +} + +public Builder withProxyCredentialsProvider(CredentialsProvider credentialsProvider) { + Preconditions.checkNotNull( + credentialsProvider, "Invalid credentials provider for http client proxy: null"); + this.credentialsProvider = credentialsProvider; Review Comment: Fixed.
Re: [PR] Extend HTTPClient Builder to allow setting a proxy server [iceberg]
harishch1998 commented on code in PR #10052: URL: https://github.com/apache/iceberg/pull/10052#discussion_r1554325691 ## core/src/test/java/org/apache/iceberg/rest/TestHTTPClient.java: ## @@ -121,6 +128,92 @@ public void testHeadFailure() throws JsonProcessingException { testHttpMethodOnFailure(HttpMethod.HEAD); } + /** Tests that requests go via the proxy server in case the client is set up with one */ + @Test + public void testHttpClientProxyServerInteraction() throws IOException { +int proxyPort = 1070; +String proxyHostName = "localhost"; +try (ClientAndServer proxyServer = startClientAndServer(proxyPort); +RESTClient clientWithProxy = +HTTPClient.builder(ImmutableMap.of()) +.uri(URI) +.withProxy(proxyHostName, proxyPort) +.build()) { + // Set up the servers to match against a provided request + String path = "v1/config"; + + HttpRequest mockRequest = + request("/" + path).withMethod(HttpMethod.HEAD.name().toUpperCase(Locale.ROOT)); + + HttpResponse mockResponse = response().withStatusCode(200); + + mockServer.when(mockRequest).respond(mockResponse); + proxyServer.when(mockRequest).respond(mockResponse); + + restClient.head(path, ImmutableMap.of(), (onError) -> {}); + mockServer.verify(mockRequest, VerificationTimes.exactly(1)); + proxyServer.verify(mockRequest, VerificationTimes.never()); Review Comment: Makes sense
Re: [PR] Extend HTTPClient Builder to allow setting a proxy server [iceberg]
harishch1998 commented on code in PR #10052: URL: https://github.com/apache/iceberg/pull/10052#discussion_r1554322900 ## core/src/main/java/org/apache/iceberg/rest/HTTPClient.java: ## @@ -468,6 +483,19 @@ public Builder uri(String path) { return this; } +public Builder withProxy(String hostname, int port) { + Preconditions.checkNotNull(hostname, "Invalid hostname for http client proxy: null"); + this.proxy = new HttpHost(hostname, port); + return this; +} + +public Builder withProxyCredentialsProvider(CredentialsProvider credentialsProvider) { + Preconditions.checkNotNull( + credentialsProvider, "Invalid credentials provider for http client proxy: null"); + this.credentialsProvider = credentialsProvider; Review Comment: So I also thought of failing the build in such cases but another option was to simply end up in a no-op if a proxyCredentialsProvider is specified but not the proxyServer. I ended up doing the latter in the private constructor.

```
if (proxy != null) {
  if (credsProvider != null) {
    clientBuilder.setDefaultCredentialsProvider(credsProvider);
  }
  clientBuilder.setProxy(proxy);
}
```

But I'm happy to add an additional notNull check. Will fix the naming. Thanks!
[I] Hive Catalog cannot create table with TimestamptzType field [iceberg-python]
HonahX opened a new issue, #583: URL: https://github.com/apache/iceberg-python/issues/583 ### Apache Iceberg version 0.6.0 (latest release) ### Please describe the bug 🐞 This bug was reported by Jorge Arada on slack: https://apache-iceberg.slack.com/archives/C029EE6HQ5D/p1712309474600169 Reported example to reproduce:

```python
catalog = load_hive("iceberg_catalog", {"uri": "some_url"})
fields = [NestedField(field_id=0, name="teste_tz", field_type=TimestamptzType(), required=False)]
schema = Schema(*fields)
catalog.create_table(identifier="raw.test_table", schema=schema)
```

Error: ``` Traceback (most recent call last): File "", line 1, in File "/home/revolut/app/.venv/lib/python3.11/site-packages/pyiceberg/catalog/hive.py", line 315, in create_table sd=_construct_hive_storage_descriptor(schema, location), File "/home/revolut/app/.venv/lib/python3.11/site-packages/pyiceberg/catalog/hive.py", line 155, in _construct_hive_storage_descriptor [FieldSchema(field.name, visit(field.field_type, SchemaToHiveConverter()), field.doc) for field in schema.fields], ^ File "/home/revolut/app/.venv/lib/python3.11/site-packages/pyiceberg/catalog/hive.py", line 155, in [FieldSchema(field.name, visit(field.field_type, SchemaToHiveConverter()), field.doc) for field in schema.fields], File "/usr/local/lib/python3.11/functools.py", line 909, in wrapper return dispatch(args[0].__class__)(*args, **kw) File "/home/revolut/app/.venv/lib/python3.11/site-packages/pyiceberg/schema.py", line 847, in _ return visitor.primitive(obj) ^^ File "/home/revolut/app/.venv/lib/python3.11/site-packages/pyiceberg/catalog/hive.py", line 228, in primitive return HIVE_PRIMITIVE_TYPES[type(primitive)] ^ KeyError: ``` It seems we miss the TimestamptzType in the [Iceberg to Hive type conversion map](https://github.com/apache/iceberg-python/blob/a892309936effa7ec575195ad3be70193e82d704/pyiceberg/catalog/hive.py#L193).
A similar issue has been observed and fixed in glue: https://github.com/apache/iceberg-python/pull/366 Based on discussion, it would be good to include the fix in 0.6.1
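The shape of the fix described above can be sketched as follows: the Iceberg-to-Hive primitive type map needs an entry for `TimestamptzType` so that the visitor's `primitive()` lookup no longer raises `KeyError`. The classes below are stand-ins, and the exact Hive type string is an assumption; the real mapping lives in `pyiceberg/catalog/hive.py`:

```python
# Stand-in type classes; in pyiceberg these come from pyiceberg.types.
class TimestampType: ...
class TimestamptzType: ...

HIVE_PRIMITIVE_TYPES = {
    TimestampType: "timestamp",
    # The missing entry that triggers the KeyError in the traceback above.
    # "timestamp with local time zone" is an assumed Hive spelling here.
    TimestamptzType: "timestamp with local time zone",
}

def primitive(primitive_type):
    # Mirrors the failing lookup: dispatch on the type of the type instance.
    return HIVE_PRIMITIVE_TYPES[type(primitive_type)]


print(primitive(TimestamptzType()))
# timestamp with local time zone
```

The glue fix linked above followed the same pattern: add the missing key rather than change the visitor.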
[PR] Validate overwrite filter [iceberg-python]
jqin61 opened a new pull request, #582: URL: https://github.com/apache/iceberg-python/pull/582 **Background** When we are doing a static overwrite, we could choose to overwrite the full table or overwrite some partitions of the table. The example spark sql counterpart in [iceberg spark static overwrite](https://iceberg.apache.org/docs/latest/spark-writes/#static-overwrite) for full table overwrite is

```
INSERT OVERWRITE prod.my_app.logs
SELECT uuid, first(level), first(ts), first(message)
FROM prod.my_app.logs
WHERE level = 'INFO'
GROUP BY uuid
```

And the example spark sql counterpart for specified partition overwrite is

```
INSERT OVERWRITE prod.my_app.logs
PARTITION (level = 'INFO')
SELECT uuid, first(ts), first(message)
FROM prod.my_app.logs
WHERE level = 'INFO'
GROUP BY uuid
```

**Goal** When we overwrite the table, we could provide an expression as overwrite_filter in accord with these 2 cases. This pr validates that the filter expression conforms to certain rules, such as: it has to be on a partition column that does not use hidden partitioning, and the fields in the filter have to agree with the input arrow table so that the dataframe does not include values that the filter does not specify. **Rules and Test Cases** 1. **Rule :** The expression could only use **IsNull** or **EqualTo** as building blocks, concatenated by **And** **Tests :** `test__validate_static_overwrite_filter_expr_type` parametrize 1-8 2. **Rule :** The building block predicates (IsNull and EqualTo) should not have conflicting values. **Tests :** `test__validate_static_overwrite_filter_expr_type` parametrize 9-11 3. **Rule :** The terms (fields) should refer to existing fields in the iceberg schema, and also the literal in the predicate (if any) should match the iceberg field type. These mean the expression could be bound with the table schema successfully.
**Tests :** `test__bind_and_validate_static_overwrite_filter_predicate_fails_on_non_schema_fields_in_filter` `test__bind_and_validate_static_overwrite_filter_predicate_fails_to_bind_due_to_incompatible_predicate_value` 4. **Rule :** If expression specifies a field which is required in iceberg schema, it should not be isnull in the expression. **Tests :** `test__bind_and_validate_static_overwrite_filter_predicate_fails_to_bind_due_to_non_nullable` 5. **Rule :** The fields in the expression should be within partition columns **Tests :** `test__bind_and_validate_static_overwrite_filter_predicate_fails_on_non_part_fields_in_filter` 6. **Rule :** The iceberg table fields specified in the expression could not have hidden partitioning, however, the non-specified fields could. **Tests :** `test__bind_and_validate_static_overwrite_filter_predicate_fails_on_non_identity_transorm_filter` `test__bind_and_validate_static_overwrite_filter_predicate_succeeds_on_an_identity_transform_field_although_table_has_other_hidden_partition_fields` 7. **Rule :** The partition column values in the dataframe should conform to the filter. (To implement in the static overwrite function when partion keys are extracted) **Rule Necessity Justification using Spark Counterparts** To better understand these rules, let us provide spark static overwrite crash counterparts. 
For which, we have the following set up:

```
# Create Spark Dataframe
from pyspark.sql.types import StructType, StructField, StringType, LongType

data_multicols = [(2, "Flamingo", "red"), (4, "Horse", "white"), (4, "Pig", "pink")]
schema = StructType([
    StructField("n_legs", LongType(), nullable=True),
    StructField("animals", StringType(), nullable=True),
    StructField("color", StringType(), nullable=True),
])
df_multicols = spark.createDataFrame(data_multicols, schema)

# Create Iceberg Table
create_sql = """CREATE TABLE lacus.test.spark_staticoverwrite_partition_clause_and_data_reltship_multicols (
    n_legs bigint,
    animals string,
    color string)
USING iceberg
PARTITIONED BY (n_legs, color)
"""
spark.sql(create_sql)

# Insert Initial data
df_multicols.createOrReplaceTempView("tmp_view")
sql_cmd = """INSERT INTO lacus.test.spark_staticoverwrite_partition_clause_and_data_reltship_multicols
SELECT * FROM tmp_view
"""
spark.sql(sql_cmd)
```

this gives such table schema:

| col_name | data_type | comment |
|--|--|--|
| n_legs | bigint | |
| animals | string | |
| color | string | |

| Partitioning | | |
|--|--|--|
| Part 0 | n_legs | |
| Part 1 | color | |

with such data:
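The core of the rules listed above (only `IsNull`/`EqualTo` building blocks joined by `And`, and every referenced field must be an identity-partitioned column) can be sketched with simplified expression nodes. These dataclasses are stand-ins for pyiceberg's expression classes, and the validation is deliberately minimal:

```python
from dataclasses import dataclass

@dataclass
class EqualTo:
    field: str
    value: object

@dataclass
class IsNull:
    field: str

@dataclass
class And:
    left: object
    right: object

def validate_overwrite_filter(expr, identity_partition_fields):
    """Return the set of referenced fields, or raise if a rule is violated."""
    if isinstance(expr, And):
        # Rule 1: And is the only allowed connective; recurse into both sides.
        return (validate_overwrite_filter(expr.left, identity_partition_fields)
                | validate_overwrite_filter(expr.right, identity_partition_fields))
    if isinstance(expr, (EqualTo, IsNull)):
        # Rules 5/6: fields must be identity-partitioned partition columns.
        if expr.field not in identity_partition_fields:
            raise ValueError(f"{expr.field} is not an identity partition column")
        return {expr.field}
    # Rule 1 again: anything else (Or, Not, comparisons, ...) is rejected.
    raise ValueError(f"unsupported predicate: {type(expr).__name__}")


fields = validate_overwrite_filter(
    And(EqualTo("level", "INFO"), IsNull("color")), {"level", "color"}
)
print(fields)
```

A fuller version would also implement rules 2-4 and 7 (conflicting values, binding literals against the schema, required-field null checks, and checking the dataframe's partition values against the filter).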
Re: [I] Support client-side purge in REST catalog [iceberg]
flyrain commented on issue #10089: URL: https://github.com/apache/iceberg/issues/10089#issuecomment-2040599592 That's right. In our case, the rest server cannot access all table files due to the following reasons: 1. The rest catalog or any other catalog isn't allowed to access users' data due to the security policy; metadata access is fine. 2. Some Iceberg tables are in HDFS with Kerberos, which makes them pretty hard to access from a centralized server. We still write metadata.json files, but they are located in server-side storage instead of users' table storage. I understand this use case is a bit different from the one the REST catalog was introduced for, but I believe it is a valid use case, and we would extend the scope of the rest catalog a bit more to support it. cc @RussellSpitzer
Re: [I] Support client-side purge in REST catalog [iceberg]
danielcweeks commented on issue #10089: URL: https://github.com/apache/iceberg/issues/10089#issuecomment-2040535320 @flyrain I'm a little confused: how can the REST server not have access to the files? Currently the server needs access to at least the metadata files. Are you considering a situation where data files and metadata files are protected separately? The way we've been thinking about REST puts the responsibility of the delete on the server (the client shouldn't be responsible for how or when the delete happens).
Re: [I] In case of Positional Deletes "file_path" in lowerbound & upperbound do not have full file_path Buffer [iceberg]
singhpk234 commented on issue #10064: URL: https://github.com/apache/iceberg/issues/10064#issuecomment-2040485782 Seems like we always get the default config setting in Presto: https://github.com/prestodb/presto/blob/c5674d6e7c1008eb7935542a836f8f90d6bf09c6/presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergParquetFileWriter.java#L74C1-L74C194 Even for deletes we need special handling here, I think :), if I am not mistaken. It would be interesting to see if this causes a regression for already-released versions of Presto being used as a writer; it would just end up reading more data, I guess, though I'm not sure.
Re: [PR] Add Struct Accessors to BoundReferences [iceberg-rust]
sdd commented on code in PR #317: URL: https://github.com/apache/iceberg-rust/pull/317#discussion_r1554175122 ## crates/iceberg/src/expr/accessor.rs: ## @@ -0,0 +1,114 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::spec::{Literal, Struct, Type}; +use serde_derive::{Deserialize, Serialize}; +use std::sync::Arc; + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +enum InnerOrType { +Inner(Arc<StructAccessor>), +Type(Type), +} + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +pub struct StructAccessor { +position: i32, +inner_or_type: InnerOrType, +} + +pub(crate) type StructAccessorRef = Arc<StructAccessor>; + +impl StructAccessor { +pub(crate) fn new(position: i32, r#type: Type) -> Self { +StructAccessor { +position, +inner_or_type: InnerOrType::Type(r#type), +} +} + +pub(crate) fn wrap(position: i32, inner: StructAccessorRef) -> Self { +StructAccessor { +position, +inner_or_type: InnerOrType::Inner(inner), +} +} + +pub fn position(&self) -> i32 { +self.position +} + +fn r#type(&self) -> &Type { +match &self.inner_or_type { +InnerOrType::Inner(inner) => inner.r#type(), +InnerOrType::Type(r#type) => r#type, +} +} + +fn get<'a>(&'a self, container: &'a Struct) -> &Literal { Review Comment: Ok, will update ## crates/iceberg/src/spec/values.rs: ## @@ -1141,6 +1141,11 @@ impl Struct { }, ) } + +/// Gets a ref to the field at `position` within the `Struct` +pub fn get(&self, position: i32) -> &Literal { Review Comment: Good idea! Will do
Re: [PR] Add Struct Accessors to BoundReferences [iceberg-rust]
sdd commented on code in PR #317: URL: https://github.com/apache/iceberg-rust/pull/317#discussion_r1554176015 ## crates/iceberg/src/spec/schema.rs: ## @@ -137,9 +142,55 @@ impl SchemaBuilder { name_to_id, lowercase_name_to_id, id_to_name, + +field_id_to_accessor, }) } +fn build_accessors(&self) -> HashMap<i32, Arc<StructAccessor>> { Review Comment: Will do, re SchemaVisitor. Sorry Renjie, not sure what you mean with the second comment though?
Re: [PR] Add Struct Accessors to BoundReferences [iceberg-rust]
sdd commented on code in PR #317: URL: https://github.com/apache/iceberg-rust/pull/317#discussion_r1554174651 ## crates/iceberg/src/expr/accessor.rs: ## @@ -0,0 +1,114 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::spec::{Literal, Struct, Type}; +use serde_derive::{Deserialize, Serialize}; +use std::sync::Arc; + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +enum InnerOrType { +Inner(Arc<StructAccessor>), Review Comment: Will do! ## crates/iceberg/src/expr/accessor.rs: ## @@ -0,0 +1,114 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied.
See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::spec::{Literal, Struct, Type}; +use serde_derive::{Deserialize, Serialize}; +use std::sync::Arc; + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +enum InnerOrType { +Inner(Arc<StructAccessor>), Review Comment: Will do!
Re: [I] In case of Positional Deletes "file_path" in lowerbound & upperbound do not have full file_path Buffer [iceberg]
singhpk234 commented on issue #10064: URL: https://github.com/apache/iceberg/issues/10064#issuecomment-2040468114 Are you using Presto as a writer as well? If you check, it's still 16 characters, so the writer which is generating the delete files is generating wrong metrics, i.e. still using truncate(16).
Re: [PR] Add Struct Accessors to BoundReferences [iceberg-rust]
sdd commented on code in PR #317: URL: https://github.com/apache/iceberg-rust/pull/317#discussion_r1554171392 ## crates/iceberg/src/expr/accessor.rs: ## @@ -0,0 +1,114 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::spec::{Literal, Struct, Type}; +use serde_derive::{Deserialize, Serialize}; +use std::sync::Arc; + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +enum InnerOrType { +Inner(Arc<StructAccessor>), +Type(Type), +} + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +pub struct StructAccessor { Review Comment: This was deliberate - in the Java implementation, `Accessor` is an interface and generic over `T`, with T usually being `StructLike`, and `StructLike` is itself an interface. Since we only have `Struct` at the moment, rather than several structs that implement a `StructLike` trait, I went for `StructAccessor`, since ours is closer to `Accessor` in the Java implementation, and this allows us to implement a more generic `Accessor` at some point in the future alongside `StructAccessor`.
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
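The position-plus-inner design discussed above can be illustrated outside Rust as well. Below is a minimal Python sketch of a nested, position-based struct accessor; the names (`StructAccessor`, `get`) mirror the discussion but are hypothetical, not the actual iceberg-rust or pyiceberg API:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass(frozen=True)
class StructAccessor:
    """Fetches a value out of a (possibly nested) struct by position."""
    position: int
    inner: Optional["StructAccessor"] = None  # set when the field is itself a struct

    def get(self, struct: list) -> Any:
        value = struct[self.position]
        # leaf accessor returns the value; nested accessor recurses
        return value if self.inner is None else self.inner.get(value)

# A row shaped like (id, (lat, long)): access the nested field at position 1 -> 0
row = [42, [51.5, -0.12]]
acc = StructAccessor(position=1, inner=StructAccessor(position=0))
print(acc.get(row))  # 51.5
```

A future generic `Accessor` trait, as suggested in the comment, would simply abstract over the container type instead of hard-wiring `Struct`.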
Re: [I] In case of Positional Deletes "file_path" in lowerbound & upperbound do not have full file_path Buffer [iceberg]
agrawalreetika commented on issue #10064: URL: https://github.com/apache/iceberg/issues/10064#issuecomment-2040458075 Thank you for your response @singhpk234. We are using Iceberg version 1.4.3 in Presto. I checked https://github.com/apache/iceberg/pull/6313 and it looks like it's only fixed for Flink, so it wouldn't work for other engines?
Re: [PR] Add Struct Accessors to BoundReferences [iceberg-rust]
sdd commented on code in PR #317: URL: https://github.com/apache/iceberg-rust/pull/317#discussion_r1554163204 ## crates/iceberg/src/expr/accessor.rs: ## @@ -0,0 +1,114 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::spec::{Literal, Struct, Type}; +use serde_derive::{Deserialize, Serialize}; +use std::sync::Arc; + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +enum InnerOrType { +Inner(Arc), +Type(Type), +} + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +pub struct StructAccessor { +position: i32, +inner_or_type: InnerOrType, Review Comment: That duplicates the type at each level, but you have a point, we're optimising for time efficiency rather than space. Will fix -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add Struct Accessors to BoundReferences [iceberg-rust]
sdd commented on code in PR #317: URL: https://github.com/apache/iceberg-rust/pull/317#discussion_r1554161905 ## crates/iceberg/src/expr/accessor.rs: ## @@ -0,0 +1,114 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::spec::{Literal, Struct, Type}; +use serde_derive::{Deserialize, Serialize}; +use std::sync::Arc; + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +enum InnerOrType { +Inner(Arc), +Type(Type), +} + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +pub struct StructAccessor { +position: i32, Review Comment: Sure, will change. I saw we'd used `i32` almost everywhere else so went with that but `usize` works better -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] zorder does not work with sub fields [iceberg]
cccs-jc commented on issue #10017: URL: https://github.com/apache/iceberg/issues/10017#issuecomment-2040368228 Seems like it would. I'm not a reviewer, but I do want the fix :-)
Re: [PR] Core: Add explicit JSON parser for ConfigResponse [iceberg]
nastra merged PR #9952: URL: https://github.com/apache/iceberg/pull/9952
Re: [PR] Core: Add explicit JSON parser for ConfigResponse [iceberg]
nastra commented on PR #9952: URL: https://github.com/apache/iceberg/pull/9952#issuecomment-2040195604 thanks for the review @amogh-jahagirdar
[PR] Change DataScan to accept Metadata and io [iceberg-python]
Fokko opened a new pull request, #581: URL: https://github.com/apache/iceberg-python/pull/581 For the partial deletes I want to run a scan over in-memory metadata. Changing this API makes that possible.
Re: [I] In case of Positional Deletes "file_path" in lowerbound & upperbound do not have full file_path Buffer [iceberg]
singhpk234 commented on issue #10064: URL: https://github.com/apache/iceberg/issues/10064#issuecomment-2040031695 Which Iceberg version? It seems the file_path metrics were truncated: > This is because these metrics were truncated; Iceberg's default metrics mode for a column metric is truncate(16). This should be fixed by https://github.com/apache/iceberg/pull/6313. More on this: https://github.com/apache/iceberg/issues/6694
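For context on why the truncated bounds bite here: with the default truncate(16) metrics mode, only a 16-character prefix of `file_path` is stored, and the upper bound's final character is incremented so it still brackets the full value. A rough sketch of the idea (illustrative only, not Iceberg's implementation; ASCII-only paths assumed):

```python
def truncate_lower(value: str, width: int = 16) -> str:
    # lower bound: keep only a prefix of the value
    return value[:width]

def truncate_upper(value: str, width: int = 16) -> str:
    # upper bound: truncate, then bump the final character so the
    # result still sorts at or above every value sharing the prefix
    if len(value) <= width:
        return value
    prefix = value[:width]
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)

path = "s3://bucket/warehouse/db/table/data/file-0001.parquet"
print(truncate_lower(path))  # s3://bucket/ware
print(truncate_upper(path))  # s3://bucket/warf
```

So any consumer comparing the stored bounds against a full file path has to account for truncation rather than expecting the complete path.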
Re: [I] Add metadata tables [iceberg-python]
syun64 commented on issue #511: URL: https://github.com/apache/iceberg-python/issues/511#issuecomment-2040023902 Hi @Fokko and folks, we are interested in implementing the Partitions metadata table for our use case if it hasn't been picked up already
Re: [PR] Core: Add explicit JSON parser for ConfigResponse [iceberg]
amogh-jahagirdar commented on code in PR #9952: URL: https://github.com/apache/iceberg/pull/9952#discussion_r1553791769 ## core/src/main/java/org/apache/iceberg/rest/responses/ConfigResponseParser.java: ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.iceberg.rest.responses; + +import com.fasterxml.jackson.core.JsonGenerator; +import com.fasterxml.jackson.databind.JsonNode; +import java.io.IOException; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.util.JsonUtil; + +public class ConfigResponseParser { + + private static final String DEFAULTS = "defaults"; + private static final String OVERRIDES = "overrides"; + + private ConfigResponseParser() {} + + public static String toJson(ConfigResponse response) { +return toJson(response, false); + } + + public static String toJson(ConfigResponse response, boolean pretty) { +return JsonUtil.generate(gen -> toJson(response, gen), pretty); + } + + public static void toJson(ConfigResponse response, JsonGenerator gen) throws IOException { +Preconditions.checkArgument(null != response, "Invalid config response: null"); + +gen.writeStartObject(); + +JsonUtil.writeStringMap(DEFAULTS, response.defaults(), gen); Review Comment: Should we just change this to not send back the key in the response in the value is null (makes for a cleaner response? I don't really think that's a behavior change since in the end a client needs to check that the value is not null. ## core/src/main/java/org/apache/iceberg/rest/responses/ConfigResponseParser.java: ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.rest.responses; + +import com.fasterxml.jackson.core.JsonGenerator; +import com.fasterxml.jackson.databind.JsonNode; +import java.io.IOException; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.util.JsonUtil; + +public class ConfigResponseParser { + + private static final String DEFAULTS = "defaults"; + private static final String OVERRIDES = "overrides"; + + private ConfigResponseParser() {} + + public static String toJson(ConfigResponse response) { +return toJson(response, false); + } + + public static String toJson(ConfigResponse response, boolean pretty) { +return JsonUtil.generate(gen -> toJson(response, gen), pretty); + } + + public static void toJson(ConfigResponse response, JsonGenerator gen) throws IOException { +Preconditions.checkArgument(null != response, "Invalid config response: null"); + +gen.writeStartObject(); + +JsonUtil.writeStringMap(DEFAULTS, response.defaults(), gen); Review Comment: NVM these two fields are spec'd out so it seems like servers should always send it even though it may be null https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L83 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
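The conclusion in the review thread above — per the open-api spec, `defaults` and `overrides` should always be present, even when empty — can be sketched as follows (a Python stand-in for the Java parser; the function name is illustrative):

```python
import json
from typing import Dict, Optional

def config_response_to_json(defaults: Optional[Dict[str, str]],
                            overrides: Optional[Dict[str, str]]) -> str:
    # Both keys are always written, falling back to empty maps,
    # so clients can rely on their presence in the response
    return json.dumps({"defaults": defaults or {}, "overrides": overrides or {}})

print(config_response_to_json({"warehouse": "s3://bucket"}, None))
# {"defaults": {"warehouse": "s3://bucket"}, "overrides": {}}
```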
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
marvinlanhenke commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1553740208 ## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs: ## @@ -0,0 +1,366 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::expr::{BoundPredicate, BoundReference, PredicateOperator}; +use crate::spec::Datum; +use crate::Result; +use fnv::FnvHashSet; + +pub trait BoundPredicateVisitor { +type T; + +fn always_true(&mut self) -> Result; +fn always_false(&mut self) -> Result; + +fn and(&mut self, lhs: Self::T, rhs: Self::T) -> Result; +fn or(&mut self, lhs: Self::T, rhs: Self::T) -> Result; +fn not(&mut self, inner: Self::T) -> Result; + +fn is_null(&mut self, reference: &BoundReference) -> Result; Review Comment: > I think this is the advantage of the op approach, it's left to user to decide what to do. If they want new op to be ignored, they could use a default match arm, otherwise they could choose to do pattern match against each operator. Yes, exactly. So I'm +1 for the op approach. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
liurenjie1024 commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1553653540 ## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs: ## @@ -0,0 +1,366 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::expr::{BoundPredicate, BoundReference, PredicateOperator}; +use crate::spec::Datum; +use crate::Result; +use fnv::FnvHashSet; + +pub trait BoundPredicateVisitor { +type T; + +fn always_true(&mut self) -> Result; +fn always_false(&mut self) -> Result; + +fn and(&mut self, lhs: Self::T, rhs: Self::T) -> Result; +fn or(&mut self, lhs: Self::T, rhs: Self::T) -> Result; +fn not(&mut self, inner: Self::T) -> Result; + +fn is_null(&mut self, reference: &BoundReference) -> Result; Review Comment: > ...wouldn't I have match op anyway, so having a default match arm in the implementation would solve the issue for the downstream user? I think this is the advantage of the op approach, it's left to user to decide what to do. If they want new op to be ignored, they could use a default match arm, otherwise they could choose to do pattern match against each operator. -- This is an automated message from the Apache Git Service. 
Re: [PR] Add option to delete datafiles [iceberg-python]
Fokko commented on code in PR #569: URL: https://github.com/apache/iceberg-python/pull/569#discussion_r1553588970 ## pyiceberg/table/__init__.py: ## @@ -2726,6 +2731,112 @@ def _commit(self) -> UpdatesAndRequirements: ) +class DeleteFiles(_MergingSnapshotProducer): +_predicate: BooleanExpression + +def __init__( +self, +operation: Operation, +transaction: Transaction, +io: FileIO, +commit_uuid: Optional[uuid.UUID] = None, +snapshot_properties: Dict[str, str] = EMPTY_DICT, +): +super().__init__(operation, transaction, io, commit_uuid, snapshot_properties) +self._predicate = AlwaysFalse() + +def _build_partition_projection(self, spec_id: int) -> BooleanExpression: +schema = self._transaction.table_metadata.schema() +spec = self._transaction.table_metadata.specs()[spec_id] +project = inclusive_projection(schema, spec) +return project(self._predicate) + +@cached_property +def partition_filters(self) -> KeyDefaultDict[int, BooleanExpression]: +return KeyDefaultDict(self._build_partition_projection) + +def _build_manifest_evaluator(self, spec_id: int) -> Callable[[ManifestFile], bool]: +schema = self._transaction.table_metadata.schema() +spec = self._transaction.table_metadata.specs()[spec_id] +return manifest_evaluator(spec, schema, self.partition_filters[spec_id], case_sensitive=True) + +def _build_partition_evaluator(self, spec_id: int) -> Callable[[DataFile], bool]: +schema = self._transaction.table_metadata.schema() +spec = self._transaction.table_metadata.specs()[spec_id] +partition_type = spec.partition_type(schema) +partition_schema = Schema(*partition_type.fields) +partition_expr = self.partition_filters[spec_id] + +return lambda data_file: expression_evaluator(partition_schema, partition_expr, case_sensitive=True)(data_file.partition) + +def delete(self, predicate: BooleanExpression) -> None: +self._predicate = Or(self._predicate, predicate) + +@cached_property +def _compute_deletes(self) -> Tuple[List[ManifestFile], List[ManifestEntry]]: +schema = 
self._transaction.table_metadata.schema() + +def _copy_with_new_status(entry: ManifestEntry, status: ManifestEntryStatus) -> ManifestEntry: +return ManifestEntry( +status=status, +snapshot_id=entry.snapshot_id, +data_sequence_number=entry.data_sequence_number, +file_sequence_number=entry.file_sequence_number, +data_file=entry.data_file, +) + +manifest_evaluators: Dict[int, Callable[[ManifestFile], bool]] = KeyDefaultDict(self._build_manifest_evaluator) +strict_metrics_evaluator = _StrictMetricsEvaluator(schema, self._predicate, case_sensitive=True).eval Review Comment: Great suggestion :) I would like to do that in a next iteration since that's an optimization. We don't delete whole manifest files in Java, and would like to understand why we don't do this before implementing this here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add option to delete datafiles [iceberg-python]
Fokko commented on code in PR #569: URL: https://github.com/apache/iceberg-python/pull/569#discussion_r1553580296 ## pyiceberg/table/__init__.py: ## @@ -2726,6 +2731,112 @@ def _commit(self) -> UpdatesAndRequirements: ) +class DeleteFiles(_MergingSnapshotProducer): +_predicate: BooleanExpression + +def __init__( +self, +operation: Operation, +transaction: Transaction, +io: FileIO, +commit_uuid: Optional[uuid.UUID] = None, +snapshot_properties: Dict[str, str] = EMPTY_DICT, +): +super().__init__(operation, transaction, io, commit_uuid, snapshot_properties) +self._predicate = AlwaysFalse() + +def _build_partition_projection(self, spec_id: int) -> BooleanExpression: +schema = self._transaction.table_metadata.schema() +spec = self._transaction.table_metadata.specs()[spec_id] +project = inclusive_projection(schema, spec) +return project(self._predicate) + +@cached_property +def partition_filters(self) -> KeyDefaultDict[int, BooleanExpression]: +return KeyDefaultDict(self._build_partition_projection) + +def _build_manifest_evaluator(self, spec_id: int) -> Callable[[ManifestFile], bool]: +schema = self._transaction.table_metadata.schema() +spec = self._transaction.table_metadata.specs()[spec_id] +return manifest_evaluator(spec, schema, self.partition_filters[spec_id], case_sensitive=True) + +def _build_partition_evaluator(self, spec_id: int) -> Callable[[DataFile], bool]: +schema = self._transaction.table_metadata.schema() +spec = self._transaction.table_metadata.specs()[spec_id] +partition_type = spec.partition_type(schema) +partition_schema = Schema(*partition_type.fields) +partition_expr = self.partition_filters[spec_id] + +return lambda data_file: expression_evaluator(partition_schema, partition_expr, case_sensitive=True)(data_file.partition) + +def delete(self, predicate: BooleanExpression) -> None: +self._predicate = Or(self._predicate, predicate) + +@cached_property +def _compute_deletes(self) -> Tuple[List[ManifestFile], List[ManifestEntry]]: +schema = 
self._transaction.table_metadata.schema() + +def _copy_with_new_status(entry: ManifestEntry, status: ManifestEntryStatus) -> ManifestEntry: +return ManifestEntry( +status=status, +snapshot_id=entry.snapshot_id, +data_sequence_number=entry.data_sequence_number, +file_sequence_number=entry.file_sequence_number, +data_file=entry.data_file, +) + +manifest_evaluators: Dict[int, Callable[[ManifestFile], bool]] = KeyDefaultDict(self._build_manifest_evaluator) +strict_metrics_evaluator = _StrictMetricsEvaluator(schema, self._predicate, case_sensitive=True).eval +inclusive_metrics_evaluator = _InclusiveMetricsEvaluator(schema, self._predicate, case_sensitive=True).eval + +existing_manifests = [] +total_deleted_entries = [] +if snapshot := self._transaction.table_metadata.current_snapshot(): +for num, manifest_file in enumerate(snapshot.manifests(io=self._io)): +if not manifest_evaluators[manifest_file.partition_spec_id](manifest_file): +# If the manifest isn't relevant, we can just keep it in the manifest-list +existing_manifests.append(manifest_file) +else: +# It is relevant, let's check out the content +deleted_entries = [] +existing_entries = [] +for entry in manifest_file.fetch_manifest_entry(io=self._io): +if strict_metrics_evaluator(entry.data_file) == ROWS_MUST_MATCH: + deleted_entries.append(_copy_with_new_status(entry, ManifestEntryStatus.DELETED)) +elif inclusive_metrics_evaluator(entry.data_file) == ROWS_CANNOT_MATCH: + existing_entries.append(_copy_with_new_status(entry, ManifestEntryStatus.EXISTING)) +else: +raise ValueError("Deletes do not support rewrites of data files") + +if len(deleted_entries) > 0: +total_deleted_entries += deleted_entries + +# Rewrite the manifest +if len(existing_entries) > 0: +output_file_location = _new_manifest_path( + location=self._transaction.table_metadata.location, num=num, commit_uuid=self.commit_uuid +) +with write_manifest( + format_version=self._transaction.table_metadata.format_version, + spec=se
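The core decision in `_compute_deletes` above combines the strict and inclusive metrics evaluators per data file. A distilled sketch of just that three-way classification (constants and names simplified from the quoted code):

```python
# Simplified stand-ins for pyiceberg's evaluator result constants
ROWS_MUST_MATCH, ROWS_MIGHT_MATCH, ROWS_CANNOT_MATCH = range(3)

def classify_entry(strict_result: int, inclusive_result: int) -> str:
    # strict evaluator proves every row matches -> whole file can be dropped
    if strict_result == ROWS_MUST_MATCH:
        return "DELETED"
    # inclusive evaluator proves no row matches -> file is untouched, keep it
    if inclusive_result == ROWS_CANNOT_MATCH:
        return "EXISTING"
    # anything in between would require rewriting the file's rows
    raise ValueError("Deletes do not support rewrites of data files")

print(classify_entry(ROWS_MUST_MATCH, ROWS_MIGHT_MATCH))    # DELETED
print(classify_entry(ROWS_MIGHT_MATCH, ROWS_CANNOT_MATCH))  # EXISTING
```

This is why the operation is restricted to predicates that align with file boundaries: a file only partially covered by the predicate falls into the error branch.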
Re: [PR] Hive, JDBC: Add null check before matching the error message [iceberg]
nastra merged PR #10082: URL: https://github.com/apache/iceberg/pull/10082
Re: [PR] Hive, JDBC: Add null check before matching the error message [iceberg]
nastra commented on code in PR #10082: URL: https://github.com/apache/iceberg/pull/10082#discussion_r1553435328 ## hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveClientPool.java: ## @@ -116,6 +119,52 @@ public void testGetTablesFailsForNonReconnectableException() throws Exception { .hasMessage("Another meta exception"); } + @Test + public void testExceptionMessages() { +// Test Wrapped MetaException with a message Review Comment: I don't think those comments are helpful. I would just remove all of those
Re: [PR] Hive, JDBC: Add null check before matching the error message [iceberg]
nk1506 commented on code in PR #10082: URL: https://github.com/apache/iceberg/pull/10082#discussion_r1553430668 ## hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveClientPool.java: ## @@ -116,6 +119,52 @@ public void testGetTablesFailsForNonReconnectableException() throws Exception { .hasMessage("Another meta exception"); } + @Test + public void testExceptionMessages() { +// Test Wrapped MetaException with a message +try (MockedStatic<MetaStoreUtils> mockedStatic = Mockito.mockStatic(MetaStoreUtils.class)) {
Review Comment: Couldn't find a way to mock the calling method

```java
return GET_CLIENT.invoke(
    hiveConf, (HiveMetaHookLoader) tbl -> null, HiveMetaStoreClient.class.getName());
```

and make it throw an exception. Had to trace and mock the class which is throwing the error.
Re: [PR] Hive, JDBC: Add null check before matching the error message [iceberg]
nastra commented on code in PR #10082: URL: https://github.com/apache/iceberg/pull/10082#discussion_r1553429104 ## core/src/test/java/org/apache/iceberg/jdbc/TestJdbcCatalog.java: ## @@ -970,6 +975,47 @@ public void testCatalogWithCustomMetricsReporter() throws IOException { Assertions.assertThat(CustomMetricsReporter.COUNTER.get()).isEqualTo(2); } + @Test + public void testCommitExceptionWithoutMessage() { +TableIdentifier tableIdent = TableIdentifier.of("db", "ns1", "ns2", "tbl"); +BaseTable table = (BaseTable) catalog.buildTable(tableIdent, SCHEMA).create(); +JdbcTableOperations ops = (JdbcTableOperations) table.operations();
Review Comment: no need for the casting here. See the simplified test version in my previous comment
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
marvinlanhenke commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1553297280 ## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs: ## @@ -0,0 +1,366 @@

```rust
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
//   http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

use crate::expr::{BoundPredicate, BoundReference, PredicateOperator};
use crate::spec::Datum;
use crate::Result;
use fnv::FnvHashSet;

pub trait BoundPredicateVisitor {
    type T;

    fn always_true(&mut self) -> Result<Self::T>;
    fn always_false(&mut self) -> Result<Self::T>;

    fn and(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
    fn or(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
    fn not(&mut self, inner: Self::T) -> Result<Self::T>;

    fn is_null(&mut self, reference: &BoundReference) -> Result<Self::T>;
```

Review Comment:
> Another option is to use the following method:
>
> ```rust
> trait BoundPredicateVisitor {
>     fn visitor_op(&mut self, op: &PredicateOperator, reference: &BoundReference, results: Vec<Self::T>) -> Result<Self::T> {}
> }
> ```

I think I'm in favor of keeping the trait as generic as possible, so we don't have to modify it if we have to support a new operator. When implementing this trait (no matter which variant) I'd most likely have to override the default behaviour anyway, so I wouldn't bother providing any defaults in the first place and would keep the trait as generic and future-proof as possible. If that makes sense to you?

> This way we can force user to think about added new operator, otherwise it will throw compiler error, but this may break things when we add new operator.

...wouldn't I have to match on `op` anyway, so having a default match arm in the implementation would solve the issue for the downstream user?
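For readers following this thread, the two designs under discussion can be contrasted in a standalone sketch: per-operator methods with erroring defaults versus a single match-based entry point whose default arm absorbs operators added later. All type and trait names below are simplified stand-ins, not the actual iceberg-rust API.

```rust
#[derive(Debug, Clone, Copy)]
pub enum PredicateOperator {
    IsNull,
    NotNull,
    Eq,
}

// Option A: one method per operator, each with a default body that returns an
// error, so existing implementors keep compiling when a new operator is added.
pub trait PerOperatorVisitor {
    type T;

    fn is_null(&mut self) -> Result<Self::T, String> {
        Err("operator not implemented".to_string())
    }

    fn eq(&mut self) -> Result<Self::T, String> {
        Err("operator not implemented".to_string())
    }
}

// Option B: a single entry point; the implementor matches on the operator.
pub trait OpVisitor {
    type T;

    fn visit_op(&mut self, op: PredicateOperator) -> Result<Self::T, String>;
}

pub struct NullChecker;

impl OpVisitor for NullChecker {
    type T = bool;

    fn visit_op(&mut self, op: PredicateOperator) -> Result<bool, String> {
        match op {
            PredicateOperator::IsNull => Ok(true),
            PredicateOperator::Eq => Ok(false),
            // The default arm is what keeps downstream code compiling
            // when the library grows a new operator.
            other => Err(format!("unsupported operator: {:?}", other)),
        }
    }
}
```

With option B, a newly added operator simply falls into the implementor's default arm at runtime; with option A, the same effect comes from the erroring default method bodies.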
Re: [I] Implement `project` for `Transform`. [iceberg-rust]
liurenjie1024 closed issue #264: Implement `project` for `Transform`. URL: https://github.com/apache/iceberg-rust/issues/264
Re: [PR] feat: Project transform [iceberg-rust]
liurenjie1024 merged PR #309: URL: https://github.com/apache/iceberg-rust/pull/309
Re: [PR] Add Struct Accessors to BoundReferences [iceberg-rust]
marvinlanhenke commented on code in PR #317: URL: https://github.com/apache/iceberg-rust/pull/317#discussion_r1553285045 ## crates/iceberg/src/expr/accessor.rs: ## @@ -0,0 +1,114 @@

```rust
use crate::spec::{Literal, Struct, Type};
use serde_derive::{Deserialize, Serialize};
use std::sync::Arc;

#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)]
enum InnerOrType {
    Inner(Arc<StructAccessor>),
    Type(Type),
}

#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)]
pub struct StructAccessor {
    position: i32,
    inner_or_type: InnerOrType,
}

pub(crate) type StructAccessorRef = Arc<StructAccessor>;

impl StructAccessor {
    pub(crate) fn new(position: i32, r#type: Type) -> Self {
        StructAccessor {
            position,
            inner_or_type: InnerOrType::Type(r#type),
        }
    }

    pub(crate) fn wrap(position: i32, inner: StructAccessorRef) -> Self {
        StructAccessor {
            position,
            inner_or_type: InnerOrType::Inner(inner),
        }
    }

    pub fn position(&self) -> i32 {
        self.position
    }

    fn r#type(&self) -> &Type {
        match &self.inner_or_type {
            InnerOrType::Inner(inner) => inner.r#type(),
            InnerOrType::Type(r#type) => r#type,
        }
    }

    fn get<'a>(&'a self, container: &'a Struct) -> &Literal {
```

Review Comment:
> Should we return `Datum` here? This better follows the python/java implementation.

Unless we explicitly need `Literal` somewhere downstream (haven't checked python/java yet), I'd definitely +1 for returning `Datum` here, which should be easier to handle downstream.
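As an aside for readers, the position/wrap scheme in the PR above can be illustrated with a much-reduced model: each accessor stores a field position and optionally wraps an inner accessor that drills one struct level deeper. The `Literal` and accessor types below are simplified stand-ins for the iceberg-rust ones, not the crate's API.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Literal {
    Int(i32),
    Struct(Vec<Literal>),
}

pub struct StructAccessor {
    position: usize,
    inner: Option<Box<StructAccessor>>,
}

impl StructAccessor {
    // Accessor for a top-level field at `position`.
    pub fn new(position: usize) -> Self {
        StructAccessor { position, inner: None }
    }

    // Nest an accessor one struct level deeper, mirroring the PR's `wrap`.
    pub fn wrap(position: usize, inner: StructAccessor) -> Self {
        StructAccessor { position, inner: Some(Box::new(inner)) }
    }

    // Walk the positions down through nested structs to the leaf value.
    pub fn get<'a>(&self, container: &'a [Literal]) -> &'a Literal {
        match &self.inner {
            None => &container[self.position],
            Some(inner) => match &container[self.position] {
                Literal::Struct(fields) => inner.get(fields),
                other => other, // flat value where a struct was expected
            },
        }
    }
}
```

The accessor is built once from the schema and then evaluated repeatedly against rows, which is what makes the position-based (rather than name-based) lookup worthwhile.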
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
liurenjie1024 commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1553279624 ## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs: ## @@ -0,0 +1,366 @@

```rust
pub trait BoundPredicateVisitor {
    type T;

    fn always_true(&mut self) -> Result<Self::T>;
    fn always_false(&mut self) -> Result<Self::T>;

    fn and(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
    fn or(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
    fn not(&mut self, inner: Self::T) -> Result<Self::T>;

    fn is_null(&mut self, reference: &BoundReference) -> Result<Self::T>;
```

Review Comment:
> If we add default implementations that return something like Err(OperatorNotImplemented), then if we add an operator, wouldn't existing code work for existing operators and only break if users then try to use the new operator without having updated any visitors they have to use the new operator?

Yes, I agree with that.

Another option is to use the following method:

```rust
trait BoundPredicateVisitor {
    fn visitor_op(&mut self, op: &PredicateOperator, reference: &BoundReference, results: Vec<Self::T>) -> Result<Self::T> {}
}
```

This way we can force the user to think about a newly added operator, since it will otherwise throw a compiler error, but this may break things when we add a new operator. I think the two approaches have different pros and cons; let's wait to see others' comments.
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#issuecomment-2039348517
> Hi, @marvinlanhenke Thanks for the PR, it looks great! I have some small suggestions to restructure the code to make it easier to review. Really grateful for these tests!

Thanks for the review - I did some minor fixes according to your suggestions. If no other comments, I think we're good to go.
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1553275061 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,323 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(&self, name: String, predicate: &BoundPredicate) -> Result> { +let func = create_transform_function(self)?; + +match self { +Transform::Identity => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => Ok(Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +expr.literal().to_owned(), +, +BoundPredicate::Set(expr) => Ok(Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +expr.literals().to_owned(), +, +_ => Ok(None), +}, +Transform::Bucket(_) => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => self.project_binary(name, expr, &func), +BoundPredicate::Set(expr) => self.project_set(expr, name, &func), +_ => Ok(None), +}, +Transform::Truncate(width) => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => { +self.project_binary_with_adjusted_boundary(name, expr, &func, Some(*width)) +} +BoundPredicate::Set(expr) => self.project_set(expr, name, &func), +_ => Ok(None), +}, +Transform::Year | Transform::Month | 
Transform::Day | Transform::Hour => { +match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => { +self.project_binary_with_adjusted_boundary(name, expr, &func, None) +} +BoundPredicate::Set(expr) => self.project_set(expr, name, &func), +_ => Ok(None), +} +} +_ => Ok(None), +} +} + +/// Check if `Transform` is applicable on datum's `PrimitiveType` +fn can_transform(&self, datum: &Datum) -> bool { +let input_type = datum.data_type().clone(); +self.result_type(&Type::Primitive(input_type)).is_ok() +} + +/// Creates a unary predicate from a given operator and a reference name. +fn project_unary(op: PredicateOperator, name: String) -> Result> { +Ok(Some(Predicate::Unary(UnaryExpression::new( +op, +Reference::new(name), + +} + +/// Attempts to create a binary predicate based on a binary expression, +/// if applicable. +/// +/// This method evaluates a given binary expression and, if the operation +/// is equality (`Eq`) and the literal can be transformed, constructs a +/// `Predicate::Binary` variant representing the binary operation.
+fn project_binary(
Review Comment: I missed that, we don't need a Generic. Fixed it.
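The doc-comment example quoted above (a row filter `a = 10` under a partition spec `bucket(a, 37) as bs` projecting to `bs = bucket(10, 37)`) can be sketched with a toy transform. A plain modulo stands in for Iceberg's real Murmur3-based bucket function, and the predicate type is hypothetical, not the crate's API.

```rust
#[derive(Debug, PartialEq)]
pub struct EqPredicate {
    pub field: String,
    pub value: i32,
}

// Stand-in "bucket(n)" transform: a real implementation hashes the value
// with Murmur3 before taking the modulo.
fn bucket(n: i32, value: i32) -> i32 {
    value.rem_euclid(n)
}

// Project `pred.field = pred.value` onto partition column `name` by applying
// the transform to the literal, mirroring the Eq case of `project_binary`.
pub fn project_eq(name: &str, pred: &EqPredicate, n: i32) -> EqPredicate {
    EqPredicate {
        field: name.to_string(),
        value: bucket(n, pred.value),
    }
}
```

Any row matching the original filter necessarily lands in the projected partition bucket, which is what lets the scan planner prune partitions against the untransformed predicate.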
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1553272539 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,323 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(&self, name: String, predicate: &BoundPredicate) -> Result> { +let func = create_transform_function(self)?; + +match self { +Transform::Identity => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => Ok(Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +expr.literal().to_owned(), +, +BoundPredicate::Set(expr) => Ok(Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +expr.literals().to_owned(), +, +_ => Ok(None), +}, +Transform::Bucket(_) => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => self.project_binary(name, expr, &func), +BoundPredicate::Set(expr) => self.project_set(expr, name, &func), +_ => Ok(None), +}, +Transform::Truncate(width) => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => { +self.project_binary_with_adjusted_boundary(name, expr, &func, Some(*width)) +} +BoundPredicate::Set(expr) => self.project_set(expr, name, &func), +_ => Ok(None), +}, +Transform::Year | Transform::Month | 
Transform::Day | Transform::Hour => { +match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => { +self.project_binary_with_adjusted_boundary(name, expr, &func, None) +} +BoundPredicate::Set(expr) => self.project_set(expr, name, &func), +_ => Ok(None), +} +} +_ => Ok(None), +} +} + +/// Check if `Transform` is applicable on datum's `PrimitiveType` +fn can_transform(&self, datum: &Datum) -> bool { +let input_type = datum.data_type().clone(); +self.result_type(&Type::Primitive(input_type)).is_ok() +} + +/// Creates a unary predicate from a given operator and a reference name. +fn project_unary(op: PredicateOperator, name: String) -> Result> { +Ok(Some(Predicate::Unary(UnaryExpression::new( +op, +Reference::new(name), + +} + +/// Attempts to create a binary predicate based on a binary expression, +/// if applicable. +/// +/// This method evaluates a given binary expression and, if the operation +/// is equality (`Eq`) and the literal can be transformed, constructs a +/// `Predicate::Binary`variant representing the binary operation. +fn project_binary( +&self, +name: String, +expr: &BinaryExpression, +func: &BoxedTransformFunction, +) -> Result> { +if expr.op() != PredicateOperator::Eq || !self.can_transform(expr.literal()) { +return Ok(None); +} + +Ok(Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +func.transform_literal_result(expr.literal())?, + +} + +/// Projects a binary expression to a predicate with an adjusted boundary. +/// +/// Checks if the literal within the given binary expression is +/// transformable. If transformable, it proceeds to potentially adjust +/// the boundary of the expression based on the comparison operator (`op`). +/// The potential adjustments involve incrementing or decrementing the +/// literal value and changing the `PredicateOperator` itself to its +/// inclusive variant. +fn project_
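The adjusted-boundary rule described in the doc comment above (increment or decrement the literal and switch the `PredicateOperator` to its inclusive variant) can be sketched on plain integers. The operator enum and function below are illustrative stand-ins, not the crate's API.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Op {
    Lt,
    LtEq,
    Gt,
    GtEq,
}

// Strict bounds become inclusive bounds on the shifted literal:
//   `a < v`  projects via  `p <= transform(v - 1)`
//   `a > v`  projects via  `p >= transform(v + 1)`
// Inclusive operators pass through unchanged.
pub fn adjust_boundary(op: Op, literal: i32) -> (Op, i32) {
    match op {
        Op::Lt => (Op::LtEq, literal - 1),
        Op::Gt => (Op::GtEq, literal + 1),
        other => (other, literal),
    }
}
```

The shift is what keeps the projected partition predicate correct for lossy transforms like `truncate` or `day`, where many values collapse into one partition boundary.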
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1553271703 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,323 @@
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1553262971 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,323 @@
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
sdd commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1553253806 ## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs: ## @@ -0,0 +1,366 @@

```rust
pub trait BoundPredicateVisitor {
    type T;

    fn always_true(&mut self) -> Result<Self::T>;
    fn always_false(&mut self) -> Result<Self::T>;

    fn and(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
    fn or(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
    fn not(&mut self, inner: Self::T) -> Result<Self::T>;

    fn is_null(&mut self, reference: &BoundReference) -> Result<Self::T>;
```

Review Comment: If we add default implementations that return something like `Err(OperatorNotImplemented)`, then if we add an operator, wouldn't existing code keep working for existing operators, and only break once users try to use the new operator without having updated their visitors to support it?
Re: [PR] Hive, JDBC: Add null check before matching the error message [iceberg]
nastra commented on code in PR #10082: URL: https://github.com/apache/iceberg/pull/10082#discussion_r1553236192 ## core/src/test/java/org/apache/iceberg/jdbc/TestJdbcCatalog.java: ## @@ -970,6 +976,51 @@ public void testCatalogWithCustomMetricsReporter() throws IOException { Assertions.assertThat(CustomMetricsReporter.COUNTER.get()).isEqualTo(2); } + @Test + public void testCommitExceptionWithoutMessage() { Review Comment: can be simplified to ``` @Test public void testCommitExceptionWithoutMessage() { TableIdentifier tableIdent = TableIdentifier.of("db", "tbl"); BaseTable table = (BaseTable) catalog.buildTable(tableIdent, SCHEMA).create(); TableOperations ops = table.operations(); TableMetadata metadataV1 = ops.current(); table.updateSchema().addColumn("n", Types.IntegerType.get()).commit(); ops.refresh(); try (MockedStatic<JdbcUtil> mockedStatic = Mockito.mockStatic(JdbcUtil.class)) { mockedStatic .when(() -> JdbcUtil.loadTable(any(), any(), any(), any())) .thenThrow(new SQLException()); assertThatThrownBy(() -> ops.commit(ops.current(), metadataV1)) .isInstanceOf(UncheckedSQLException.class) .hasMessageStartingWith("Unknown failure"); } } ``` please also update the other tests accordingly. The spying on `ops` isn't necessary here, because you're throwing the exception when the table/view is being loaded (which is different from https://github.com/apache/iceberg/pull/10069)
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
liurenjie1024 commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1553220669 ## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs: ## @@ -0,0 +1,366 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::expr::{BoundPredicate, BoundReference, PredicateOperator}; +use crate::spec::Datum; +use crate::Result; +use fnv::FnvHashSet; + +pub trait BoundPredicateVisitor { +    type T; + +    fn always_true(&mut self) -> Result<Self::T>; +    fn always_false(&mut self) -> Result<Self::T>; + +    fn and(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>; +    fn or(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>; +    fn not(&mut self, inner: Self::T) -> Result<Self::T>; + +    fn is_null(&mut self, reference: &BoundReference) -> Result<Self::T>; Review Comment: The problem with this approach is that when we add an operator, we need to modify the trait. We can still keep it source-compatible with a default implementation, but the actual behavior may break when users upgrade the library. cc @viirya @Xuanwo @marvinlanhenke What do you think?
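The default-implementation idea under discussion can be sketched in plain Rust. This is a hypothetical, simplified model, not the crate's actual API: the `OperatorNotImplemented` error type and the `starts_with` operator are invented here purely to illustrate how a default method body keeps pre-existing visitors compiling when a new operator is added to the trait.

```rust
// Hypothetical stand-ins for the crate's error and Result types.
#[derive(Debug)]
struct OperatorNotImplemented(&'static str);

type Result<T> = std::result::Result<T, OperatorNotImplemented>;

trait BoundPredicateVisitor {
    type T;

    // Required: every visitor must handle the original operators.
    fn always_true(&mut self) -> Result<Self::T>;

    // Added later, with a default body: existing impls still compile,
    // and only fail at runtime if this operator is actually dispatched.
    fn starts_with(&mut self) -> Result<Self::T> {
        Err(OperatorNotImplemented("starts_with"))
    }
}

struct ConstVisitor;

impl BoundPredicateVisitor for ConstVisitor {
    type T = bool;

    fn always_true(&mut self) -> Result<bool> {
        Ok(true)
    }
    // `starts_with` is deliberately left to the default.
}

fn main() {
    let mut v = ConstVisitor;
    assert!(v.always_true().unwrap());
    // The old visitor breaks only when the new operator is exercised.
    assert!(v.starts_with().is_err());
    println!("ok");
}
```

This captures both sides of the thread above: the trait change is source-compatible, but behavior for the new operator silently becomes "error at runtime" until each visitor is updated.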
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
liurenjie1024 commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1553215173 ## crates/iceberg/src/expr/predicate.rs: ## @@ -217,6 +249,10 @@ impl Display for SetExpression { /// Unbound predicate expression before binding to a schema. #[derive(Debug, PartialEq)] pub enum Predicate { +/// AlwaysTrue predicate, for example, `TRUE`. +AlwaysTrue, +/// AlwaysFalse predicate, for example, `FALSE`. +AlwaysFalse, Review Comment: Reasonable to me, thanks for the explanation.
Re: [PR] Add Struct Accessors to BoundReferences [iceberg-rust]
liurenjie1024 commented on code in PR #317: URL: https://github.com/apache/iceberg-rust/pull/317#discussion_r1553151792 ## crates/iceberg/src/expr/accessor.rs: ## @@ -0,0 +1,114 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::spec::{Literal, Struct, Type}; +use serde_derive::{Deserialize, Serialize}; +use std::sync::Arc; + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +enum InnerOrType { +    Inner(Arc<StructAccessor>), +    Type(Type), +} + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +pub struct StructAccessor { +    position: i32, +    inner_or_type: InnerOrType, Review Comment: This seems inefficient to me: each time we call `r#type`, it needs to recursively call into the inner accessors for nested fields. How about just the following: ```rust r#type: Type, inner: Option<Box<StructAccessor>> ``` ## crates/iceberg/src/spec/schema.rs: ## @@ -137,9 +142,55 @@ impl SchemaBuilder { name_to_id, lowercase_name_to_id, id_to_name, + +field_id_to_accessor, }) } +fn build_accessors(&self) -> HashMap<i32, Arc<StructAccessor>> { Review Comment: How about using SchemaVisitor to build this? ## crates/iceberg/src/expr/accessor.rs: ## @@ -0,0 +1,114 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::spec::{Literal, Struct, Type}; +use serde_derive::{Deserialize, Serialize}; +use std::sync::Arc; + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +enum InnerOrType { +    Inner(Arc<StructAccessor>), +    Type(Type), +} + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +pub struct StructAccessor { Review Comment: How about just naming it `Accessor`? This is in fact a field accessor. ## crates/iceberg/src/expr/accessor.rs: ## @@ -0,0 +1,114 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::spec::{Literal, Struct, Type}; +use serde_derive::{Deserialize, Serialize}; +use std::sync::Arc; + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +enum InnerOrType { +    Inner(Arc<StructAccessor>), Review Comment: ```suggestion Inner(Box<StructAccessor>), ``` I think the inner accessor is not shared, so it would be more efficient to use `Box`. ## crates/iceberg/src/spec/schema.rs: ## @@ -137,9 +142,55 @@ impl SchemaBuilder { name_to_id, lowercase_name_to_id, id_to_name, + +field_id_to_accessor, }) } +fn build_accessors(&self) -> HashMap<i32, Arc<StructAccessor>> { Review Comment: Also, it would be better to add some unit tests to show the result. ## crates/iceberg/src/spec/values.rs: ## @@ -1141,6 +1141,11 @@ impl St
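The shape the reviewer suggests can be sketched in isolation. This is a toy model under stated assumptions: `Type` here is a minimal two-variant stand-in for the crate's `Type`, and the `path` helper is invented for demonstration. The point of the design is that the field's type is stored directly on each accessor (so `r#type` is O(1), no recursion) and nesting is an owned `Box` chain rather than a shared `Arc`.

```rust
// Hypothetical, simplified stand-in for the crate's spec::Type.
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Int,
    Struct,
}

#[derive(Debug, Clone, PartialEq)]
struct Accessor {
    position: usize,
    // Leaf field type stored directly: no recursive lookup needed.
    r#type: Type,
    // `Box` rather than `Arc`: the inner accessor is owned, not shared.
    inner: Option<Box<Accessor>>,
}

impl Accessor {
    fn r#type(&self) -> &Type {
        // Constant time, regardless of nesting depth.
        &self.r#type
    }

    /// Walk the chain of positions from the outermost struct inward
    /// (illustrative helper, not part of the PR).
    fn path(&self) -> Vec<usize> {
        let mut out = vec![self.position];
        let mut cur = self.inner.as_deref();
        while let Some(a) = cur {
            out.push(a.position);
            cur = a.inner.as_deref();
        }
        out
    }
}

fn main() {
    // Accessor for field at position 1 inside the struct at position 0.
    let leaf = Accessor { position: 1, r#type: Type::Int, inner: None };
    let root = Accessor { position: 0, r#type: Type::Int, inner: Some(Box::new(leaf)) };
    assert_eq!(root.r#type(), &Type::Int);
    assert_eq!(root.path(), vec![0, 1]);
    println!("ok");
}
```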
Re: [I] Support creating tags [iceberg-python]
Fokko commented on issue #573: URL: https://github.com/apache/iceberg-python/issues/573#issuecomment-2039233700 @enkidulan Thanks for reaching out here. Are you interested in creating the API for this? :)
Re: [PR] Hive, JDBC: Add null check before matching the error message [iceberg]
nastra commented on code in PR #10082: URL: https://github.com/apache/iceberg/pull/10082#discussion_r1553049382 ## core/src/test/java/org/apache/iceberg/jdbc/TestJdbcCatalog.java: ## @@ -970,6 +976,51 @@ public void testCatalogWithCustomMetricsReporter() throws IOException { Assertions.assertThat(CustomMetricsReporter.COUNTER.get()).isEqualTo(2); } + @Test + public void testCommitExceptionWithoutMessage() { +TableIdentifier tableIdent = TableIdentifier.of("db", "ns1", "ns2", "tbl"); +BaseTable table = (BaseTable) catalog.buildTable(tableIdent, SCHEMA).create(); +JdbcTableOperations ops = (JdbcTableOperations) table.operations(); +TableMetadata metadataV1 = ops.current(); + +table.updateSchema().addColumn("n", Types.IntegerType.get()).commit(); +ops.refresh(); + +JdbcTableOperations spyOps = spy(ops); + +try (MockedStatic<JdbcUtil> mockedStatic = Mockito.mockStatic(JdbcUtil.class)) { + mockedStatic + .when(() -> JdbcUtil.loadTable(any(), any(), any(), any())) + .thenThrow(new SQLException()); + assertThatThrownBy(() -> spyOps.commit(ops.current(), metadataV1)) + .isInstanceOf(UncheckedSQLException.class) + .hasMessageStartingWith("Unknown failure"); +} + } + + @Test + public void testCommitExceptionWithMessage() { +TableIdentifier tableIdent = TableIdentifier.of("db", "ns1", "ns2", "tbl"); +BaseTable table = (BaseTable) catalog.buildTable(tableIdent, SCHEMA).create(); +JdbcTableOperations ops = (JdbcTableOperations) table.operations(); + +TableMetadata metadataV1 = ops.current(); + +table.updateSchema().addColumn("n", Types.IntegerType.get()).commit(); +ops.refresh(); + +JdbcTableOperations spyOps = spy(ops); + +try (MockedStatic<JdbcUtil> mockedStatic = Mockito.mockStatic(JdbcUtil.class)) { + mockedStatic + .when(() -> JdbcUtil.loadTable(any(), any(), any(), any())) + .thenThrow(new SQLException("constraint failed")); Review Comment: Shouldn't this have a null message?
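The hazard these tests exercise is language-agnostic: substring-matching an error message that may be absent. A small Rust sketch shows the guarded shape; `classify_commit_error` and the message strings are hypothetical, and only the "check for absence before matching" pattern mirrors the PR's null check.

```rust
// A minimal sketch of the null-check pattern the Java tests above
// cover, transposed to Rust, where a possibly-missing message is
// naturally an Option instead of a nullable String.
fn classify_commit_error(message: Option<&str>) -> &'static str {
    match message {
        // Only inspect the text when a message actually exists.
        Some(m) if m.contains("constraint failed") => "already exists",
        // A present-but-unrecognized message, or no message at all
        // (the `new SQLException()` case), both fall back safely.
        Some(_) | None => "unknown failure",
    }
}

fn main() {
    assert_eq!(classify_commit_error(None), "unknown failure");
    assert_eq!(classify_commit_error(Some("constraint failed: tbl")), "already exists");
    println!("ok");
}
```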
Re: [PR] Partitioned Append on Identity Transform [iceberg-python]
Fokko merged PR #555: URL: https://github.com/apache/iceberg-python/pull/555
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
sdd commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1553098425 ## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs: ## @@ -0,0 +1,363 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::expr::{BoundPredicate, BoundReference, PredicateOperator}; +use crate::spec::Datum; +use crate::Result; +use fnv::FnvHashSet; + +pub trait BoundPredicateVisitor { +    type T; + +    fn visit(&mut self, predicate: &BoundPredicate) -> Result<Self::T> { Review Comment: Done!
Re: [PR] Hive, JDBC: Add null check before matching the error message [iceberg]
nk1506 commented on code in PR #10082: URL: https://github.com/apache/iceberg/pull/10082#discussion_r1553083646 ## core/src/test/java/org/apache/iceberg/jdbc/TestJdbcCatalog.java: ## @@ -970,6 +976,51 @@ public void testCatalogWithCustomMetricsReporter() throws IOException { Assertions.assertThat(CustomMetricsReporter.COUNTER.get()).isEqualTo(2); } + @Test + public void testCommitExceptionWithoutMessage() { +TableIdentifier tableIdent = TableIdentifier.of("db", "ns1", "ns2", "tbl"); +BaseTable table = (BaseTable) catalog.buildTable(tableIdent, SCHEMA).create(); +JdbcTableOperations ops = (JdbcTableOperations) table.operations(); +TableMetadata metadataV1 = ops.current(); + +table.updateSchema().addColumn("n", Types.IntegerType.get()).commit(); +ops.refresh(); + +JdbcTableOperations spyOps = spy(ops); + +try (MockedStatic<JdbcUtil> mockedStatic = Mockito.mockStatic(JdbcUtil.class)) { + mockedStatic + .when(() -> JdbcUtil.loadTable(any(), any(), any(), any())) + .thenThrow(new SQLException()); + assertThatThrownBy(() -> spyOps.commit(ops.current(), metadataV1)) + .isInstanceOf(UncheckedSQLException.class) + .hasMessageStartingWith("Unknown failure"); +} + } + + @Test + public void testCommitExceptionWithMessage() { +TableIdentifier tableIdent = TableIdentifier.of("db", "ns1", "ns2", "tbl"); +BaseTable table = (BaseTable) catalog.buildTable(tableIdent, SCHEMA).create(); +JdbcTableOperations ops = (JdbcTableOperations) table.operations(); + +TableMetadata metadataV1 = ops.current(); + +table.updateSchema().addColumn("n", Types.IntegerType.get()).commit(); +ops.refresh(); + +JdbcTableOperations spyOps = spy(ops); + +try (MockedStatic<JdbcUtil> mockedStatic = Mockito.mockStatic(JdbcUtil.class)) { + mockedStatic + .when(() -> JdbcUtil.loadTable(any(), any(), any(), any())) + .thenThrow(new SQLException("constraint failed")); Review Comment: We are checking the null message in the other test, [testCommitExceptionWithoutMessage](https://github.com/apache/iceberg/pull/10082/files#diff-754dd3358fceeaa7214faa73eee5b8b0e71769f7d9a2b568a17191e42db9fce2R994).
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
sdd commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1553082179 ## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs: ## @@ -0,0 +1,363 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::expr::{BoundPredicate, BoundReference, PredicateOperator}; +use crate::spec::Datum; +use crate::Result; +use fnv::FnvHashSet; + +pub trait BoundPredicateVisitor { +    type T; + +    fn visit(&mut self, predicate: &BoundPredicate) -> Result<Self::T> { Review Comment: Ok, happy to change, will do so
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
sdd commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1553077742 ## crates/iceberg/src/expr/predicate.rs: ## @@ -217,6 +249,10 @@ impl Display for SetExpression { /// Unbound predicate expression before binding to a schema. #[derive(Debug, PartialEq)] pub enum Predicate { +/// AlwaysTrue predicate, for example, `TRUE`. +AlwaysTrue, +/// AlwaysFalse predicate, for example, `FALSE`. +AlwaysFalse, Review Comment: It's in here because it is needed in two places in `InclusiveProjection`: - First, when we "unbind", transforming a `BoundPredicate` into an unbound `Predicate`: https://github.com/apache/iceberg-rust/pull/321/files#diff-419289c8023c272de50ee4352320ad217fa7c2802f7f633d26e55ece30426be6R41 - Second, when we build up a `Predicate` by "and"ing together a (possibly empty) list of `Predicate`s, starting with an `AlwaysTrue` to ensure that we return a valid `Predicate` even in the empty-list case: https://github.com/apache/iceberg-rust/pull/321/files#diff-419289c8023c272de50ee4352320ad217fa7c2802f7f633d26e55ece30426be6R76
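The second use case described above, "and"-ing together a possibly empty list of predicates with `AlwaysTrue` as the seed, can be modeled with a toy enum. The variant names mirror the PR, but the `Named` variant and the simplification rules are illustrative assumptions, not the crate's implementation.

```rust
// Toy predicate enum: `AlwaysTrue`/`AlwaysFalse` mirror the PR,
// `Named` is a hypothetical stand-in for any concrete predicate.
#[derive(Debug, Clone, PartialEq)]
enum Predicate {
    AlwaysTrue,
    AlwaysFalse,
    And(Box<Predicate>, Box<Predicate>),
    Named(&'static str),
}

impl Predicate {
    fn and(self, other: Predicate) -> Predicate {
        match (self, other) {
            // AlwaysTrue is the identity for `and`, so the seed
            // disappears once a real predicate shows up.
            (Predicate::AlwaysTrue, p) | (p, Predicate::AlwaysTrue) => p,
            // AlwaysFalse absorbs everything.
            (Predicate::AlwaysFalse, _) | (_, Predicate::AlwaysFalse) => Predicate::AlwaysFalse,
            (a, b) => Predicate::And(Box::new(a), Box::new(b)),
        }
    }
}

/// Fold a (possibly empty) list into one predicate, seeded with
/// AlwaysTrue so the empty case still yields a valid predicate.
fn conjoin(parts: Vec<Predicate>) -> Predicate {
    parts.into_iter().fold(Predicate::AlwaysTrue, Predicate::and)
}

fn main() {
    // Empty list: still a valid predicate.
    assert_eq!(conjoin(vec![]), Predicate::AlwaysTrue);
    // Non-empty list folds into a conjunction.
    let p = conjoin(vec![Predicate::Named("a"), Predicate::Named("b")]);
    assert_eq!(
        p,
        Predicate::And(Box::new(Predicate::Named("a")), Box::new(Predicate::Named("b")))
    );
    println!("ok");
}
```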
Re: [PR] Spark: Hive-View Integration Tests [iceberg]
nk1506 commented on code in PR #10088: URL: https://github.com/apache/iceberg/pull/10088#discussion_r1553075946 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -1421,38 +1442,38 @@ public void showViews() throws NoSuchTableException { insertRows(6); String sql = String.format("SELECT * from %s", tableName); sql("CREATE VIEW v1 AS %s", sql); -sql("CREATE VIEW prefixV2 AS %s", sql); -sql("CREATE VIEW prefixV3 AS %s", sql); +sql("CREATE VIEW prefixv2 AS %s", sql); Review Comment: The Hive catalog always returns table/view names in lower case, so to be safe I changed the names to lowercase.
Re: [PR] Spark: Hive-View Integration Tests [iceberg]
nk1506 commented on code in PR #10088: URL: https://github.com/apache/iceberg/pull/10088#discussion_r1553073692 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -1341,10 +1355,17 @@ public void createViewWithSubqueryExpressionInQueryThatIsRewritten() throws NoSu .containsExactly(row(3), row(3), row(3)); sql("USE spark_catalog"); - assertThatThrownBy(() -> sql(sql)) -.isInstanceOf(AnalysisException.class) -.hasMessageContaining(String.format("The table or view `%s` cannot be found", tableName)); +.satisfiesAnyOf( +throwable -> +assertThat(throwable) +.isInstanceOfAny(AnalysisException.class) +.hasMessageContaining( +String.format("The table or view `%s` cannot be found", tableName)), +throwable -> Review Comment: same
Re: [PR] Spark: Hive-View Integration Tests [iceberg]
nk1506 commented on code in PR #10088: URL: https://github.com/apache/iceberg/pull/10088#discussion_r1553073213 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -1318,8 +1324,16 @@ public void createViewWithSubqueryExpressionInFilterThatIsRewritten() sql("USE spark_catalog"); assertThatThrownBy(() -> sql(sql)) -.isInstanceOf(AnalysisException.class) -.hasMessageContaining(String.format("The table or view `%s` cannot be found", tableName)); +.satisfiesAnyOf( +throwable -> +assertThat(throwable) Review Comment: Hive throws an exception of type `SparkException` with the error message `Exception thrown in awaitResult`. Should we override these tests?
Re: [PR] Spark: Hive-View Integration Tests [iceberg]
nk1506 commented on code in PR #10088: URL: https://github.com/apache/iceberg/pull/10088#discussion_r1553071451 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -900,7 +906,7 @@ public void dropV1View() { String v1View = "v1ViewToBeDropped"; sql("USE spark_catalog"); sql("CREATE NAMESPACE IF NOT EXISTS %s", NAMESPACE); -sql("CREATE TABLE %s (id INT, data STRING)", tableName); +sql("CREATE TABLE IF NOT EXISTS %s (id INT, data STRING)", tableName); Review Comment: Hive throws a "table already exists" error even though we are switching catalogs. Alternatively, we could override this test for Hive.
Re: [PR] Spark: Hive-View Integration Tests [iceberg]
nk1506 commented on code in PR #10088: URL: https://github.com/apache/iceberg/pull/10088#discussion_r1553067799 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -62,6 +64,9 @@ public class TestViews extends ExtensionsTestBase { private static final Namespace NAMESPACE = Namespace.of("default"); private final String tableName = "table"; + @Parameter(index = 3) + protected String location; Review Comment: Different catalogs have different scheme+path combinations. The current tests covered only the InMemoryCatalog.