[I] Concern about possible consistency issue in HiveCatalog's _commit_table [iceberg-python]
HonahX opened a new issue, #588: URL: https://github.com/apache/iceberg-python/issues/588

### Question

Currently, the HiveCatalog's `_commit_table` workflow looks like:

1. load the current table metadata via `load_table`
2. construct the updated metadata
3. lock the hive table
4. alter the hive table
5. unlock the hive table

Suppose two processes, A and B, try to commit changes to the same Iceberg table. It is possible that the code execution happens in the following order:

1. process A loads the current table metadata
2. process A constructs the updated metadata
3. process B starts and finishes the **whole** `_commit_table`
4. process A locks the hive table
5. process A alters the hive table
6. process A unlocks the hive table

In this specific scenario, both processes successfully commit their changes, because process B releases the lock before A tries to acquire it. But if `alter_table` does not support a [transactional check](https://issues.apache.org/jira/browse/HIVE-26882), the changes made by process B will be overridden. Since in Python we do not know which Hive version we are connecting to, I wonder if we need to update the code to lock the table before loading the current table metadata, like the [Java implementation](https://github.com/apache/iceberg/blob/main/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L184) does.

BTW, it seems there are some consistency issues with https://issues.apache.org/jira/browse/HIVE-26882 as well, and there is an open fix for that: https://github.com/apache/hive/pull/5129

Please correct me if I misunderstand something here. Thanks! cc: @Fokko

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
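The lock-before-load ordering proposed in the issue above can be sketched as follows. This is a minimal, hypothetical simulation: threads stand in for the two processes, and `FakeMetastore` stands in for the real Hive client; none of these names come from pyiceberg.

```python
import threading

class FakeMetastore:
    """Illustrative stand-in for a Hive metastore client holding one table."""

    def __init__(self):
        self._lock = threading.Lock()
        self.metadata = {"snapshots": []}

    def lock(self):
        self._lock.acquire()

    def unlock(self):
        self._lock.release()

def commit(metastore, change):
    # 1. Lock FIRST, so no other writer can commit between our read and
    #    our write (the race described in the issue).
    metastore.lock()
    try:
        # 2. Load the *current* metadata under the lock.
        current = dict(metastore.metadata)
        # 3. Construct the updated metadata.
        current["snapshots"] = current["snapshots"] + [change]
        # 4. Alter the table.
        metastore.metadata = current
    finally:
        # 5. Always release the lock.
        metastore.unlock()

ms = FakeMetastore()
threads = [threading.Thread(target=commit, args=(ms, i)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(ms.metadata["snapshots"]))  # 10: no commit was lost
```

With the original ordering (load outside the lock), two concurrent commits could both read the same base metadata and one would silently overwrite the other; holding the lock across the read makes each commit see its predecessor's result.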
[I] Suppress duplicate OAuth token fetching in rest catalog client [iceberg-python]
TennyZhuang opened a new issue, #587: URL: https://github.com/apache/iceberg-python/issues/587

### Feature Request / Improvement

In the rest catalog client, we implemented the OAuth token refresh based on a retry mechanism. https://github.com/apache/iceberg-python/blob/4148edb5e28ae88024a55e0b112238e65b873957/pyiceberg/catalog/rest.py#L126-L128

If there are concurrent calls to the rest client API, they may all fail at the same time, and then thousands of `_refresh_token` calls may be issued. The behavior is likely acceptable and raises no exceptions, but it is wasteful and not as expected.

A potential solution is introducing a `threading.Lock` to protect the fetching process. Every call to `_refresh_token` should acquire the `Lock` first, then check whether the current token is still the same as the expired token.
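The proposed lock-plus-staleness-check can be sketched like this. All names here are hypothetical (`TokenManager`, `fetch_new_token` stand in for the real OAuth request logic in `rest.py`); the point is only to show that the check under the lock collapses a burst of concurrent refreshes into a single fetch.

```python
import threading

class TokenManager:
    """Hypothetical sketch: serialize token refreshes behind a lock."""

    def __init__(self, fetch_new_token):
        self._fetch = fetch_new_token
        self._lock = threading.Lock()
        self.token = self._fetch()

    def refresh(self, expired_token):
        with self._lock:
            # If another thread already swapped the token while we waited
            # on the lock, reuse it instead of fetching again.
            if self.token != expired_token:
                return self.token
            self.token = self._fetch()
            return self.token

calls = 0

def fetch():
    # Stand-in for the actual OAuth token request; counts invocations.
    global calls
    calls += 1
    return f"token-{calls}"

mgr = TokenManager(fetch)
old = mgr.token
# Many threads all observe the same expired token and try to refresh at once.
threads = [threading.Thread(target=mgr.refresh, args=(old,)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(calls)  # 2: one initial fetch plus a single refresh for all 20 threads
```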
Re: [I] SparkSessionCatalog with JDBC catalog: SHOW TABLES IN ... returns error but table exists in JDBC catalog [iceberg]
matepek commented on issue #10003: URL: https://github.com/apache/iceberg/issues/10003#issuecomment-2040974202

Thank you for the answers.

> fundamentally that issue is the same

Yes, I think I understand that. What surprises me is why I don't see the SparkSessionCatalog anywhere in that table-creation call stack. How could the SSC handle this case when it does not have control over it? I feel I'm missing an important piece of the puzzle here.

> Namespace has to be created explicitly in Nessie as described

Could the SparkSessionCatalog create it on demand? If not, then I don't see how it could be used with Nessie properly.
Re: [PR] [Bug Fix] Allow HiveCatalog to create table with TimestamptzType [iceberg-python]
HonahX commented on code in PR #585: URL: https://github.com/apache/iceberg-python/pull/585#discussion_r1554526443

## pyiceberg/catalog/hive.py:

```diff
@@ -103,22 +104,6 @@
 import pyarrow as pa

-# Replace by visitor
-hive_types = {
```

Review Comment: This is not used, so I removed it. :blush:
[PR] [0.6.x] Backport #585 [iceberg-python]
HonahX opened a new pull request, #586: URL: https://github.com/apache/iceberg-python/pull/586 Backport #585
Re: [PR] [Bug Fix] Allow HiveCatalog to create table with TimestamptzType [iceberg-python]
HonahX commented on code in PR #585: URL: https://github.com/apache/iceberg-python/pull/585#discussion_r1554524560

## pyiceberg/catalog/hive.py:

```diff
@@ -199,6 +184,7 @@ def _annotate_namespace(database: HiveDatabase, properties: Properties) -> HiveD
 DateType: "date",
 TimeType: "string",
 TimestampType: "timestamp",
+TimestamptzType: "timestamp",
```

Review Comment: In Java, we map `TimestamptzType` to `"timestamp with local time zone"` for Hive version >= 3:

```java
case TIMESTAMP:
  Types.TimestampType timestampType = (Types.TimestampType) type;
  if (HiveVersion.min(HiveVersion.HIVE_3) && timestampType.shouldAdjustToUTC()) {
    return "timestamp with local time zone";
  }
  return "timestamp";
```

But in Python it seems we do not have a way to check the version, so I put `"timestamp"` here to ensure compatibility. Please correct me if I am wrong here. Thanks!

ref: https://github.com/apache/iceberg/blob/main/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveSchemaUtil.java#L143
Re: [PR] [WIP] Integration with Datafusion [iceberg-rust]
liurenjie1024 commented on PR #324: URL: https://github.com/apache/iceberg-rust/pull/324#issuecomment-2040961957 Thanks @marvinlanhenke This is amazing, I'll take a review later!
Re: [PR] [WIP] Integration with Datafusion [iceberg-rust]
marvinlanhenke commented on PR #324: URL: https://github.com/apache/iceberg-rust/pull/324#issuecomment-2040961635 @liurenjie1024 @ZENOTME @Fokko PTAL and let me know what you think.
Re: [PR] Add Struct Accessors to BoundReferences [iceberg-rust]
liurenjie1024 commented on code in PR #317: URL: https://github.com/apache/iceberg-rust/pull/317#discussion_r1554520971

## crates/iceberg/src/expr/accessor.rs:

```rust
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

use crate::spec::{Datum, Literal, PrimitiveType, Struct};
use serde_derive::{Deserialize, Serialize};
use std::sync::Arc;

#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)]
enum InnerOrType {
    Inner(Box<StructAccessor>),
    Type(PrimitiveType),
}

#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)]
pub struct StructAccessor {
    position: usize,
    r#type: PrimitiveType,
    inner: Option<Box<StructAccessor>>,
}

pub(crate) type StructAccessorRef = Arc<StructAccessor>;

impl StructAccessor {
    pub(crate) fn new(position: usize, r#type: PrimitiveType) -> Self {
        StructAccessor {
            position,
            r#type,
            inner: None,
        }
    }

    pub(crate) fn wrap(position: usize, inner: Box<StructAccessor>) -> Self {
        StructAccessor {
            position,
            r#type: inner.r#type().clone(),
            inner: Some(inner),
        }
    }

    pub(crate) fn position(&self) -> usize {
        self.position
    }

    pub(crate) fn r#type(&self) -> &PrimitiveType {
        &self.r#type
    }

    pub(crate) fn get<'a>(&'a self, container: &'a Struct) -> Datum {
        match &self.inner {
            None => {
                let Literal::Primitive(literal) = &container[self.position] else {
                    panic!("Expected Literal to be Primitive");
                };

                Datum::new(self.r#type().clone(), literal.clone())
            }
            Some(inner) => {
                let Literal::Struct(wrapped) = &container[self.position] else {
                    panic!("Nested accessor should only be wrapping a Struct");
```

Review Comment: Ditto.

## crates/iceberg/src/spec/schema.rs:

```rust
@@ -137,9 +142,55 @@ impl SchemaBuilder {
            name_to_id,
            lowercase_name_to_id,
            id_to_name,

            field_id_to_accessor,
        })
    }

    fn build_accessors(&self) -> HashMap<i32, StructAccessorRef> {
```

Review Comment: > Will do, re SchemaVisitor. Sorry Renjie, not sure what you mean with the second comment though?

Sorry, I mean adding some unit tests and example code in the api to demonstrate its usage, like what Python does here: https://github.com/apache/iceberg-python/blob/d3db8401cd006fcbacf1d9a2502248ce4228ff06/pyiceberg/schema.py#L1162

## crates/iceberg/src/expr/accessor.rs:

```rust
#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)]
enum InnerOrType {
```

Review Comment: Should we remove this?
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
liurenjie1024 commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1554519498

## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs:

```rust
use crate::expr::{BoundPredicate, BoundReference, PredicateOperator};
use crate::spec::Datum;
use crate::Result;
use fnv::FnvHashSet;

pub trait BoundPredicateVisitor {
    type T;

    fn always_true(&mut self) -> Result<Self::T>;
    fn always_false(&mut self) -> Result<Self::T>;

    fn and(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
    fn or(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
    fn not(&mut self, inner: Self::T) -> Result<Self::T>;

    fn is_null(&mut self, reference: &BoundReference) -> Result<Self::T>;
```

Review Comment: > I also like the idea of going further and having an Expression enum that contains the logical operators plus Predicate above, with Reference from the above being an enum over bound or unbound. This allows visitors to potentially work over both bound and unbound expressions.

Hi @sdd, would you mind opening an issue to raise a discussion about this idea? It looks promising to me.
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
liurenjie1024 commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1554519334

## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs:

```rust
pub trait BoundPredicateVisitor {
    type T;
    // ...
    fn is_null(&mut self, reference: &BoundReference) -> Result<Self::T>;
```

Review Comment: > The proposed function signature fn visitor_op(&mut self, op: &PredicateOperator, reference: &BoundReference) would work for a unary operator.

I think the following signature would apply to all operators:

```rust
fn visitor_op(&mut self, op: &PredicateOperator, reference: &BoundReference, literals: &[Datum])
```
Re: [I] [BUG] Valid column characters fail on to_arrow() or to_pandas() ArrowInvalid: No match for FieldRef.Name [iceberg-python]
gwindes commented on issue #584: URL: https://github.com/apache/iceberg-python/issues/584#issuecomment-2040825581

Further research shows that when I use [daft](https://www.getdaft.io/projects/docs/en/latest/user_guide/integrations/iceberg.html#reading-a-table), I'm able to read and use the `to_arrow()` functionality just fine.

Basic script:

```
from pyiceberg.catalog import load_catalog
import daft

print(f"Daft version: {daft.__version__}")

warehouse_path = "./warehouse"
catalog = load_catalog(**{
    "uri": f"sqlite:///{warehouse_path}/pyiceberg-catalog.db",
    "warehouse": f"file://{warehouse_path}"
})

table = catalog.load_table("A1B2.A1-301")
df = daft.read_iceberg(table)
print(df)
print(df.to_arrow())
```

Resulting successful output:

```
Daft version: 0.2.20
╭──────────────────────────┬──────────────────────────┬──────────────────────────┬───────╮
│ TEST:A1B2.RAW.ABC-GG-1-A ┆ TEST:A1B2.RAW.ABC-GG-1-B ┆ TEST:A1B2.RAW.ABC-GG-1-C ┆ time  │
│ ---                      ┆ ---                      ┆ ---                      ┆ ---   │
│ Float64                  ┆ Float64                  ┆ Float64                  ┆ Int64 │
╰──────────────────────────┴──────────────────────────┴──────────────────────────┴───────╯

pyarrow.Table
TEST:A1B2.RAW.ABC-GG-1-A: double
TEST:A1B2.RAW.ABC-GG-1-B: double
TEST:A1B2.RAW.ABC-GG-1-C: double
time: int64
TEST:A1B2.RAW.ABC-GG-1-A: [[0,1,2,3,4,5,6,7,8,9]]
TEST:A1B2.RAW.ABC-GG-1-B: [[0,1.1,2.2,3.3,4.4,5.5,6.6,7.7,8.8,9.9]]
TEST:A1B2.RAW.ABC-GG-1-C: [[0,1.1,2.2,3.3,4.4,5.5,6.6,7.7,8.8,9.9]]
time: [[1702090722998897808,1702090722998947809,1702090722998997809,1702090722999047809,1702090722999097809,1702090722999147809,1702090722999197809,1702090722999247809,1702090722999297809,1702090722999347809]]
```
Re: [I] Class not found error when use sink [iceberg]
github-actions[bot] commented on issue #2455: URL: https://github.com/apache/iceberg/issues/2455#issuecomment-2040804841 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] Build fails due to inaccessible palantir dependencies [iceberg]
github-actions[bot] commented on issue #2462: URL: https://github.com/apache/iceberg/issues/2462#issuecomment-2040804859 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] Move site-related things to a separate repo [iceberg]
github-actions[bot] commented on issue #2446: URL: https://github.com/apache/iceberg/issues/2446#issuecomment-2040804831 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
sdd commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1554441701

## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs:

```rust
pub trait BoundPredicateVisitor {
    type T;
    // ...
    fn is_null(&mut self, reference: &BoundReference) -> Result<Self::T>;
```

Review Comment: I also like the idea of going further and having an `Expression` enum that contains the logical operators plus `Predicate` above, with `Reference` from the above being an enum over bound or unbound. This allows visitors to potentially work over both bound and unbound expressions.

```rust
pub enum Expression {
    // The top level would be Expression rather than Predicate.
    // The logical operators would live at this level.
    // Since logical operators such as And / Or / Not
    // are common between bound and unbound, some visitors
    // such as rewrite_not can be applied to both bound and
    // unbound expressions, preventing duplication and increasing
    // flexibility

    /// An expression always evaluates to true.
    AlwaysTrue,
    /// An expression always evaluates to false.
    AlwaysFalse,

    // Here we use a struct variant to eliminate the need
    // for a generic `LogicalExpression`, which did not
    // add much, also allowing us to refer to lhs and rhs
    // by name

    /// An expression combined by `AND`, for example, `a > 10 AND b < 20`.
    And { lhs: Box<Expression>, rhs: Box<Expression> },
    /// An expression combined by `OR`, for example, `a > 10 OR b < 20`.
    Or { lhs: Box<Expression>, rhs: Box<Expression> },
    /// An expression combined by `NOT`, for example, `NOT (a > 10)`.
    Not(Box<Expression>),
    /// A Predicate
    Predicate(Predicate),
}

pub enum Reference {
    // Only at the leaf level do we distinguish between
    // bound and unbound. This allows things like bind
    // and unbind to be implemented as a transforming visitor,
    // and avoids duplication of logic compared to having
    // Bound / Unbound being distinguished at a higher level
    // in the type hierarchy
    Bound(BoundReference),
    Unbound(UnboundReference),
}

pub struct UnboundReference {
    name: String,
}

pub struct BoundReference {
    column_name: String,
    field: NestedFieldRef,
    accessor: StructAccessor,
}
```

We could use type aliases such as `type BoundExpression = Expression` and have `bind` return a `BoundExpression` to prevent bound expressions being used where unbound ones are required and vice versa. But I'm digressing very far from the original comment now, sorry! 😆
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
sdd commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1554438555

## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs:

```rust
pub trait BoundPredicateVisitor {
    type T;
    // ...
    fn is_null(&mut self, reference: &BoundReference) -> Result<Self::T>;
```

Review Comment: I don't see how this will work with just `visitor_op`. The proposed function signature `fn visitor_op(&mut self, op: &PredicateOperator, reference: &BoundReference)` would work for a unary operator. A binary operator would need `fn visitor_op(&mut self, op: &PredicateOperator, reference: &BoundReference, literal: &Datum)`, and a set operator would need `fn visitor_op(&mut self, op: &PredicateOperator, reference: &BoundReference, literals: &FnvHashSet<Datum>)`.

To resolve this, you'd need the visitor to have `fn unary_op(...)`, `fn binary_op(...)`, and `fn set_op(...)`. Is that what we want? Personally, I think this is another example of where the current type hierarchy feels awkward. There are other places where we bump into these type-hierarchy mismatches, such as needing a default leg in a match on operator to handle operators that will never show up, such as binary ops in a unary expression. Something more like this feels better to me:

```rust
pub enum Predicate {
    // By not having Unary / Binary / Set here, and removing
    // that layer in favour of having the ops themselves
    // as part of this enum, there is no need for code that
    // raises errors when the wrong type of `Op` is used, since those
    // issues are prevented by the type system at compile time

    // Unary ops now only contain one param, the term, and so can be
    // implemented more simply
    IsNull(Reference),
    // ... other Unary ops here

    // Binary / Set ops ditch the `BinaryExpression`, with the literal
    // and reference being available directly
    GreaterThan {
        literal: Datum,
        reference: Reference,
    },
    // ... other Binary ops here

    In {
        literals: HashSet<Datum>,
        reference: Reference,
    },
    // ... other Set ops here
}
```
Re: [PR] Add Struct Accessors to BoundReferences [iceberg-rust]
sdd commented on code in PR #317: URL: https://github.com/apache/iceberg-rust/pull/317#discussion_r1554425354 ## crates/iceberg/src/spec/schema.rs: ## @@ -137,9 +142,55 @@ impl SchemaBuilder { name_to_id, lowercase_name_to_id, id_to_name, + +field_id_to_accessor, }) } +fn build_accessors(&self) -> HashMap<i32, Arc<StructAccessor>> { Review Comment: `SchemaVisitor` won't work for this without some modifications. We need to know the index within `Schema.fields` for each field to create its accessor, and `visit_struct` does not provide this at the moment. `visit_struct` and `SchemaVisitor` could be modified to pass the field index to the visitor's methods. Can we keep this PR as-is and have a TODO so that we can move on with this working as-is and I or someone else can pick up this refactor in another PR?
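The accessor-building idea under discussion (knowing each field's index within its parent struct so a chain of positions can later retrieve the value) can be sketched as follows. The schema shape here is a deliberately simplified stand-in, not the iceberg-rust or pyiceberg types:

```python
# Walk a simplified schema, recording each field's positional index within its
# parent struct. A field is (name, None) for a primitive, or (name, [subfields])
# for a nested struct; the result maps each leaf name to its index chain.
def build_accessors(fields, prefix=()):
    """Map field name -> tuple of positional indices down the struct tree."""
    accessors = {}
    for index, (name, subfields) in enumerate(fields):
        path = prefix + (index,)
        if subfields is None:        # primitive field: record its path
            accessors[name] = path
        else:                        # nested struct: recurse with extended prefix
            accessors.update(build_accessors(subfields, path))
    return accessors


schema = [
    ("id", None),
    ("location", [("lat", None), ("long", None)]),
]
print(build_accessors(schema))
# {'id': (0,), 'lat': (1, 0), 'long': (1, 1)}
```

This is exactly the information `visit_struct` would need to pass through: without the `index` from `enumerate`, the visitor cannot construct the path.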
Re: [PR] Introduce two properties for reading the connection timeout and socke… [iceberg]
harishch1998 commented on code in PR #10053: URL: https://github.com/apache/iceberg/pull/10053#discussion_r1544095514 ## core/src/main/java/org/apache/iceberg/rest/HTTPClient.java: ## @@ -78,6 +80,8 @@ public class HTTPClient implements RESTClient { private static final int REST_MAX_CONNECTIONS_DEFAULT = 100; private static final String REST_MAX_CONNECTIONS_PER_ROUTE = "rest.client.connections-per-route"; private static final int REST_MAX_CONNECTIONS_PER_ROUTE_DEFAULT = 100; + @VisibleForTesting static final String REST_CONNECTION_TIMEOUT = "rest.client.connection-timeout"; + @VisibleForTesting static final String REST_SOCKET_TIMEOUT = "rest.client.socket-timeout"; Review Comment: Yeah, these are milliseconds. I'll fix the naming.
Re: [PR] Introduce two properties for reading the connection timeout and socke… [iceberg]
harishch1998 commented on code in PR #10053: URL: https://github.com/apache/iceberg/pull/10053#discussion_r1554368145 ## core/src/test/java/org/apache/iceberg/rest/TestHTTPClient.java: ## @@ -133,6 +136,59 @@ public void testDynamicHttpRequestInterceptorLoading() { assertThat(((TestHttpRequestInterceptor) interceptor).properties).isEqualTo(properties); } + @Test + public void testHttpClientGetConnectionConfig() { +long connectionTimeoutMs = 10L; +int socketTimeoutMs = 10; +Map<String, String> properties = +ImmutableMap.of( +HTTPClient.REST_CONNECTION_TIMEOUT_MS, String.valueOf(connectionTimeoutMs), +HTTPClient.REST_SOCKET_TIMEOUT_MS, String.valueOf(socketTimeoutMs)); + +ConnectionConfig connectionConfig = HTTPClient.configureConnectionConfig(properties); +assertThat(connectionConfig).isNotNull(); + assertThat(connectionConfig.getConnectTimeout().getDuration()).isEqualTo(connectionTimeoutMs); + assertThat(connectionConfig.getSocketTimeout().getDuration()).isEqualTo(socketTimeoutMs); + } + + @Test + public void testHttpClientWithSocketTimeout() throws IOException { Review Comment: I tried multiple solutions but couldn't specify a custom connection timeout on the mockServer. It seems like a network-level configuration and is non-trivial to unit test with the existing MockServer that we use in testing.
[I] [BUG] Valid column characters fail on to_arrow() or to_pandas() ArrowInvalid: No match for FieldRef.Name [iceberg-python]
gwindes opened a new issue, #584: URL: https://github.com/apache/iceberg-python/issues/584 ### Apache Iceberg version 0.6.0 (latest release) ### Please describe the bug 🐞 Related to: https://github.com/apache/iceberg-python/issues/81 Python `v3.12` ``` pyiceberg 0.6.0 pyarrow 15.0.2 pandas 2.2.1 ``` I believe this is a bug, but I may also be misunderstanding how pyiceberg and pyarrow are working with iceberg tables and thus I may be doing something wrong. However, when I sanitize the column name before writing the data to remove `: . - /` I'm able to query just fine. My understanding is that the following iceberg column name is a valid name `TEST:A1B2.RAW.ABC-GG-1-A`. With the caveat that it is NOT a nested field (which I don't need). I'm able to write the data to the iceberg table and it shows the metadata with the fully qualified name of `TEST:A1B2.RAW.ABC-GG-1-A` in the metadata json. It appears to only fail when I read the data. I'm following the basic [getting started in the pyiceberg](https://py.iceberg.apache.org/#write-a-pyarrow-dataframe) ``` table = catalog.load_table("A1B2.A1-301") # neither table scan works (throws the same error): df_pyarrow = table.scan().to_arrow() df_panda = table.scan().to_pandas() ``` I created a [sample project](https://github.com/gwindes/pyiceberg-arrow-bug/blob/main/column_name_test.py) that reproduces my problem with the `pyarrow.lib.ArrowInvalid: No match for FieldRef.Name` error. Image showing metadata is storing channel name as expected. https://github.com/apache/iceberg-python/assets/843453/9d0f656a-5bf3-45f8-a4c3-5c39fbc8afa0
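The workaround the reporter describes (sanitizing column names before writing) can be sketched like this. The replacement character and the set of characters to strip (`: . - /`) come from the report above; this is an illustrative helper, not a pyiceberg API:

```python
import re

def sanitize_column_name(name: str, replacement: str = "_") -> str:
    """Replace the characters reported to break to_arrow()/to_pandas()."""
    # Character class covers the four characters from the report: : . - /
    return re.sub(r"[:.\-/]", replacement, name)


print(sanitize_column_name("TEST:A1B2.RAW.ABC-GG-1-A"))
# TEST_A1B2_RAW_ABC_GG_1_A
```

Note this is lossy: two distinct source names can collide after sanitizing, so a production version would want a collision check before writing.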
Re: [PR] Extend HTTPClient Builder to allow setting a proxy server [iceberg]
harishch1998 commented on code in PR #10052: URL: https://github.com/apache/iceberg/pull/10052#discussion_r1554343246 ## core/src/main/java/org/apache/iceberg/rest/HTTPClient.java: ## @@ -468,6 +483,19 @@ public Builder uri(String path) { return this; } +public Builder withProxy(String hostname, int port) { + Preconditions.checkNotNull(hostname, "Invalid hostname for http client proxy: null"); + this.proxy = new HttpHost(hostname, port); + return this; +} + +public Builder withProxyCredentialsProvider(CredentialsProvider credentialsProvider) { + Preconditions.checkNotNull( + credentialsProvider, "Invalid credentials provider for http client proxy: null"); + this.credentialsProvider = credentialsProvider; Review Comment: Fixed.
Re: [PR] Extend HTTPClient Builder to allow setting a proxy server [iceberg]
harishch1998 commented on code in PR #10052: URL: https://github.com/apache/iceberg/pull/10052#discussion_r1554325691 ## core/src/test/java/org/apache/iceberg/rest/TestHTTPClient.java: ## @@ -121,6 +128,92 @@ public void testHeadFailure() throws JsonProcessingException { testHttpMethodOnFailure(HttpMethod.HEAD); } + /** Tests that requests go via the proxy server in case the client is set up with one */ + @Test + public void testHttpClientProxyServerInteraction() throws IOException { +int proxyPort = 1070; +String proxyHostName = "localhost"; +try (ClientAndServer proxyServer = startClientAndServer(proxyPort); +RESTClient clientWithProxy = +HTTPClient.builder(ImmutableMap.of()) +.uri(URI) +.withProxy(proxyHostName, proxyPort) +.build()) { + // Set up the servers to match against a provided request + String path = "v1/config"; + + HttpRequest mockRequest = + request("/" + path).withMethod(HttpMethod.HEAD.name().toUpperCase(Locale.ROOT)); + + HttpResponse mockResponse = response().withStatusCode(200); + + mockServer.when(mockRequest).respond(mockResponse); + proxyServer.when(mockRequest).respond(mockResponse); + + restClient.head(path, ImmutableMap.of(), (onError) -> {}); + mockServer.verify(mockRequest, VerificationTimes.exactly(1)); + proxyServer.verify(mockRequest, VerificationTimes.never()); Review Comment: Makes sense
Re: [PR] Extend HTTPClient Builder to allow setting a proxy server [iceberg]
harishch1998 commented on code in PR #10052: URL: https://github.com/apache/iceberg/pull/10052#discussion_r1554322900 ## core/src/main/java/org/apache/iceberg/rest/HTTPClient.java: ## @@ -468,6 +483,19 @@ public Builder uri(String path) { return this; } +public Builder withProxy(String hostname, int port) { + Preconditions.checkNotNull(hostname, "Invalid hostname for http client proxy: null"); + this.proxy = new HttpHost(hostname, port); + return this; +} + +public Builder withProxyCredentialsProvider(CredentialsProvider credentialsProvider) { + Preconditions.checkNotNull( + credentialsProvider, "Invalid credentials provider for http client proxy: null"); + this.credentialsProvider = credentialsProvider; Review Comment: So I also thought of failing the build in such cases but another option was to simply end up in a no-op if a proxyCredentialsProvider is specified but not the proxyServer. I ended up doing the latter in the private constructor.

```
if (proxy != null) {
  if (credsProvider != null) {
    clientBuilder.setDefaultCredentialsProvider(credsProvider);
  }
  clientBuilder.setProxy(proxy);
}
```

But I'm happy to add an additional notNull check. Will fix the naming. Thanks!
[I] Hive Catalog cannot create table with TimestamptzType field [iceberg-python]
HonahX opened a new issue, #583: URL: https://github.com/apache/iceberg-python/issues/583 ### Apache Iceberg version 0.6.0 (latest release) ### Please describe the bug 🐞 This bug was reported by Jorge Arada on slack: https://apache-iceberg.slack.com/archives/C029EE6HQ5D/p1712309474600169 Reported example to reproduce:

```python
catalog = load_hive("iceberg_catalog", {"uri": "some_url"})
fields = [NestedField(field_id=0, name="teste_tz", field_type=TimestamptzType(), required=False)]
schema = Schema(*fields)
catalog.create_table(identifier="raw.test_table", schema=schema)
```

Error: ``` Traceback (most recent call last): File "", line 1, in File "/home/revolut/app/.venv/lib/python3.11/site-packages/pyiceberg/catalog/hive.py", line 315, in create_table sd=_construct_hive_storage_descriptor(schema, location), File "/home/revolut/app/.venv/lib/python3.11/site-packages/pyiceberg/catalog/hive.py", line 155, in _construct_hive_storage_descriptor [FieldSchema(field.name, visit(field.field_type, SchemaToHiveConverter()), field.doc) for field in schema.fields], ^ File "/home/revolut/app/.venv/lib/python3.11/site-packages/pyiceberg/catalog/hive.py", line 155, in [FieldSchema(field.name, visit(field.field_type, SchemaToHiveConverter()), field.doc) for field in schema.fields], File "/usr/local/lib/python3.11/functools.py", line 909, in wrapper return dispatch(args[0].__class__)(*args, **kw) File "/home/revolut/app/.venv/lib/python3.11/site-packages/pyiceberg/schema.py", line 847, in _ return visitor.primitive(obj) ^^ File "/home/revolut/app/.venv/lib/python3.11/site-packages/pyiceberg/catalog/hive.py", line 228, in primitive return HIVE_PRIMITIVE_TYPES[type(primitive)] ^ KeyError: ``` It seems we miss the TimestamptzType in the [Iceberg to Hive type conversion map](https://github.com/apache/iceberg-python/blob/a892309936effa7ec575195ad3be70193e82d704/pyiceberg/catalog/hive.py#L193).
A similar issue has been observed and fixed in glue: https://github.com/apache/iceberg-python/pull/366 Based on discussion, it would be good to include the fix in 0.6.1
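The shape of the fix described above can be sketched as follows: the Iceberg-to-Hive primitive type map needs an entry for `TimestamptzType` so that the visitor's `primitive()` lookup no longer raises `KeyError`. The classes below are stand-ins, and the exact Hive type string is an assumption; the real mapping lives in `pyiceberg/catalog/hive.py`:

```python
# Stand-in type classes; in pyiceberg these come from pyiceberg.types.
class TimestampType: ...
class TimestamptzType: ...

HIVE_PRIMITIVE_TYPES = {
    TimestampType: "timestamp",
    # The missing entry that triggers the KeyError in the traceback above.
    # "timestamp with local time zone" is an assumed Hive spelling here.
    TimestamptzType: "timestamp with local time zone",
}

def primitive(primitive_type):
    # Mirrors the failing lookup: dispatch on the type of the type instance.
    return HIVE_PRIMITIVE_TYPES[type(primitive_type)]


print(primitive(TimestamptzType()))
# timestamp with local time zone
```

The glue fix linked above followed the same pattern: add the missing key rather than change the visitor.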
[PR] Validate overwrite filter [iceberg-python]
jqin61 opened a new pull request, #582: URL: https://github.com/apache/iceberg-python/pull/582 **Background** When we are doing a static overwrite, we could choose to overwrite the full table or overwrite some partitions of the table. The example spark sql counterpart in [iceberg spark static overwrite](https://iceberg.apache.org/docs/latest/spark-writes/#static-overwrite) for full table overwrite is

```
INSERT OVERWRITE prod.my_app.logs
SELECT uuid, first(level), first(ts), first(message)
FROM prod.my_app.logs
WHERE level = 'INFO'
GROUP BY uuid
```

And the example spark sql counterpart for specified partition overwrite is

```
INSERT OVERWRITE prod.my_app.logs
PARTITION (level = 'INFO')
SELECT uuid, first(ts), first(message)
FROM prod.my_app.logs
WHERE level = 'INFO'
GROUP BY uuid
```

**Goal** When we overwrite the table, we could provide an expression as overwrite_filter in accord with these 2 cases. This pr validates that the filter expression conforms to certain rules, such as: it has to be on a partition column that does not use hidden partitioning, and the fields in the filter have to agree with the input arrow table so that the dataframe does not include values that the filter does not specify. **Rules and Test Cases** 1. **Rule :** The expression could only use **IsNull** or **EqualTo** as building blocks, concatenated by **And** **Tests :** `test__validate_static_overwrite_filter_expr_type` parametrize 1-8 2. **Rule :** The building block predicates (IsNull and EqualTo) should not have conflicting values. **Tests :** `test__validate_static_overwrite_filter_expr_type` parametrize 9-11 3. **Rule :** The terms (fields) should refer to existing fields in the iceberg schema, and also the literal in the predicate (if any) should match the iceberg field type. These mean the expression could be bound with the table schema successfully.
**Tests :** `test__bind_and_validate_static_overwrite_filter_predicate_fails_on_non_schema_fields_in_filter` `test__bind_and_validate_static_overwrite_filter_predicate_fails_to_bind_due_to_incompatible_predicate_value` 4. **Rule :** If expression specifies a field which is required in iceberg schema, it should not be isnull in the expression. **Tests :** `test__bind_and_validate_static_overwrite_filter_predicate_fails_to_bind_due_to_non_nullable` 5. **Rule :** The fields in the expression should be within partition columns **Tests :** `test__bind_and_validate_static_overwrite_filter_predicate_fails_on_non_part_fields_in_filter` 6. **Rule :** The iceberg table fields specified in the expression could not have hidden partitioning, however, the non-specified fields could. **Tests :** `test__bind_and_validate_static_overwrite_filter_predicate_fails_on_non_identity_transorm_filter` `test__bind_and_validate_static_overwrite_filter_predicate_succeeds_on_an_identity_transform_field_although_table_has_other_hidden_partition_fields` 7. **Rule :** The partition column values in the dataframe should conform to the filter. (To implement in the static overwrite function when partion keys are extracted) **Rule Necessity Justification using Spark Counterparts** To better understand these rules, let us provide spark static overwrite crash counterparts. 
For which, we have the following set up:

```
# Create Spark Dataframe
from pyspark.sql.types import StructType, StructField, StringType, LongType

data_multicols = [(2, "Flamingo", "red"), (4, "Horse", "white"), (4, "Pig", "pink")]
schema = StructType([
    StructField("n_legs", LongType(), nullable=True),
    StructField("animals", StringType(), nullable=True),
    StructField("color", StringType(), nullable=True),
])
df_multicols = spark.createDataFrame(data_multicols, schema)

# Create Iceberg Table
create_sql = """CREATE TABLE lacus.test.spark_staticoverwrite_partition_clause_and_data_reltship_multicols (
    n_legs bigint,
    animals string,
    color string)
USING iceberg
PARTITIONED BY (n_legs, color)
"""
spark.sql(create_sql)

# Insert Initial data
df_multicols.createOrReplaceTempView("tmp_view")
sql_cmd = """INSERT INTO lacus.test.spark_staticoverwrite_partition_clause_and_data_reltship_multicols
SELECT * FROM tmp_view
"""
spark.sql(sql_cmd)
```

this gives such table schema:

| col_name | data_type | comment |
|--|--|--|
| n_legs | bigint | |
| animals | string | |
| color | string | |

| Partitioning | | |
|--|--|--|
| Part 0 | n_legs | |
| Part 1 | color | |

with such data:
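The core of the rules listed above (only `IsNull`/`EqualTo` building blocks joined by `And`, and every referenced field must be an identity-partitioned column) can be sketched with simplified expression nodes. These dataclasses are stand-ins for pyiceberg's expression classes, and the validation is deliberately minimal:

```python
from dataclasses import dataclass

@dataclass
class EqualTo:
    field: str
    value: object

@dataclass
class IsNull:
    field: str

@dataclass
class And:
    left: object
    right: object

def validate_overwrite_filter(expr, identity_partition_fields):
    """Return the set of referenced fields, or raise if a rule is violated."""
    if isinstance(expr, And):
        # Rule 1: And is the only allowed connective; recurse into both sides.
        return (validate_overwrite_filter(expr.left, identity_partition_fields)
                | validate_overwrite_filter(expr.right, identity_partition_fields))
    if isinstance(expr, (EqualTo, IsNull)):
        # Rules 5/6: fields must be identity-partitioned partition columns.
        if expr.field not in identity_partition_fields:
            raise ValueError(f"{expr.field} is not an identity partition column")
        return {expr.field}
    # Rule 1 again: anything else (Or, Not, comparisons, ...) is rejected.
    raise ValueError(f"unsupported predicate: {type(expr).__name__}")


fields = validate_overwrite_filter(
    And(EqualTo("level", "INFO"), IsNull("color")), {"level", "color"}
)
print(fields)
```

A fuller version would also implement rules 2-4 and 7 (conflicting values, binding literals against the schema, required-field null checks, and checking the dataframe's partition values against the filter).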
Re: [I] Support client-side purge in REST catalog [iceberg]
flyrain commented on issue #10089: URL: https://github.com/apache/iceberg/issues/10089#issuecomment-2040599592 That's right. In our case, the rest server cannot access all table files due to the following reasons: 1. The rest catalog or any other catalog isn't allowed to access users' data due to the security policy; metadata access is fine. 2. Some Iceberg tables are in HDFS with Kerberos, which makes them pretty hard to access from a centralized server. We still write metadata.json files, but they are located in server-side storage instead of users' table storage. I understand this use case is a bit different from the one the REST catalog was introduced for, but I believe it is a valid use case, and we would extend the scope of the rest catalog a bit more to support it. cc @RussellSpitzer
Re: [I] Support client-side purge in REST catalog [iceberg]
danielcweeks commented on issue #10089: URL: https://github.com/apache/iceberg/issues/10089#issuecomment-2040535320 @flyrain I'm a little confused: how can the REST server not have access to the files? Currently the server needs access to at least the metadata files. Are you considering a situation where data files and metadata files are protected separately? The way we've been thinking about REST puts the responsibility of the delete on the server (the client shouldn't be responsible for how or when the delete happens).
Re: [I] In case of Positional Deletes "file_path" in lowerbound & upperbound do not have full file_path Buffer [iceberg]
singhpk234 commented on issue #10064: URL: https://github.com/apache/iceberg/issues/10064#issuecomment-2040485782 Seems like we always get the default config setting in Presto: https://github.com/prestodb/presto/blob/c5674d6e7c1008eb7935542a836f8f90d6bf09c6/presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergParquetFileWriter.java#L74C1-L74C194 Even for deletes we need special handling here, I think :), if I am not mistaken. It would be interesting to see if this causes a regression for already-released versions of Presto being used as a writer; it would just end up reading more data, I guess, though I'm not sure.
Re: [PR] Add Struct Accessors to BoundReferences [iceberg-rust]
sdd commented on code in PR #317: URL: https://github.com/apache/iceberg-rust/pull/317#discussion_r1554175122 ## crates/iceberg/src/expr/accessor.rs: ## @@ -0,0 +1,114 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::spec::{Literal, Struct, Type}; +use serde_derive::{Deserialize, Serialize}; +use std::sync::Arc; + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +enum InnerOrType { +Inner(Arc<StructAccessor>), +Type(Type), +} + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +pub struct StructAccessor { +position: i32, +inner_or_type: InnerOrType, +} + +pub(crate) type StructAccessorRef = Arc<StructAccessor>; + +impl StructAccessor { +pub(crate) fn new(position: i32, r#type: Type) -> Self { +StructAccessor { +position, +inner_or_type: InnerOrType::Type(r#type), +} +} + +pub(crate) fn wrap(position: i32, inner: StructAccessorRef) -> Self { +StructAccessor { +position, +inner_or_type: InnerOrType::Inner(inner), +} +} + +pub fn position(&self) -> i32 { +self.position +} + +fn r#type(&self) -> &Type { +match &self.inner_or_type { +InnerOrType::Inner(inner) => inner.r#type(), +InnerOrType::Type(r#type) => r#type, +} +} + +fn get<'a>(&'a self, container: &'a Struct) -> &Literal { Review Comment: Ok, will update ## crates/iceberg/src/spec/values.rs: ## @@ -1141,6 +1141,11 @@ impl Struct { }, ) } + +/// Gets a ref to the field at `position` within the `Struct` +pub fn get(&self, position: i32) -> &Literal { Review Comment: Good idea! Will do
Re: [PR] Add Struct Accessors to BoundReferences [iceberg-rust]
sdd commented on code in PR #317: URL: https://github.com/apache/iceberg-rust/pull/317#discussion_r1554176015 ## crates/iceberg/src/spec/schema.rs: ## @@ -137,9 +142,55 @@ impl SchemaBuilder { name_to_id, lowercase_name_to_id, id_to_name, + +field_id_to_accessor, }) } +fn build_accessors(&self) -> HashMap<i32, Arc<StructAccessor>> { Review Comment: Will do, re SchemaVisitor. Sorry Renjie, not sure what you mean with the second comment though?
Re: [PR] Add Struct Accessors to BoundReferences [iceberg-rust]
sdd commented on code in PR #317: URL: https://github.com/apache/iceberg-rust/pull/317#discussion_r1554174651 ## crates/iceberg/src/expr/accessor.rs: ## @@ -0,0 +1,114 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::spec::{Literal, Struct, Type}; +use serde_derive::{Deserialize, Serialize}; +use std::sync::Arc; + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +enum InnerOrType { +Inner(Arc<StructAccessor>), Review Comment: Will do! ## crates/iceberg/src/expr/accessor.rs: ## @@ -0,0 +1,114 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied.
See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::spec::{Literal, Struct, Type}; +use serde_derive::{Deserialize, Serialize}; +use std::sync::Arc; + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +enum InnerOrType { +Inner(Arc<StructAccessor>), Review Comment: Will do!
Re: [I] In case of Positional Deletes "file_path" in lowerbound & upperbound do not have full file_path Buffer [iceberg]
singhpk234 commented on issue #10064: URL: https://github.com/apache/iceberg/issues/10064#issuecomment-2040468114 Are you using Presto as a writer as well? If you check, it's still 16 characters, so the writer which is generating the delete files is generating wrong metrics, i.e. still using truncate(16).
Re: [PR] Add Struct Accessors to BoundReferences [iceberg-rust]
sdd commented on code in PR #317: URL: https://github.com/apache/iceberg-rust/pull/317#discussion_r1554171392 ## crates/iceberg/src/expr/accessor.rs: ## @@ -0,0 +1,114 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::spec::{Literal, Struct, Type}; +use serde_derive::{Deserialize, Serialize}; +use std::sync::Arc; + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +enum InnerOrType { +Inner(Arc<StructAccessor>), +Type(Type), +} + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +pub struct StructAccessor { Review Comment: This was deliberate - in the Java implementation, `Accessor` is an interface and generic over `T`, with T usually being `StructLike`, and `StructLike` is itself an interface. Since we only have `Struct` at the moment, rather than several structs that implement a `StructLike` trait, I went for `StructAccessor`, since ours is closer to `Accessor` in the Java implementation, and this allows us to implement a more generic `Accessor` at some point in the future alongside `StructAccessor`.
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
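The position-plus-inner design discussed above can be illustrated outside Rust as well. Below is a minimal Python sketch of a nested, position-based struct accessor; the names (`StructAccessor`, `get`) mirror the discussion but are hypothetical, not the actual iceberg-rust or pyiceberg API:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass(frozen=True)
class StructAccessor:
    """Fetches a value out of a (possibly nested) struct by position."""
    position: int
    inner: Optional["StructAccessor"] = None  # set when the field is itself a struct

    def get(self, struct: list) -> Any:
        value = struct[self.position]
        # leaf accessor returns the value; nested accessor recurses
        return value if self.inner is None else self.inner.get(value)

# A row shaped like (id, (lat, long)): access the nested field at position 1 -> 0
row = [42, [51.5, -0.12]]
acc = StructAccessor(position=1, inner=StructAccessor(position=0))
print(acc.get(row))  # 51.5
```

A future generic `Accessor` trait, as suggested in the comment, would simply abstract over the container type instead of hard-wiring `Struct`.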
Re: [I] In case of Positional Deletes "file_path" in lowerbound & upperbound do not have full file_path Buffer [iceberg]
agrawalreetika commented on issue #10064: URL: https://github.com/apache/iceberg/issues/10064#issuecomment-2040458075 Thank you for your response @singhpk234. We are using Iceberg version 1.4.3 in Presto. I checked https://github.com/apache/iceberg/pull/6313 and it looks like it's only fixed for Flink, so it wouldn't work for other engines?
Re: [PR] Add Struct Accessors to BoundReferences [iceberg-rust]
sdd commented on code in PR #317: URL: https://github.com/apache/iceberg-rust/pull/317#discussion_r1554163204 ## crates/iceberg/src/expr/accessor.rs: ## @@ -0,0 +1,114 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::spec::{Literal, Struct, Type}; +use serde_derive::{Deserialize, Serialize}; +use std::sync::Arc; + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +enum InnerOrType { +Inner(Arc), +Type(Type), +} + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +pub struct StructAccessor { +position: i32, +inner_or_type: InnerOrType, Review Comment: That duplicates the type at each level, but you have a point, we're optimising for time efficiency rather than space. Will fix -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add Struct Accessors to BoundReferences [iceberg-rust]
sdd commented on code in PR #317: URL: https://github.com/apache/iceberg-rust/pull/317#discussion_r1554161905 ## crates/iceberg/src/expr/accessor.rs: ## @@ -0,0 +1,114 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::spec::{Literal, Struct, Type}; +use serde_derive::{Deserialize, Serialize}; +use std::sync::Arc; + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +enum InnerOrType { +Inner(Arc), +Type(Type), +} + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +pub struct StructAccessor { +position: i32, Review Comment: Sure, will change. I saw we'd used `i32` almost everywhere else so went with that but `usize` works better -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] zorder does not work with sub fields [iceberg]
cccs-jc commented on issue #10017: URL: https://github.com/apache/iceberg/issues/10017#issuecomment-2040368228 Seems like it would. I'm not a reviewer, but I do want the fix :-)
Re: [PR] Core: Add explicit JSON parser for ConfigResponse [iceberg]
nastra merged PR #9952: URL: https://github.com/apache/iceberg/pull/9952
Re: [PR] Core: Add explicit JSON parser for ConfigResponse [iceberg]
nastra commented on PR #9952: URL: https://github.com/apache/iceberg/pull/9952#issuecomment-2040195604 thanks for the review @amogh-jahagirdar
[PR] Change DataScan to accept Metadata and io [iceberg-python]
Fokko opened a new pull request, #581: URL: https://github.com/apache/iceberg-python/pull/581 For the partial deletes I want to run a scan over in-memory metadata. Changing this API makes that possible.
Re: [I] In case of Positional Deletes "file_path" in lowerbound & upperbound do not have full file_path Buffer [iceberg]
singhpk234 commented on issue #10064: URL: https://github.com/apache/iceberg/issues/10064#issuecomment-2040031695 Which Iceberg version? It seems the file_path metrics were truncated: > This is because these metrics were truncated; Iceberg's default metrics mode for a column metric is truncate(16). This should be fixed by https://github.com/apache/iceberg/pull/6313. More on this: https://github.com/apache/iceberg/issues/6694
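For context on why the truncated bounds bite here: with the default truncate(16) metrics mode, only a 16-character prefix of `file_path` is stored, and the upper bound's final character is incremented so it still brackets the full value. A rough sketch of the idea (illustrative only, not Iceberg's implementation; ASCII-only paths assumed):

```python
def truncate_lower(value: str, width: int = 16) -> str:
    # lower bound: keep only a prefix of the value
    return value[:width]

def truncate_upper(value: str, width: int = 16) -> str:
    # upper bound: truncate, then bump the final character so the
    # result still sorts at or above every value sharing the prefix
    if len(value) <= width:
        return value
    prefix = value[:width]
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)

path = "s3://bucket/warehouse/db/table/data/file-0001.parquet"
print(truncate_lower(path))  # s3://bucket/ware
print(truncate_upper(path))  # s3://bucket/warf
```

So any consumer comparing the stored bounds against a full file path has to account for truncation rather than expecting the complete path.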
Re: [I] Add metadata tables [iceberg-python]
syun64 commented on issue #511: URL: https://github.com/apache/iceberg-python/issues/511#issuecomment-2040023902 Hi @Fokko and folks, we are interested in implementing the Partitions metadata table for our use case if it hasn't been picked up already
Re: [PR] Core: Add explicit JSON parser for ConfigResponse [iceberg]
amogh-jahagirdar commented on code in PR #9952: URL: https://github.com/apache/iceberg/pull/9952#discussion_r1553791769 ## core/src/main/java/org/apache/iceberg/rest/responses/ConfigResponseParser.java: ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.iceberg.rest.responses; + +import com.fasterxml.jackson.core.JsonGenerator; +import com.fasterxml.jackson.databind.JsonNode; +import java.io.IOException; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.util.JsonUtil; + +public class ConfigResponseParser { + + private static final String DEFAULTS = "defaults"; + private static final String OVERRIDES = "overrides"; + + private ConfigResponseParser() {} + + public static String toJson(ConfigResponse response) { +return toJson(response, false); + } + + public static String toJson(ConfigResponse response, boolean pretty) { +return JsonUtil.generate(gen -> toJson(response, gen), pretty); + } + + public static void toJson(ConfigResponse response, JsonGenerator gen) throws IOException { +Preconditions.checkArgument(null != response, "Invalid config response: null"); + +gen.writeStartObject(); + +JsonUtil.writeStringMap(DEFAULTS, response.defaults(), gen); Review Comment: Should we just change this to not send back the key in the response in the value is null (makes for a cleaner response? I don't really think that's a behavior change since in the end a client needs to check that the value is not null. ## core/src/main/java/org/apache/iceberg/rest/responses/ConfigResponseParser.java: ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.rest.responses; + +import com.fasterxml.jackson.core.JsonGenerator; +import com.fasterxml.jackson.databind.JsonNode; +import java.io.IOException; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.util.JsonUtil; + +public class ConfigResponseParser { + + private static final String DEFAULTS = "defaults"; + private static final String OVERRIDES = "overrides"; + + private ConfigResponseParser() {} + + public static String toJson(ConfigResponse response) { +return toJson(response, false); + } + + public static String toJson(ConfigResponse response, boolean pretty) { +return JsonUtil.generate(gen -> toJson(response, gen), pretty); + } + + public static void toJson(ConfigResponse response, JsonGenerator gen) throws IOException { +Preconditions.checkArgument(null != response, "Invalid config response: null"); + +gen.writeStartObject(); + +JsonUtil.writeStringMap(DEFAULTS, response.defaults(), gen); Review Comment: NVM these two fields are spec'd out so it seems like servers should always send it even though it may be null https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L83 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
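The conclusion in the review thread above — per the open-api spec, `defaults` and `overrides` should always be present, even when empty — can be sketched as follows (a Python stand-in for the Java parser; the function name is illustrative):

```python
import json
from typing import Dict, Optional

def config_response_to_json(defaults: Optional[Dict[str, str]],
                            overrides: Optional[Dict[str, str]]) -> str:
    # Both keys are always written, falling back to empty maps,
    # so clients can rely on their presence in the response
    return json.dumps({"defaults": defaults or {}, "overrides": overrides or {}})

print(config_response_to_json({"warehouse": "s3://bucket"}, None))
# {"defaults": {"warehouse": "s3://bucket"}, "overrides": {}}
```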
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
marvinlanhenke commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1553740208 ## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs: ## @@ -0,0 +1,366 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::expr::{BoundPredicate, BoundReference, PredicateOperator}; +use crate::spec::Datum; +use crate::Result; +use fnv::FnvHashSet; + +pub trait BoundPredicateVisitor { +type T; + +fn always_true(&mut self) -> Result; +fn always_false(&mut self) -> Result; + +fn and(&mut self, lhs: Self::T, rhs: Self::T) -> Result; +fn or(&mut self, lhs: Self::T, rhs: Self::T) -> Result; +fn not(&mut self, inner: Self::T) -> Result; + +fn is_null(&mut self, reference: &BoundReference) -> Result; Review Comment: > I think this is the advantage of the op approach, it's left to user to decide what to do. If they want new op to be ignored, they could use a default match arm, otherwise they could choose to do pattern match against each operator. Yes, exactly. So I'm +1 for the op approach. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
liurenjie1024 commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1553653540 ## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs: ## @@ -0,0 +1,366 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::expr::{BoundPredicate, BoundReference, PredicateOperator}; +use crate::spec::Datum; +use crate::Result; +use fnv::FnvHashSet; + +pub trait BoundPredicateVisitor { +type T; + +fn always_true(&mut self) -> Result; +fn always_false(&mut self) -> Result; + +fn and(&mut self, lhs: Self::T, rhs: Self::T) -> Result; +fn or(&mut self, lhs: Self::T, rhs: Self::T) -> Result; +fn not(&mut self, inner: Self::T) -> Result; + +fn is_null(&mut self, reference: &BoundReference) -> Result; Review Comment: > ...wouldn't I have match op anyway, so having a default match arm in the implementation would solve the issue for the downstream user? I think this is the advantage of the op approach, it's left to user to decide what to do. If they want new op to be ignored, they could use a default match arm, otherwise they could choose to do pattern match against each operator. -- This is an automated message from the Apache Git Service. 
Re: [PR] Add option to delete datafiles [iceberg-python]
Fokko commented on code in PR #569: URL: https://github.com/apache/iceberg-python/pull/569#discussion_r1553588970 ## pyiceberg/table/__init__.py: ## @@ -2726,6 +2731,112 @@ def _commit(self) -> UpdatesAndRequirements: ) +class DeleteFiles(_MergingSnapshotProducer): +_predicate: BooleanExpression + +def __init__( +self, +operation: Operation, +transaction: Transaction, +io: FileIO, +commit_uuid: Optional[uuid.UUID] = None, +snapshot_properties: Dict[str, str] = EMPTY_DICT, +): +super().__init__(operation, transaction, io, commit_uuid, snapshot_properties) +self._predicate = AlwaysFalse() + +def _build_partition_projection(self, spec_id: int) -> BooleanExpression: +schema = self._transaction.table_metadata.schema() +spec = self._transaction.table_metadata.specs()[spec_id] +project = inclusive_projection(schema, spec) +return project(self._predicate) + +@cached_property +def partition_filters(self) -> KeyDefaultDict[int, BooleanExpression]: +return KeyDefaultDict(self._build_partition_projection) + +def _build_manifest_evaluator(self, spec_id: int) -> Callable[[ManifestFile], bool]: +schema = self._transaction.table_metadata.schema() +spec = self._transaction.table_metadata.specs()[spec_id] +return manifest_evaluator(spec, schema, self.partition_filters[spec_id], case_sensitive=True) + +def _build_partition_evaluator(self, spec_id: int) -> Callable[[DataFile], bool]: +schema = self._transaction.table_metadata.schema() +spec = self._transaction.table_metadata.specs()[spec_id] +partition_type = spec.partition_type(schema) +partition_schema = Schema(*partition_type.fields) +partition_expr = self.partition_filters[spec_id] + +return lambda data_file: expression_evaluator(partition_schema, partition_expr, case_sensitive=True)(data_file.partition) + +def delete(self, predicate: BooleanExpression) -> None: +self._predicate = Or(self._predicate, predicate) + +@cached_property +def _compute_deletes(self) -> Tuple[List[ManifestFile], List[ManifestEntry]]: +schema = 
self._transaction.table_metadata.schema() + +def _copy_with_new_status(entry: ManifestEntry, status: ManifestEntryStatus) -> ManifestEntry: +return ManifestEntry( +status=status, +snapshot_id=entry.snapshot_id, +data_sequence_number=entry.data_sequence_number, +file_sequence_number=entry.file_sequence_number, +data_file=entry.data_file, +) + +manifest_evaluators: Dict[int, Callable[[ManifestFile], bool]] = KeyDefaultDict(self._build_manifest_evaluator) +strict_metrics_evaluator = _StrictMetricsEvaluator(schema, self._predicate, case_sensitive=True).eval Review Comment: Great suggestion :) I would like to do that in a next iteration since that's an optimization. We don't delete whole manifest files in Java, and would like to understand why we don't do this before implementing this here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add option to delete datafiles [iceberg-python]
Fokko commented on code in PR #569: URL: https://github.com/apache/iceberg-python/pull/569#discussion_r1553580296 ## pyiceberg/table/__init__.py: ## @@ -2726,6 +2731,112 @@ def _commit(self) -> UpdatesAndRequirements: ) +class DeleteFiles(_MergingSnapshotProducer): +_predicate: BooleanExpression + +def __init__( +self, +operation: Operation, +transaction: Transaction, +io: FileIO, +commit_uuid: Optional[uuid.UUID] = None, +snapshot_properties: Dict[str, str] = EMPTY_DICT, +): +super().__init__(operation, transaction, io, commit_uuid, snapshot_properties) +self._predicate = AlwaysFalse() + +def _build_partition_projection(self, spec_id: int) -> BooleanExpression: +schema = self._transaction.table_metadata.schema() +spec = self._transaction.table_metadata.specs()[spec_id] +project = inclusive_projection(schema, spec) +return project(self._predicate) + +@cached_property +def partition_filters(self) -> KeyDefaultDict[int, BooleanExpression]: +return KeyDefaultDict(self._build_partition_projection) + +def _build_manifest_evaluator(self, spec_id: int) -> Callable[[ManifestFile], bool]: +schema = self._transaction.table_metadata.schema() +spec = self._transaction.table_metadata.specs()[spec_id] +return manifest_evaluator(spec, schema, self.partition_filters[spec_id], case_sensitive=True) + +def _build_partition_evaluator(self, spec_id: int) -> Callable[[DataFile], bool]: +schema = self._transaction.table_metadata.schema() +spec = self._transaction.table_metadata.specs()[spec_id] +partition_type = spec.partition_type(schema) +partition_schema = Schema(*partition_type.fields) +partition_expr = self.partition_filters[spec_id] + +return lambda data_file: expression_evaluator(partition_schema, partition_expr, case_sensitive=True)(data_file.partition) + +def delete(self, predicate: BooleanExpression) -> None: +self._predicate = Or(self._predicate, predicate) + +@cached_property +def _compute_deletes(self) -> Tuple[List[ManifestFile], List[ManifestEntry]]: +schema = 
self._transaction.table_metadata.schema() + +def _copy_with_new_status(entry: ManifestEntry, status: ManifestEntryStatus) -> ManifestEntry: +return ManifestEntry( +status=status, +snapshot_id=entry.snapshot_id, +data_sequence_number=entry.data_sequence_number, +file_sequence_number=entry.file_sequence_number, +data_file=entry.data_file, +) + +manifest_evaluators: Dict[int, Callable[[ManifestFile], bool]] = KeyDefaultDict(self._build_manifest_evaluator) +strict_metrics_evaluator = _StrictMetricsEvaluator(schema, self._predicate, case_sensitive=True).eval +inclusive_metrics_evaluator = _InclusiveMetricsEvaluator(schema, self._predicate, case_sensitive=True).eval + +existing_manifests = [] +total_deleted_entries = [] +if snapshot := self._transaction.table_metadata.current_snapshot(): +for num, manifest_file in enumerate(snapshot.manifests(io=self._io)): +if not manifest_evaluators[manifest_file.partition_spec_id](manifest_file): +# If the manifest isn't relevant, we can just keep it in the manifest-list +existing_manifests.append(manifest_file) +else: +# It is relevant, let's check out the content +deleted_entries = [] +existing_entries = [] +for entry in manifest_file.fetch_manifest_entry(io=self._io): +if strict_metrics_evaluator(entry.data_file) == ROWS_MUST_MATCH: + deleted_entries.append(_copy_with_new_status(entry, ManifestEntryStatus.DELETED)) +elif inclusive_metrics_evaluator(entry.data_file) == ROWS_CANNOT_MATCH: + existing_entries.append(_copy_with_new_status(entry, ManifestEntryStatus.EXISTING)) +else: +raise ValueError("Deletes do not support rewrites of data files") + +if len(deleted_entries) > 0: +total_deleted_entries += deleted_entries + +# Rewrite the manifest +if len(existing_entries) > 0: +output_file_location = _new_manifest_path( + location=self._transaction.table_metadata.location, num=num, commit_uuid=self.commit_uuid +) +with write_manifest( + format_version=self._transaction.table_metadata.format_version, + spec=se
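The core decision in `_compute_deletes` above combines the strict and inclusive metrics evaluators per data file. A distilled sketch of just that three-way classification (constants and names simplified from the quoted code):

```python
# Simplified stand-ins for pyiceberg's evaluator result constants
ROWS_MUST_MATCH, ROWS_MIGHT_MATCH, ROWS_CANNOT_MATCH = range(3)

def classify_entry(strict_result: int, inclusive_result: int) -> str:
    # strict evaluator proves every row matches -> whole file can be dropped
    if strict_result == ROWS_MUST_MATCH:
        return "DELETED"
    # inclusive evaluator proves no row matches -> file is untouched, keep it
    if inclusive_result == ROWS_CANNOT_MATCH:
        return "EXISTING"
    # anything in between would require rewriting the file's rows
    raise ValueError("Deletes do not support rewrites of data files")

print(classify_entry(ROWS_MUST_MATCH, ROWS_MIGHT_MATCH))    # DELETED
print(classify_entry(ROWS_MIGHT_MATCH, ROWS_CANNOT_MATCH))  # EXISTING
```

This is why the operation is restricted to predicates that align with file boundaries: a file only partially covered by the predicate falls into the error branch.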
Re: [PR] Hive, JDBC: Add null check before matching the error message [iceberg]
nastra merged PR #10082: URL: https://github.com/apache/iceberg/pull/10082
Re: [PR] Hive, JDBC: Add null check before matching the error message [iceberg]
nastra commented on code in PR #10082: URL: https://github.com/apache/iceberg/pull/10082#discussion_r1553435328 ## hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveClientPool.java: ## @@ -116,6 +119,52 @@ public void testGetTablesFailsForNonReconnectableException() throws Exception { .hasMessage("Another meta exception"); } + @Test + public void testExceptionMessages() { +// Test Wrapped MetaException with a message Review Comment: I don't think those comments are helpful. I would just remove all of those
Re: [PR] Hive, JDBC: Add null check before matching the error message [iceberg]
nk1506 commented on code in PR #10082: URL: https://github.com/apache/iceberg/pull/10082#discussion_r1553430668 ## hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveClientPool.java: ## @@ -116,6 +119,52 @@ public void testGetTablesFailsForNonReconnectableException() throws Exception { .hasMessage("Another meta exception"); } + @Test + public void testExceptionMessages() { +// Test Wrapped MetaException with a message +try (MockedStatic<MetaStoreUtils> mockedStatic = Mockito.mockStatic(MetaStoreUtils.class)) {
Review Comment: Couldn't find a way to mock the calling method

```java
return GET_CLIENT.invoke(
    hiveConf, (HiveMetaHookLoader) tbl -> null, HiveMetaStoreClient.class.getName());
```

and make it throw an exception. Had to trace and mock the class which is throwing the error.
Re: [PR] Hive, JDBC: Add null check before matching the error message [iceberg]
nastra commented on code in PR #10082: URL: https://github.com/apache/iceberg/pull/10082#discussion_r1553429104 ## core/src/test/java/org/apache/iceberg/jdbc/TestJdbcCatalog.java: ## @@ -970,6 +975,47 @@ public void testCatalogWithCustomMetricsReporter() throws IOException { Assertions.assertThat(CustomMetricsReporter.COUNTER.get()).isEqualTo(2); } + @Test + public void testCommitExceptionWithoutMessage() { +TableIdentifier tableIdent = TableIdentifier.of("db", "ns1", "ns2", "tbl"); +BaseTable table = (BaseTable) catalog.buildTable(tableIdent, SCHEMA).create(); +JdbcTableOperations ops = (JdbcTableOperations) table.operations();
Review Comment: no need for the casting here. See the simplified test version in my previous comment
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
marvinlanhenke commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1553297280 ## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs: ## @@ -0,0 +1,366 @@

```rust
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
//   http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

use crate::expr::{BoundPredicate, BoundReference, PredicateOperator};
use crate::spec::Datum;
use crate::Result;
use fnv::FnvHashSet;

pub trait BoundPredicateVisitor {
    type T;

    fn always_true(&mut self) -> Result<Self::T>;
    fn always_false(&mut self) -> Result<Self::T>;

    fn and(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
    fn or(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
    fn not(&mut self, inner: Self::T) -> Result<Self::T>;

    fn is_null(&mut self, reference: &BoundReference) -> Result<Self::T>;
```

Review Comment:
> Another option is to use the following method:
>
> ```rust
> trait BoundPredicateVisitor {
>     fn visitor_op(&mut self, op: &PredicateOperator, reference: &BoundReference, results: Vec<Self::T>) -> Result<Self::T> {}
> }
> ```

I think I'm in favor of keeping the trait as generic as possible, so we don't have to modify it if we have to support a new operator. When implementing this trait (no matter which variant) I'd most likely have to override the default behaviour anyway, so I wouldn't bother providing any defaults in the first place and would keep the trait as generic and future-proof as possible. If that makes sense to you?

> This way we can force user to think about added new operator, otherwise it will throw compiler error, but this may break things when we add new operator.

...wouldn't I have to match on `op` anyway, so having a default match arm in the implementation would solve the issue for the downstream user?
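For readers following this thread, the two designs under discussion can be contrasted in a standalone sketch: per-operator methods with erroring defaults versus a single match-based entry point whose default arm absorbs operators added later. All type and trait names below are simplified stand-ins, not the actual iceberg-rust API.

```rust
#[derive(Debug, Clone, Copy)]
pub enum PredicateOperator {
    IsNull,
    NotNull,
    Eq,
}

// Option A: one method per operator, each with a default body that returns an
// error, so existing implementors keep compiling when a new operator is added.
pub trait PerOperatorVisitor {
    type T;

    fn is_null(&mut self) -> Result<Self::T, String> {
        Err("operator not implemented".to_string())
    }

    fn eq(&mut self) -> Result<Self::T, String> {
        Err("operator not implemented".to_string())
    }
}

// Option B: a single entry point; the implementor matches on the operator.
pub trait OpVisitor {
    type T;

    fn visit_op(&mut self, op: PredicateOperator) -> Result<Self::T, String>;
}

pub struct NullChecker;

impl OpVisitor for NullChecker {
    type T = bool;

    fn visit_op(&mut self, op: PredicateOperator) -> Result<bool, String> {
        match op {
            PredicateOperator::IsNull => Ok(true),
            PredicateOperator::Eq => Ok(false),
            // The default arm is what keeps downstream code compiling
            // when the library grows a new operator.
            other => Err(format!("unsupported operator: {:?}", other)),
        }
    }
}
```

With option B, a newly added operator simply falls into the implementor's default arm at runtime; with option A, the same effect comes from the erroring default method bodies.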
Re: [I] Implement `project` for `Transform`. [iceberg-rust]
liurenjie1024 closed issue #264: Implement `project` for `Transform`. URL: https://github.com/apache/iceberg-rust/issues/264
Re: [PR] feat: Project transform [iceberg-rust]
liurenjie1024 merged PR #309: URL: https://github.com/apache/iceberg-rust/pull/309
Re: [PR] Add Struct Accessors to BoundReferences [iceberg-rust]
marvinlanhenke commented on code in PR #317: URL: https://github.com/apache/iceberg-rust/pull/317#discussion_r1553285045 ## crates/iceberg/src/expr/accessor.rs: ## @@ -0,0 +1,114 @@

```rust
use crate::spec::{Literal, Struct, Type};
use serde_derive::{Deserialize, Serialize};
use std::sync::Arc;

#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)]
enum InnerOrType {
    Inner(Arc<StructAccessor>),
    Type(Type),
}

#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)]
pub struct StructAccessor {
    position: i32,
    inner_or_type: InnerOrType,
}

pub(crate) type StructAccessorRef = Arc<StructAccessor>;

impl StructAccessor {
    pub(crate) fn new(position: i32, r#type: Type) -> Self {
        StructAccessor {
            position,
            inner_or_type: InnerOrType::Type(r#type),
        }
    }

    pub(crate) fn wrap(position: i32, inner: StructAccessorRef) -> Self {
        StructAccessor {
            position,
            inner_or_type: InnerOrType::Inner(inner),
        }
    }

    pub fn position(&self) -> i32 {
        self.position
    }

    fn r#type(&self) -> &Type {
        match &self.inner_or_type {
            InnerOrType::Inner(inner) => inner.r#type(),
            InnerOrType::Type(r#type) => r#type,
        }
    }

    fn get<'a>(&'a self, container: &'a Struct) -> &Literal {
```

Review Comment:
> Should we return `Datum` here? This better follows the python/java implementation.

Unless we explicitly need `Literal` somewhere downstream (haven't checked python/java yet), I'd definitely +1 for returning `Datum` here, which should be easier to handle downstream.
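As an aside for readers, the position/wrap scheme in the PR above can be illustrated with a much-reduced model: each accessor stores a field position and optionally wraps an inner accessor that drills one struct level deeper. The `Literal` and accessor types below are simplified stand-ins for the iceberg-rust ones, not the crate's API.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Literal {
    Int(i32),
    Struct(Vec<Literal>),
}

pub struct StructAccessor {
    position: usize,
    inner: Option<Box<StructAccessor>>,
}

impl StructAccessor {
    // Accessor for a top-level field at `position`.
    pub fn new(position: usize) -> Self {
        StructAccessor { position, inner: None }
    }

    // Nest an accessor one struct level deeper, mirroring the PR's `wrap`.
    pub fn wrap(position: usize, inner: StructAccessor) -> Self {
        StructAccessor { position, inner: Some(Box::new(inner)) }
    }

    // Walk the positions down through nested structs to the leaf value.
    pub fn get<'a>(&self, container: &'a [Literal]) -> &'a Literal {
        match &self.inner {
            None => &container[self.position],
            Some(inner) => match &container[self.position] {
                Literal::Struct(fields) => inner.get(fields),
                other => other, // flat value where a struct was expected
            },
        }
    }
}
```

The accessor is built once from the schema and then evaluated repeatedly against rows, which is what makes the position-based (rather than name-based) lookup worthwhile.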
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
liurenjie1024 commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1553279624 ## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs: ## @@ -0,0 +1,366 @@

```rust
pub trait BoundPredicateVisitor {
    type T;

    fn always_true(&mut self) -> Result<Self::T>;
    fn always_false(&mut self) -> Result<Self::T>;

    fn and(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
    fn or(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
    fn not(&mut self, inner: Self::T) -> Result<Self::T>;

    fn is_null(&mut self, reference: &BoundReference) -> Result<Self::T>;
```

Review Comment:
> If we add default implementations that return something like Err(OperatorNotImplemented), then if we add an operator, wouldn't existing code work for existing operators and only break if users then try to use the new operator without having updated any visitors they have to use the new operator?

Yes, I agree with that.

Another option is to use the following method:

```rust
trait BoundPredicateVisitor {
    fn visitor_op(&mut self, op: &PredicateOperator, reference: &BoundReference, results: Vec<Self::T>) -> Result<Self::T> {}
}
```

This way we can force the user to think about a newly added operator, since it will otherwise throw a compiler error, but this may break things when we add a new operator. I think the two approaches have different pros and cons; let's wait to see others' comments.
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#issuecomment-2039348517
> Hi, @marvinlanhenke Thanks for the PR, it looks great! I have some small suggestions to restructure the code to make it easier to review. Really grateful for these tests!

Thanks for the review - I did some minor fixes according to your suggestions. If no other comments, I think we're good to go.
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1553275061 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,323 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(&self, name: String, predicate: &BoundPredicate) -> Result> { +let func = create_transform_function(self)?; + +match self { +Transform::Identity => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => Ok(Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +expr.literal().to_owned(), +, +BoundPredicate::Set(expr) => Ok(Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +expr.literals().to_owned(), +, +_ => Ok(None), +}, +Transform::Bucket(_) => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => self.project_binary(name, expr, &func), +BoundPredicate::Set(expr) => self.project_set(expr, name, &func), +_ => Ok(None), +}, +Transform::Truncate(width) => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => { +self.project_binary_with_adjusted_boundary(name, expr, &func, Some(*width)) +} +BoundPredicate::Set(expr) => self.project_set(expr, name, &func), +_ => Ok(None), +}, +Transform::Year | Transform::Month | 
Transform::Day | Transform::Hour => { +match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => { +self.project_binary_with_adjusted_boundary(name, expr, &func, None) +} +BoundPredicate::Set(expr) => self.project_set(expr, name, &func), +_ => Ok(None), +} +} +_ => Ok(None), +} +} + +/// Check if `Transform` is applicable on datum's `PrimitiveType` +fn can_transform(&self, datum: &Datum) -> bool { +let input_type = datum.data_type().clone(); +self.result_type(&Type::Primitive(input_type)).is_ok() +} + +/// Creates a unary predicate from a given operator and a reference name. +fn project_unary(op: PredicateOperator, name: String) -> Result> { +Ok(Some(Predicate::Unary(UnaryExpression::new( +op, +Reference::new(name), + +} + +/// Attempts to create a binary predicate based on a binary expression, +/// if applicable. +/// +/// This method evaluates a given binary expression and, if the operation +/// is equality (`Eq`) and the literal can be transformed, constructs a +/// `Predicate::Binary` variant representing the binary operation.
+fn project_binary(
Review Comment: I missed that, we don't need a Generic. Fixed it.
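The doc-comment example quoted above (a row filter `a = 10` under a partition spec `bucket(a, 37) as bs` projecting to `bs = bucket(10, 37)`) can be sketched with a toy transform. A plain modulo stands in for Iceberg's real Murmur3-based bucket function, and the predicate type is hypothetical, not the crate's API.

```rust
#[derive(Debug, PartialEq)]
pub struct EqPredicate {
    pub field: String,
    pub value: i32,
}

// Stand-in "bucket(n)" transform: a real implementation hashes the value
// with Murmur3 before taking the modulo.
fn bucket(n: i32, value: i32) -> i32 {
    value.rem_euclid(n)
}

// Project `pred.field = pred.value` onto partition column `name` by applying
// the transform to the literal, mirroring the Eq case of `project_binary`.
pub fn project_eq(name: &str, pred: &EqPredicate, n: i32) -> EqPredicate {
    EqPredicate {
        field: name.to_string(),
        value: bucket(n, pred.value),
    }
}
```

Any row matching the original filter necessarily lands in the projected partition bucket, which is what lets the scan planner prune partitions against the untransformed predicate.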
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1553272539 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,323 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(&self, name: String, predicate: &BoundPredicate) -> Result> { +let func = create_transform_function(self)?; + +match self { +Transform::Identity => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => Ok(Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +expr.literal().to_owned(), +, +BoundPredicate::Set(expr) => Ok(Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +expr.literals().to_owned(), +, +_ => Ok(None), +}, +Transform::Bucket(_) => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => self.project_binary(name, expr, &func), +BoundPredicate::Set(expr) => self.project_set(expr, name, &func), +_ => Ok(None), +}, +Transform::Truncate(width) => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => { +self.project_binary_with_adjusted_boundary(name, expr, &func, Some(*width)) +} +BoundPredicate::Set(expr) => self.project_set(expr, name, &func), +_ => Ok(None), +}, +Transform::Year | Transform::Month | 
Transform::Day | Transform::Hour => { +match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => { +self.project_binary_with_adjusted_boundary(name, expr, &func, None) +} +BoundPredicate::Set(expr) => self.project_set(expr, name, &func), +_ => Ok(None), +} +} +_ => Ok(None), +} +} + +/// Check if `Transform` is applicable on datum's `PrimitiveType` +fn can_transform(&self, datum: &Datum) -> bool { +let input_type = datum.data_type().clone(); +self.result_type(&Type::Primitive(input_type)).is_ok() +} + +/// Creates a unary predicate from a given operator and a reference name. +fn project_unary(op: PredicateOperator, name: String) -> Result> { +Ok(Some(Predicate::Unary(UnaryExpression::new( +op, +Reference::new(name), + +} + +/// Attempts to create a binary predicate based on a binary expression, +/// if applicable. +/// +/// This method evaluates a given binary expression and, if the operation +/// is equality (`Eq`) and the literal can be transformed, constructs a +/// `Predicate::Binary`variant representing the binary operation. +fn project_binary( +&self, +name: String, +expr: &BinaryExpression, +func: &BoxedTransformFunction, +) -> Result> { +if expr.op() != PredicateOperator::Eq || !self.can_transform(expr.literal()) { +return Ok(None); +} + +Ok(Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +func.transform_literal_result(expr.literal())?, + +} + +/// Projects a binary expression to a predicate with an adjusted boundary. +/// +/// Checks if the literal within the given binary expression is +/// transformable. If transformable, it proceeds to potentially adjust +/// the boundary of the expression based on the comparison operator (`op`). +/// The potential adjustments involve incrementing or decrementing the +/// literal value and changing the `PredicateOperator` itself to its +/// inclusive variant. +fn project_
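The adjusted-boundary rule described in the doc comment above (increment or decrement the literal and switch the `PredicateOperator` to its inclusive variant) can be sketched on plain integers. The operator enum and function below are illustrative stand-ins, not the crate's API.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Op {
    Lt,
    LtEq,
    Gt,
    GtEq,
}

// Strict bounds become inclusive bounds on the shifted literal:
//   `a < v`  projects via  `p <= transform(v - 1)`
//   `a > v`  projects via  `p >= transform(v + 1)`
// Inclusive operators pass through unchanged.
pub fn adjust_boundary(op: Op, literal: i32) -> (Op, i32) {
    match op {
        Op::Lt => (Op::LtEq, literal - 1),
        Op::Gt => (Op::GtEq, literal + 1),
        other => (other, literal),
    }
}
```

The shift is what keeps the projected partition predicate correct for lossy transforms like `truncate` or `day`, where many values collapse into one partition boundary.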
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1553271703 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,323 @@
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1553262971 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,323 @@
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
sdd commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1553253806 ## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs: ## @@ -0,0 +1,366 @@

```rust
pub trait BoundPredicateVisitor {
    type T;

    fn always_true(&mut self) -> Result<Self::T>;
    fn always_false(&mut self) -> Result<Self::T>;

    fn and(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
    fn or(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>;
    fn not(&mut self, inner: Self::T) -> Result<Self::T>;

    fn is_null(&mut self, reference: &BoundReference) -> Result<Self::T>;
```

Review Comment: If we add default implementations that return something like `Err(OperatorNotImplemented)`, then if we add an operator, wouldn't existing code keep working for existing operators, and only break once users try to use the new operator without having updated their visitors to support it?
Re: [PR] Hive, JDBC: Add null check before matching the error message [iceberg]
nastra commented on code in PR #10082: URL: https://github.com/apache/iceberg/pull/10082#discussion_r1553236192 ## core/src/test/java/org/apache/iceberg/jdbc/TestJdbcCatalog.java: ## @@ -970,6 +976,51 @@ public void testCatalogWithCustomMetricsReporter() throws IOException { Assertions.assertThat(CustomMetricsReporter.COUNTER.get()).isEqualTo(2); } + @Test + public void testCommitExceptionWithoutMessage() { Review Comment: can be simplified to ``` @Test public void testCommitExceptionWithoutMessage() { TableIdentifier tableIdent = TableIdentifier.of("db", "tbl"); BaseTable table = (BaseTable) catalog.buildTable(tableIdent, SCHEMA).create(); TableOperations ops = table.operations(); TableMetadata metadataV1 = ops.current(); table.updateSchema().addColumn("n", Types.IntegerType.get()).commit(); ops.refresh(); try (MockedStatic<JdbcUtil> mockedStatic = Mockito.mockStatic(JdbcUtil.class)) { mockedStatic .when(() -> JdbcUtil.loadTable(any(), any(), any(), any())) .thenThrow(new SQLException()); assertThatThrownBy(() -> ops.commit(ops.current(), metadataV1)) .isInstanceOf(UncheckedSQLException.class) .hasMessageStartingWith("Unknown failure"); } } ``` please also update the other tests accordingly. The spying on `ops` isn't necessary here, because you're throwing the exception when the table/view is being loaded (which is different from https://github.com/apache/iceberg/pull/10069)
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
liurenjie1024 commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1553220669 ## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs: ## @@ -0,0 +1,366 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::expr::{BoundPredicate, BoundReference, PredicateOperator}; +use crate::spec::Datum; +use crate::Result; +use fnv::FnvHashSet; + +pub trait BoundPredicateVisitor { +    type T; + +    fn always_true(&mut self) -> Result<Self::T>; +    fn always_false(&mut self) -> Result<Self::T>; + +    fn and(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>; +    fn or(&mut self, lhs: Self::T, rhs: Self::T) -> Result<Self::T>; +    fn not(&mut self, inner: Self::T) -> Result<Self::T>; + +    fn is_null(&mut self, reference: &BoundReference) -> Result<Self::T>; Review Comment: The problem with this approach is that when we add an operator, we need to modify the trait. We can still keep it source-compatible with a default implementation, but the actual behavior may break when users upgrade the library. cc @viirya @Xuanwo @marvinlanhenke What do you think?
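The default-implementation idea under discussion can be sketched in plain Rust. This is a hypothetical, simplified model, not the crate's actual API: the `OperatorNotImplemented` error type and the `starts_with` operator are invented here purely to illustrate how a default method body keeps pre-existing visitors compiling when a new operator is added to the trait.

```rust
// Hypothetical stand-ins for the crate's error and Result types.
#[derive(Debug)]
struct OperatorNotImplemented(&'static str);

type Result<T> = std::result::Result<T, OperatorNotImplemented>;

trait BoundPredicateVisitor {
    type T;

    // Required: every visitor must handle the original operators.
    fn always_true(&mut self) -> Result<Self::T>;

    // Added later, with a default body: existing impls still compile,
    // and only fail at runtime if this operator is actually dispatched.
    fn starts_with(&mut self) -> Result<Self::T> {
        Err(OperatorNotImplemented("starts_with"))
    }
}

struct ConstVisitor;

impl BoundPredicateVisitor for ConstVisitor {
    type T = bool;

    fn always_true(&mut self) -> Result<bool> {
        Ok(true)
    }
    // `starts_with` is deliberately left to the default.
}

fn main() {
    let mut v = ConstVisitor;
    assert!(v.always_true().unwrap());
    // The old visitor breaks only when the new operator is exercised.
    assert!(v.starts_with().is_err());
    println!("ok");
}
```

This captures both sides of the thread above: the trait change is source-compatible, but behavior for the new operator silently becomes "error at runtime" until each visitor is updated.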
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
liurenjie1024 commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1553215173 ## crates/iceberg/src/expr/predicate.rs: ## @@ -217,6 +249,10 @@ impl Display for SetExpression { /// Unbound predicate expression before binding to a schema. #[derive(Debug, PartialEq)] pub enum Predicate { +/// AlwaysTrue predicate, for example, `TRUE`. +AlwaysTrue, +/// AlwaysFalse predicate, for example, `FALSE`. +AlwaysFalse, Review Comment: Reasonable to me, thanks for the explanation.
Re: [PR] Add Struct Accessors to BoundReferences [iceberg-rust]
liurenjie1024 commented on code in PR #317: URL: https://github.com/apache/iceberg-rust/pull/317#discussion_r1553151792 ## crates/iceberg/src/expr/accessor.rs: ## @@ -0,0 +1,114 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::spec::{Literal, Struct, Type}; +use serde_derive::{Deserialize, Serialize}; +use std::sync::Arc; + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +enum InnerOrType { +    Inner(Arc<StructAccessor>), +    Type(Type), +} + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +pub struct StructAccessor { +    position: i32, +    inner_or_type: InnerOrType, Review Comment: This seems inefficient to me: each time we call `r#type`, it needs to recursively call into the inner accessors for nested fields. How about just the following: ```rust r#type: Type, inner: Option<Box<StructAccessor>> ``` ## crates/iceberg/src/spec/schema.rs: ## @@ -137,9 +142,55 @@ impl SchemaBuilder { name_to_id, lowercase_name_to_id, id_to_name, + +field_id_to_accessor, }) } +fn build_accessors(&self) -> HashMap<i32, Arc<StructAccessor>> { Review Comment: How about using SchemaVisitor to build this? ## crates/iceberg/src/expr/accessor.rs: ## @@ -0,0 +1,114 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::spec::{Literal, Struct, Type}; +use serde_derive::{Deserialize, Serialize}; +use std::sync::Arc; + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +enum InnerOrType { +    Inner(Arc<StructAccessor>), +    Type(Type), +} + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +pub struct StructAccessor { Review Comment: How about just naming it `Accessor`? This is in fact a field accessor. ## crates/iceberg/src/expr/accessor.rs: ## @@ -0,0 +1,114 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::spec::{Literal, Struct, Type}; +use serde_derive::{Deserialize, Serialize}; +use std::sync::Arc; + +#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)] +enum InnerOrType { +    Inner(Arc<StructAccessor>), Review Comment: ```suggestion Inner(Box<StructAccessor>), ``` I think the inner accessor is not shared, so it would be more efficient to use `Box`. ## crates/iceberg/src/spec/schema.rs: ## @@ -137,9 +142,55 @@ impl SchemaBuilder { name_to_id, lowercase_name_to_id, id_to_name, + +field_id_to_accessor, }) } +fn build_accessors(&self) -> HashMap<i32, Arc<StructAccessor>> { Review Comment: Also, it would be better to add some unit tests to show the result. ## crates/iceberg/src/spec/values.rs: ## @@ -1141,6 +1141,11 @@ impl St
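The shape the reviewer suggests can be sketched in isolation. This is a toy model under stated assumptions: `Type` here is a minimal two-variant stand-in for the crate's `Type`, and the `path` helper is invented for demonstration. The point of the design is that the field's type is stored directly on each accessor (so `r#type` is O(1), no recursion) and nesting is an owned `Box` chain rather than a shared `Arc`.

```rust
// Hypothetical, simplified stand-in for the crate's spec::Type.
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Int,
    Struct,
}

#[derive(Debug, Clone, PartialEq)]
struct Accessor {
    position: usize,
    // Leaf field type stored directly: no recursive lookup needed.
    r#type: Type,
    // `Box` rather than `Arc`: the inner accessor is owned, not shared.
    inner: Option<Box<Accessor>>,
}

impl Accessor {
    fn r#type(&self) -> &Type {
        // Constant time, regardless of nesting depth.
        &self.r#type
    }

    /// Walk the chain of positions from the outermost struct inward
    /// (illustrative helper, not part of the PR).
    fn path(&self) -> Vec<usize> {
        let mut out = vec![self.position];
        let mut cur = self.inner.as_deref();
        while let Some(a) = cur {
            out.push(a.position);
            cur = a.inner.as_deref();
        }
        out
    }
}

fn main() {
    // Accessor for field at position 1 inside the struct at position 0.
    let leaf = Accessor { position: 1, r#type: Type::Int, inner: None };
    let root = Accessor { position: 0, r#type: Type::Int, inner: Some(Box::new(leaf)) };
    assert_eq!(root.r#type(), &Type::Int);
    assert_eq!(root.path(), vec![0, 1]);
    println!("ok");
}
```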
Re: [I] Support creating tags [iceberg-python]
Fokko commented on issue #573: URL: https://github.com/apache/iceberg-python/issues/573#issuecomment-2039233700 @enkidulan Thanks for reaching out here. Are you interested in creating the API for this? :)
Re: [PR] Hive, JDBC: Add null check before matching the error message [iceberg]
nastra commented on code in PR #10082: URL: https://github.com/apache/iceberg/pull/10082#discussion_r1553049382 ## core/src/test/java/org/apache/iceberg/jdbc/TestJdbcCatalog.java: ## @@ -970,6 +976,51 @@ public void testCatalogWithCustomMetricsReporter() throws IOException { Assertions.assertThat(CustomMetricsReporter.COUNTER.get()).isEqualTo(2); } + @Test + public void testCommitExceptionWithoutMessage() { +TableIdentifier tableIdent = TableIdentifier.of("db", "ns1", "ns2", "tbl"); +BaseTable table = (BaseTable) catalog.buildTable(tableIdent, SCHEMA).create(); +JdbcTableOperations ops = (JdbcTableOperations) table.operations(); +TableMetadata metadataV1 = ops.current(); + +table.updateSchema().addColumn("n", Types.IntegerType.get()).commit(); +ops.refresh(); + +JdbcTableOperations spyOps = spy(ops); + +try (MockedStatic<JdbcUtil> mockedStatic = Mockito.mockStatic(JdbcUtil.class)) { + mockedStatic + .when(() -> JdbcUtil.loadTable(any(), any(), any(), any())) + .thenThrow(new SQLException()); + assertThatThrownBy(() -> spyOps.commit(ops.current(), metadataV1)) + .isInstanceOf(UncheckedSQLException.class) + .hasMessageStartingWith("Unknown failure"); +} + } + + @Test + public void testCommitExceptionWithMessage() { +TableIdentifier tableIdent = TableIdentifier.of("db", "ns1", "ns2", "tbl"); +BaseTable table = (BaseTable) catalog.buildTable(tableIdent, SCHEMA).create(); +JdbcTableOperations ops = (JdbcTableOperations) table.operations(); + +TableMetadata metadataV1 = ops.current(); + +table.updateSchema().addColumn("n", Types.IntegerType.get()).commit(); +ops.refresh(); + +JdbcTableOperations spyOps = spy(ops); + +try (MockedStatic<JdbcUtil> mockedStatic = Mockito.mockStatic(JdbcUtil.class)) { + mockedStatic + .when(() -> JdbcUtil.loadTable(any(), any(), any(), any())) + .thenThrow(new SQLException("constraint failed")); Review Comment: Shouldn't this have a null message?
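The hazard these tests exercise is language-agnostic: substring-matching an error message that may be absent. A small Rust sketch shows the guarded shape; `classify_commit_error` and the message strings are hypothetical, and only the "check for absence before matching" pattern mirrors the PR's null check.

```rust
// A minimal sketch of the null-check pattern the Java tests above
// cover, transposed to Rust, where a possibly-missing message is
// naturally an Option instead of a nullable String.
fn classify_commit_error(message: Option<&str>) -> &'static str {
    match message {
        // Only inspect the text when a message actually exists.
        Some(m) if m.contains("constraint failed") => "already exists",
        // A present-but-unrecognized message, or no message at all
        // (the `new SQLException()` case), both fall back safely.
        Some(_) | None => "unknown failure",
    }
}

fn main() {
    assert_eq!(classify_commit_error(None), "unknown failure");
    assert_eq!(classify_commit_error(Some("constraint failed: tbl")), "already exists");
    println!("ok");
}
```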
Re: [PR] Partitioned Append on Identity Transform [iceberg-python]
Fokko merged PR #555: URL: https://github.com/apache/iceberg-python/pull/555
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
sdd commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1553098425 ## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs: ## @@ -0,0 +1,363 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::expr::{BoundPredicate, BoundReference, PredicateOperator}; +use crate::spec::Datum; +use crate::Result; +use fnv::FnvHashSet; + +pub trait BoundPredicateVisitor { +    type T; + +    fn visit(&mut self, predicate: &BoundPredicate) -> Result<Self::T> { Review Comment: Done!
Re: [PR] Hive, JDBC: Add null check before matching the error message [iceberg]
nk1506 commented on code in PR #10082: URL: https://github.com/apache/iceberg/pull/10082#discussion_r1553083646 ## core/src/test/java/org/apache/iceberg/jdbc/TestJdbcCatalog.java: ## @@ -970,6 +976,51 @@ public void testCatalogWithCustomMetricsReporter() throws IOException { Assertions.assertThat(CustomMetricsReporter.COUNTER.get()).isEqualTo(2); } + @Test + public void testCommitExceptionWithoutMessage() { +TableIdentifier tableIdent = TableIdentifier.of("db", "ns1", "ns2", "tbl"); +BaseTable table = (BaseTable) catalog.buildTable(tableIdent, SCHEMA).create(); +JdbcTableOperations ops = (JdbcTableOperations) table.operations(); +TableMetadata metadataV1 = ops.current(); + +table.updateSchema().addColumn("n", Types.IntegerType.get()).commit(); +ops.refresh(); + +JdbcTableOperations spyOps = spy(ops); + +try (MockedStatic<JdbcUtil> mockedStatic = Mockito.mockStatic(JdbcUtil.class)) { + mockedStatic + .when(() -> JdbcUtil.loadTable(any(), any(), any(), any())) + .thenThrow(new SQLException()); + assertThatThrownBy(() -> spyOps.commit(ops.current(), metadataV1)) + .isInstanceOf(UncheckedSQLException.class) + .hasMessageStartingWith("Unknown failure"); +} + } + + @Test + public void testCommitExceptionWithMessage() { +TableIdentifier tableIdent = TableIdentifier.of("db", "ns1", "ns2", "tbl"); +BaseTable table = (BaseTable) catalog.buildTable(tableIdent, SCHEMA).create(); +JdbcTableOperations ops = (JdbcTableOperations) table.operations(); + +TableMetadata metadataV1 = ops.current(); + +table.updateSchema().addColumn("n", Types.IntegerType.get()).commit(); +ops.refresh(); + +JdbcTableOperations spyOps = spy(ops); + +try (MockedStatic<JdbcUtil> mockedStatic = Mockito.mockStatic(JdbcUtil.class)) { + mockedStatic + .when(() -> JdbcUtil.loadTable(any(), any(), any(), any())) + .thenThrow(new SQLException("constraint failed")); Review Comment: We are checking the null message in the other test, [testCommitExceptionWithoutMessage](https://github.com/apache/iceberg/pull/10082/files#diff-754dd3358fceeaa7214faa73eee5b8b0e71769f7d9a2b568a17191e42db9fce2R994).
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
sdd commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1553082179 ## crates/iceberg/src/expr/visitors/bound_predicate_visitor.rs: ## @@ -0,0 +1,363 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use crate::expr::{BoundPredicate, BoundReference, PredicateOperator}; +use crate::spec::Datum; +use crate::Result; +use fnv::FnvHashSet; + +pub trait BoundPredicateVisitor { +    type T; + +    fn visit(&mut self, predicate: &BoundPredicate) -> Result<Self::T> { Review Comment: Ok, happy to change, will do so
Re: [PR] Add `BoundPredicateVisitor` trait [iceberg-rust]
sdd commented on code in PR #320: URL: https://github.com/apache/iceberg-rust/pull/320#discussion_r1553077742 ## crates/iceberg/src/expr/predicate.rs: ## @@ -217,6 +249,10 @@ impl Display for SetExpression { /// Unbound predicate expression before binding to a schema. #[derive(Debug, PartialEq)] pub enum Predicate { +/// AlwaysTrue predicate, for example, `TRUE`. +AlwaysTrue, +/// AlwaysFalse predicate, for example, `FALSE`. +AlwaysFalse, Review Comment: It's in here because it is needed in two places in `InclusiveProjection`: - First, when we "unbind", transforming a `BoundPredicate` into an unbound `Predicate`: https://github.com/apache/iceberg-rust/pull/321/files#diff-419289c8023c272de50ee4352320ad217fa7c2802f7f633d26e55ece30426be6R41 - Second, when we build up a `Predicate` by "and"ing together a (possibly empty) list of `Predicate`s, starting with an `AlwaysTrue` to ensure that we return a valid `Predicate` even in the empty-list case: https://github.com/apache/iceberg-rust/pull/321/files#diff-419289c8023c272de50ee4352320ad217fa7c2802f7f633d26e55ece30426be6R76
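The second use case described above, "and"-ing together a possibly empty list of predicates with `AlwaysTrue` as the seed, can be modeled with a toy enum. The variant names mirror the PR, but the `Named` variant and the simplification rules are illustrative assumptions, not the crate's implementation.

```rust
// Toy predicate enum: `AlwaysTrue`/`AlwaysFalse` mirror the PR,
// `Named` is a hypothetical stand-in for any concrete predicate.
#[derive(Debug, Clone, PartialEq)]
enum Predicate {
    AlwaysTrue,
    AlwaysFalse,
    And(Box<Predicate>, Box<Predicate>),
    Named(&'static str),
}

impl Predicate {
    fn and(self, other: Predicate) -> Predicate {
        match (self, other) {
            // AlwaysTrue is the identity for `and`, so the seed
            // disappears once a real predicate shows up.
            (Predicate::AlwaysTrue, p) | (p, Predicate::AlwaysTrue) => p,
            // AlwaysFalse absorbs everything.
            (Predicate::AlwaysFalse, _) | (_, Predicate::AlwaysFalse) => Predicate::AlwaysFalse,
            (a, b) => Predicate::And(Box::new(a), Box::new(b)),
        }
    }
}

/// Fold a (possibly empty) list into one predicate, seeded with
/// AlwaysTrue so the empty case still yields a valid predicate.
fn conjoin(parts: Vec<Predicate>) -> Predicate {
    parts.into_iter().fold(Predicate::AlwaysTrue, Predicate::and)
}

fn main() {
    // Empty list: still a valid predicate.
    assert_eq!(conjoin(vec![]), Predicate::AlwaysTrue);
    // Non-empty list folds into a conjunction.
    let p = conjoin(vec![Predicate::Named("a"), Predicate::Named("b")]);
    assert_eq!(
        p,
        Predicate::And(Box::new(Predicate::Named("a")), Box::new(Predicate::Named("b")))
    );
    println!("ok");
}
```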
Re: [PR] Spark: Hive-View Integration Tests [iceberg]
nk1506 commented on code in PR #10088: URL: https://github.com/apache/iceberg/pull/10088#discussion_r1553075946 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -1421,38 +1442,38 @@ public void showViews() throws NoSuchTableException { insertRows(6); String sql = String.format("SELECT * from %s", tableName); sql("CREATE VIEW v1 AS %s", sql); -sql("CREATE VIEW prefixV2 AS %s", sql); -sql("CREATE VIEW prefixV3 AS %s", sql); +sql("CREATE VIEW prefixv2 AS %s", sql); Review Comment: The Hive catalog always returns table/view names in lower case, so to be safe I changed the names to lowercase.
Re: [PR] Spark: Hive-View Integration Tests [iceberg]
nk1506 commented on code in PR #10088: URL: https://github.com/apache/iceberg/pull/10088#discussion_r1553073692 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -1341,10 +1355,17 @@ public void createViewWithSubqueryExpressionInQueryThatIsRewritten() throws NoSu .containsExactly(row(3), row(3), row(3)); sql("USE spark_catalog"); - assertThatThrownBy(() -> sql(sql)) -.isInstanceOf(AnalysisException.class) -.hasMessageContaining(String.format("The table or view `%s` cannot be found", tableName)); +.satisfiesAnyOf( +throwable -> +assertThat(throwable) +.isInstanceOfAny(AnalysisException.class) +.hasMessageContaining( +String.format("The table or view `%s` cannot be found", tableName)), +throwable -> Review Comment: same
Re: [PR] Spark: Hive-View Integration Tests [iceberg]
nk1506 commented on code in PR #10088: URL: https://github.com/apache/iceberg/pull/10088#discussion_r1553073213 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -1318,8 +1324,16 @@ public void createViewWithSubqueryExpressionInFilterThatIsRewritten() sql("USE spark_catalog"); assertThatThrownBy(() -> sql(sql)) -.isInstanceOf(AnalysisException.class) -.hasMessageContaining(String.format("The table or view `%s` cannot be found", tableName)); +.satisfiesAnyOf( +throwable -> +assertThat(throwable) Review Comment: Hive throws an exception of type `SparkException` with the error message `Exception thrown in awaitResult`. Should we override these tests?
Re: [PR] Spark: Hive-View Integration Tests [iceberg]
nk1506 commented on code in PR #10088: URL: https://github.com/apache/iceberg/pull/10088#discussion_r1553071451 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -900,7 +906,7 @@ public void dropV1View() { String v1View = "v1ViewToBeDropped"; sql("USE spark_catalog"); sql("CREATE NAMESPACE IF NOT EXISTS %s", NAMESPACE); -sql("CREATE TABLE %s (id INT, data STRING)", tableName); +sql("CREATE TABLE IF NOT EXISTS %s (id INT, data STRING)", tableName); Review Comment: Hive throws a "table already exists" error even though we are switching catalogs. Alternatively, we could override this test for Hive.
Re: [PR] Spark: Hive-View Integration Tests [iceberg]
nk1506 commented on code in PR #10088: URL: https://github.com/apache/iceberg/pull/10088#discussion_r1553067799 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -62,6 +64,9 @@ public class TestViews extends ExtensionsTestBase { private static final Namespace NAMESPACE = Namespace.of("default"); private final String tableName = "table"; + @Parameter(index = 3) + protected String location; Review Comment: Different catalogs have different scheme+path combinations. The current tests covered only the InMemoryCatalog.