Re: [PR] add `InclusiveProjection` Visitor [iceberg-rust]

2024-04-19 Thread via GitHub


marvinlanhenke commented on PR #335:
URL: https://github.com/apache/iceberg-rust/pull/335#issuecomment-2066180837

   > This looks great!
   > 
   > I think @marvinlanhenke has a valid concern on when to apply the 
rewrite-not. Let's defer that discussion when we start wiring everything 
together. This look good, thanks @sdd for working on this and thanks 
@marvinlanhenke for the review 
   
   Thanks @sdd @Fokko ... I think its reasonable to defer the discussion, once 
we tie everything together.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] add `InclusiveProjection` Visitor [iceberg-rust]

2024-04-19 Thread via GitHub


Fokko merged PR #335:
URL: https://github.com/apache/iceberg-rust/pull/335


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] add `InclusiveProjection` Visitor [iceberg-rust]

2024-04-19 Thread via GitHub


sdd commented on code in PR #335:
URL: https://github.com/apache/iceberg-rust/pull/335#discussion_r1571889478


##
crates/iceberg/src/expr/visitors/inclusive_projection.rs:
##
@@ -0,0 +1,371 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::expr::visitors::bound_predicate_visitor::{visit, 
BoundPredicateVisitor};
+use crate::expr::{BoundPredicate, BoundReference, Predicate};
+use crate::spec::{Datum, PartitionField, PartitionSpecRef};
+use crate::Error;
+use fnv::FnvHashSet;
+use std::collections::HashMap;
+use std::ops::Not;
+
+pub(crate) struct InclusiveProjection {
+partition_spec: PartitionSpecRef,
+cached_parts: HashMap>,
+}
+
+impl InclusiveProjection {
+pub(crate) fn new(partition_spec: PartitionSpecRef) -> Self {
+Self {
+partition_spec,
+cached_parts: HashMap::new(),
+}
+}
+
+fn get_parts_for_field_id( self, field_id: i32) -> 
 {
+if let std::collections::hash_map::Entry::Vacant(e) = 
self.cached_parts.entry(field_id) {
+let mut parts: Vec = vec![];
+for partition_spec_field in _spec.fields {
+if partition_spec_field.source_id == field_id {
+parts.push(partition_spec_field.clone())
+}
+}
+
+e.insert(parts);
+}
+
+_parts[_id]
+}
+
+pub(crate) fn project( self, predicate: ) -> 
crate::Result {

Review Comment:
   If you do it that way, you invoke `rewrite-not` every time that you call 
`project`, which could be several times per invocation of `plan_files` in a 
table scan, and `plan_files` itself can get invoked any number of times on a 
particular instance of a `TableScan`. In the proposed design `rewrite-not` only 
ever gets called once, when the `TableScan` gets built, guaranteeing that the 
predicate is not-free already by the time it gets to the `InclusiveProjection`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] add `InclusiveProjection` Visitor [iceberg-rust]

2024-04-18 Thread via GitHub


sdd commented on PR #335:
URL: https://github.com/apache/iceberg-rust/pull/335#issuecomment-2065786584

   I agree that rewrite-not should get applied. But it is not the 
responsibility of the `InclusiveProjection` itself. In this design, it happens 
already earlier on on the process, at the point where the `TableScan` gets 
built.
   
   See here, in my other open PR: 
https://github.com/apache/iceberg-rust/pull/323/files#diff-bbfbe5e334be6c501ba2ca0ddd84d658ff0f3f84f2b5b532212f4e096a09e09bR76


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] add `InclusiveProjection` Visitor [iceberg-rust]

2024-04-18 Thread via GitHub


marvinlanhenke commented on code in PR #335:
URL: https://github.com/apache/iceberg-rust/pull/335#discussion_r1570491035


##
crates/iceberg/src/expr/visitors/inclusive_projection.rs:
##
@@ -0,0 +1,371 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::expr::visitors::bound_predicate_visitor::{visit, 
BoundPredicateVisitor};
+use crate::expr::{BoundPredicate, BoundReference, Predicate};
+use crate::spec::{Datum, PartitionField, PartitionSpecRef};
+use crate::Error;
+use fnv::FnvHashSet;
+use std::collections::HashMap;
+use std::ops::Not;
+
+pub(crate) struct InclusiveProjection {
+partition_spec: PartitionSpecRef,
+cached_parts: HashMap>,
+}
+
+impl InclusiveProjection {
+pub(crate) fn new(partition_spec: PartitionSpecRef) -> Self {
+Self {
+partition_spec,
+cached_parts: HashMap::new(),
+}
+}
+
+fn get_parts_for_field_id( self, field_id: i32) -> 
 {
+if let std::collections::hash_map::Entry::Vacant(e) = 
self.cached_parts.entry(field_id) {
+let mut parts: Vec = vec![];
+for partition_spec_field in _spec.fields {
+if partition_spec_field.source_id == field_id {
+parts.push(partition_spec_field.clone())
+}
+}
+
+e.insert(parts);
+}
+
+_parts[_id]
+}
+
+pub(crate) fn project( self, predicate: ) -> 
crate::Result {

Review Comment:
   > As per comment below, `rewrite_not` should have already been applied
   
   Maybe, I'm wrong here - but from my understanding looking at the python impl 
the flow is:
   
   - create [manifest 
evaluator](https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/__init__.py#L1631-L1641)
   - for that we need to create partition_filters -> thus `inclusive_projection`
   - class `InclusiveProjection` extends `ProjectionEvaluator` which defines 
the method `project`
   - in the method `project` we [rewrite-not, bind, and then evaluate 
(visit)](https://github.com/apache/iceberg-python/blob/main/pyiceberg/expressions/visitors.py#L804-L810)
   
   so shouldn't `project` in our `InclusiveProjection` accept a `Predicate` as 
a param and then before calling `visit` apply rewrite-not and bind?
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] add `InclusiveProjection` Visitor [iceberg-rust]

2024-04-18 Thread via GitHub


sdd commented on PR #335:
URL: https://github.com/apache/iceberg-rust/pull/335#issuecomment-2063172829

   FAO @Fokko  @marvinlanhenke @liurenjie1024: Comments addressed, ready for 
re-review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] add `InclusiveProjection` Visitor [iceberg-rust]

2024-04-18 Thread via GitHub


sdd commented on code in PR #335:
URL: https://github.com/apache/iceberg-rust/pull/335#discussion_r1570118932


##
crates/iceberg/src/expr/visitors/inclusive_projection.rs:
##
@@ -0,0 +1,371 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::expr::visitors::bound_predicate_visitor::{visit, 
BoundPredicateVisitor};
+use crate::expr::{BoundPredicate, BoundReference, Predicate};
+use crate::spec::{Datum, PartitionField, PartitionSpecRef};
+use crate::Error;
+use fnv::FnvHashSet;
+use std::collections::HashMap;
+use std::ops::Not;
+
+pub(crate) struct InclusiveProjection {
+partition_spec: PartitionSpecRef,
+cached_parts: HashMap>,
+}
+
+impl InclusiveProjection {
+pub(crate) fn new(partition_spec: PartitionSpecRef) -> Self {
+Self {
+partition_spec,
+cached_parts: HashMap::new(),
+}
+}
+
+fn get_parts_for_field_id( self, field_id: i32) -> 
 {
+if let std::collections::hash_map::Entry::Vacant(e) = 
self.cached_parts.entry(field_id) {
+let mut parts: Vec = vec![];
+for partition_spec_field in _spec.fields {
+if partition_spec_field.source_id == field_id {
+parts.push(partition_spec_field.clone())
+}
+}
+
+e.insert(parts);
+}
+
+_parts[_id]
+}
+
+pub(crate) fn project( self, predicate: ) -> 
crate::Result {

Review Comment:
   As per comment below, `rewrite_not` should have already been applied by the 
time we attempt an `InclusiveProjection`. I've updated to `panic` if a Not 
operator is encountered during projection.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] add `InclusiveProjection` Visitor [iceberg-rust]

2024-04-18 Thread via GitHub


sdd commented on code in PR #335:
URL: https://github.com/apache/iceberg-rust/pull/335#discussion_r1570116819


##
crates/iceberg/src/expr/visitors/inclusive_projection.rs:
##
@@ -0,0 +1,371 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::expr::visitors::bound_predicate_visitor::{visit, 
BoundPredicateVisitor};
+use crate::expr::{BoundPredicate, BoundReference, Predicate};
+use crate::spec::{Datum, PartitionField, PartitionSpecRef};
+use crate::Error;
+use fnv::FnvHashSet;
+use std::collections::HashMap;
+use std::ops::Not;
+
+pub(crate) struct InclusiveProjection {
+partition_spec: PartitionSpecRef,
+cached_parts: HashMap>,
+}
+
+impl InclusiveProjection {
+pub(crate) fn new(partition_spec: PartitionSpecRef) -> Self {
+Self {
+partition_spec,
+cached_parts: HashMap::new(),
+}
+}
+
+fn get_parts_for_field_id( self, field_id: i32) -> 
 {
+if let std::collections::hash_map::Entry::Vacant(e) = 
self.cached_parts.entry(field_id) {
+let mut parts: Vec = vec![];
+for partition_spec_field in _spec.fields {
+if partition_spec_field.source_id == field_id {
+parts.push(partition_spec_field.clone())
+}
+}
+
+e.insert(parts);
+}
+
+_parts[_id]
+}
+
+pub(crate) fn project( self, predicate: ) -> 
crate::Result {
+visit(self, predicate)
+}
+
+fn get_parts(
+ self,
+reference: ,
+predicate: ,
+) -> Result {
+let field_id = reference.field().id;
+
+// This could be made a bit neater if `try_reduce` ever becomes stable
+self.get_parts_for_field_id(field_id)
+.iter()
+.try_fold(Predicate::AlwaysTrue, |res, part| {
+Ok(
+if let Some(pred_for_part) = 
part.transform.project(, predicate)? {
+if res == Predicate::AlwaysTrue {
+pred_for_part
+} else {
+res.and(pred_for_part)
+}
+} else {
+res
+},
+)
+})
+}
+}
+
+impl BoundPredicateVisitor for InclusiveProjection {
+type T = Predicate;
+
+fn always_true( self) -> crate::Result {
+Ok(Predicate::AlwaysTrue)
+}
+
+fn always_false( self) -> crate::Result {
+Ok(Predicate::AlwaysFalse)
+}
+
+fn and( self, lhs: Self::T, rhs: Self::T) -> crate::Result {
+Ok(lhs.and(rhs))
+}
+
+fn or( self, lhs: Self::T, rhs: Self::T) -> crate::Result {
+Ok(lhs.or(rhs))
+}
+
+fn not( self, inner: Self::T) -> crate::Result {
+Ok(inner.not())

Review Comment:
   Updated to `panic` here with an informative message.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] add `InclusiveProjection` Visitor [iceberg-rust]

2024-04-18 Thread via GitHub


sdd commented on code in PR #335:
URL: https://github.com/apache/iceberg-rust/pull/335#discussion_r1570115947


##
crates/iceberg/src/expr/visitors/inclusive_projection.rs:
##
@@ -0,0 +1,371 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::expr::visitors::bound_predicate_visitor::{visit, 
BoundPredicateVisitor};
+use crate::expr::{BoundPredicate, BoundReference, Predicate};
+use crate::spec::{Datum, PartitionField, PartitionSpecRef};
+use crate::Error;
+use fnv::FnvHashSet;
+use std::collections::HashMap;
+use std::ops::Not;
+
+pub(crate) struct InclusiveProjection {
+partition_spec: PartitionSpecRef,
+cached_parts: HashMap>,
+}
+
+impl InclusiveProjection {
+pub(crate) fn new(partition_spec: PartitionSpecRef) -> Self {
+Self {
+partition_spec,
+cached_parts: HashMap::new(),
+}
+}
+
+fn get_parts_for_field_id( self, field_id: i32) -> 
 {
+if let std::collections::hash_map::Entry::Vacant(e) = 
self.cached_parts.entry(field_id) {
+let mut parts: Vec = vec![];
+for partition_spec_field in _spec.fields {
+if partition_spec_field.source_id == field_id {
+parts.push(partition_spec_field.clone())
+}
+}
+
+e.insert(parts);
+}
+
+_parts[_id]
+}
+
+pub(crate) fn project( self, predicate: ) -> 
crate::Result {
+visit(self, predicate)
+}
+
+fn get_parts(
+ self,
+reference: ,
+predicate: ,
+) -> Result {
+let field_id = reference.field().id;
+
+// This could be made a bit neater if `try_reduce` ever becomes stable
+self.get_parts_for_field_id(field_id)
+.iter()
+.try_fold(Predicate::AlwaysTrue, |res, part| {
+Ok(
+if let Some(pred_for_part) = 
part.transform.project(, predicate)? {
+if res == Predicate::AlwaysTrue {
+pred_for_part
+} else {
+res.and(pred_for_part)
+}
+} else {
+res
+},
+)
+})
+}
+}
+
+impl BoundPredicateVisitor for InclusiveProjection {
+type T = Predicate;
+
+fn always_true( self) -> crate::Result {
+Ok(Predicate::AlwaysTrue)
+}
+
+fn always_false( self) -> crate::Result {
+Ok(Predicate::AlwaysFalse)
+}
+
+fn and( self, lhs: Self::T, rhs: Self::T) -> crate::Result {
+Ok(lhs.and(rhs))
+}
+
+fn or( self, lhs: Self::T, rhs: Self::T) -> crate::Result {
+Ok(lhs.or(rhs))
+}
+
+fn not( self, inner: Self::T) -> crate::Result {
+Ok(inner.not())
+}
+
+fn is_null(
+ self,
+reference: ,
+predicate: ,
+) -> crate::Result {
+self.get_parts(reference, predicate)
+}
+
+fn not_null(
+ self,
+reference: ,
+predicate: ,
+) -> crate::Result {
+self.get_parts(reference, predicate)
+}
+
+fn is_nan(
+ self,
+reference: ,
+predicate: ,
+) -> crate::Result {
+self.get_parts(reference, predicate)
+}
+
+fn not_nan(
+ self,
+reference: ,
+predicate: ,
+) -> crate::Result {
+self.get_parts(reference, predicate)
+}
+
+fn less_than(
+ self,
+reference: ,
+_literal: ,
+predicate: ,
+) -> crate::Result {
+self.get_parts(reference, predicate)
+}
+
+fn less_than_or_eq(
+ self,
+reference: ,
+_literal: ,
+predicate: ,
+) -> crate::Result {
+self.get_parts(reference, predicate)
+}
+
+fn greater_than(
+ self,
+reference: ,
+_literal: ,
+predicate: ,
+) -> crate::Result {
+self.get_parts(reference, predicate)
+}
+
+fn greater_than_or_eq(
+ self,
+reference: ,
+_literal: ,
+predicate: ,
+) -> crate::Result {
+self.get_parts(reference, predicate)
+}
+
+fn 

Re: [PR] add `InclusiveProjection` Visitor [iceberg-rust]

2024-04-17 Thread via GitHub


Fokko commented on code in PR #335:
URL: https://github.com/apache/iceberg-rust/pull/335#discussion_r1568751884


##
crates/iceberg/src/expr/visitors/inclusive_projection.rs:
##
@@ -0,0 +1,371 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::expr::visitors::bound_predicate_visitor::{visit, 
BoundPredicateVisitor};
+use crate::expr::{BoundPredicate, BoundReference, Predicate};
+use crate::spec::{Datum, PartitionField, PartitionSpecRef};
+use crate::Error;
+use fnv::FnvHashSet;
+use std::collections::HashMap;
+use std::ops::Not;
+
+pub(crate) struct InclusiveProjection {
+partition_spec: PartitionSpecRef,
+cached_parts: HashMap>,
+}
+
+impl InclusiveProjection {
+pub(crate) fn new(partition_spec: PartitionSpecRef) -> Self {
+Self {
+partition_spec,
+cached_parts: HashMap::new(),
+}
+}
+
+fn get_parts_for_field_id( self, field_id: i32) -> 
 {
+if let std::collections::hash_map::Entry::Vacant(e) = 
self.cached_parts.entry(field_id) {
+let mut parts: Vec = vec![];
+for partition_spec_field in _spec.fields {
+if partition_spec_field.source_id == field_id {
+parts.push(partition_spec_field.clone())
+}
+}
+
+e.insert(parts);
+}
+
+_parts[_id]
+}
+
+pub(crate) fn project( self, predicate: ) -> 
crate::Result {

Review Comment:
   > However, this implies that the InclusiveProjection operates on Predicate 
rather than BoundPredicate. This means we have to not only apply rewrite_not 
but also bind before we call visit here.
   
   InclusiveProjection runs on a BoundPredicate. So the order is: rewrite-not, 
bind, and then evaluate.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] add `InclusiveProjection` Visitor [iceberg-rust]

2024-04-17 Thread via GitHub


Fokko commented on code in PR #335:
URL: https://github.com/apache/iceberg-rust/pull/335#discussion_r1568763855


##
crates/iceberg/src/expr/visitors/inclusive_projection.rs:
##
@@ -0,0 +1,371 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::expr::visitors::bound_predicate_visitor::{visit, 
BoundPredicateVisitor};
+use crate::expr::{BoundPredicate, BoundReference, Predicate};
+use crate::spec::{Datum, PartitionField, PartitionSpecRef};
+use crate::Error;
+use fnv::FnvHashSet;
+use std::collections::HashMap;
+use std::ops::Not;
+
+pub(crate) struct InclusiveProjection {
+partition_spec: PartitionSpecRef,
+cached_parts: HashMap>,
+}
+
+impl InclusiveProjection {
+pub(crate) fn new(partition_spec: PartitionSpecRef) -> Self {
+Self {
+partition_spec,
+cached_parts: HashMap::new(),
+}
+}
+
+fn get_parts_for_field_id( self, field_id: i32) -> 
 {
+if let std::collections::hash_map::Entry::Vacant(e) = 
self.cached_parts.entry(field_id) {
+let mut parts: Vec = vec![];
+for partition_spec_field in _spec.fields {
+if partition_spec_field.source_id == field_id {
+parts.push(partition_spec_field.clone())
+}
+}
+
+e.insert(parts);
+}
+
+_parts[_id]
+}
+
+pub(crate) fn project( self, predicate: ) -> 
crate::Result {
+visit(self, predicate)
+}
+
+fn get_parts(
+ self,
+reference: ,
+predicate: ,
+) -> Result {
+let field_id = reference.field().id;
+
+// This could be made a bit neater if `try_reduce` ever becomes stable
+self.get_parts_for_field_id(field_id)
+.iter()
+.try_fold(Predicate::AlwaysTrue, |res, part| {
+Ok(
+if let Some(pred_for_part) = 
part.transform.project(, predicate)? {
+if res == Predicate::AlwaysTrue {
+pred_for_part
+} else {
+res.and(pred_for_part)
+}
+} else {
+res
+},
+)
+})
+}
+}
+
+impl BoundPredicateVisitor for InclusiveProjection {
+type T = Predicate;
+
+fn always_true( self) -> crate::Result {
+Ok(Predicate::AlwaysTrue)
+}
+
+fn always_false( self) -> crate::Result {
+Ok(Predicate::AlwaysFalse)
+}
+
+fn and( self, lhs: Self::T, rhs: Self::T) -> crate::Result {
+Ok(lhs.and(rhs))
+}
+
+fn or( self, lhs: Self::T, rhs: Self::T) -> crate::Result {
+Ok(lhs.or(rhs))
+}
+
+fn not( self, inner: Self::T) -> crate::Result {
+Ok(inner.not())
+}
+
+fn is_null(
+ self,
+reference: ,
+predicate: ,
+) -> crate::Result {
+self.get_parts(reference, predicate)
+}
+
+fn not_null(
+ self,
+reference: ,
+predicate: ,
+) -> crate::Result {
+self.get_parts(reference, predicate)
+}
+
+fn is_nan(
+ self,
+reference: ,
+predicate: ,
+) -> crate::Result {
+self.get_parts(reference, predicate)
+}
+
+fn not_nan(
+ self,
+reference: ,
+predicate: ,
+) -> crate::Result {
+self.get_parts(reference, predicate)
+}
+
+fn less_than(
+ self,
+reference: ,
+_literal: ,
+predicate: ,
+) -> crate::Result {
+self.get_parts(reference, predicate)
+}
+
+fn less_than_or_eq(
+ self,
+reference: ,
+_literal: ,
+predicate: ,
+) -> crate::Result {
+self.get_parts(reference, predicate)
+}
+
+fn greater_than(
+ self,
+reference: ,
+_literal: ,
+predicate: ,
+) -> crate::Result {
+self.get_parts(reference, predicate)
+}
+
+fn greater_than_or_eq(
+ self,
+reference: ,
+_literal: ,
+predicate: ,
+) -> crate::Result {
+self.get_parts(reference, predicate)
+}
+
+

Re: [PR] add `InclusiveProjection` Visitor [iceberg-rust]

2024-04-17 Thread via GitHub


marvinlanhenke commented on code in PR #335:
URL: https://github.com/apache/iceberg-rust/pull/335#discussion_r1568705676


##
crates/iceberg/src/expr/visitors/inclusive_projection.rs:
##
@@ -0,0 +1,371 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::expr::visitors::bound_predicate_visitor::{visit, 
BoundPredicateVisitor};
+use crate::expr::{BoundPredicate, BoundReference, Predicate};
+use crate::spec::{Datum, PartitionField, PartitionSpecRef};
+use crate::Error;
+use fnv::FnvHashSet;
+use std::collections::HashMap;
+use std::ops::Not;
+
+pub(crate) struct InclusiveProjection {
+partition_spec: PartitionSpecRef,
+cached_parts: HashMap>,
+}
+
+impl InclusiveProjection {
+pub(crate) fn new(partition_spec: PartitionSpecRef) -> Self {
+Self {
+partition_spec,
+cached_parts: HashMap::new(),
+}
+}
+
+fn get_parts_for_field_id( self, field_id: i32) -> 
 {
+if let std::collections::hash_map::Entry::Vacant(e) = 
self.cached_parts.entry(field_id) {
+let mut parts: Vec = vec![];
+for partition_spec_field in _spec.fields {
+if partition_spec_field.source_id == field_id {
+parts.push(partition_spec_field.clone())
+}
+}
+
+e.insert(parts);
+}
+
+_parts[_id]
+}
+
+pub(crate) fn project( self, predicate: ) -> 
crate::Result {

Review Comment:
   I've taken a closer look and I think we're missing a thing here?
   
   I believe we have to rewrite the Predicate with `rewrite_not`,
   since it's assumed the ExprTree does not contain any NOT nodes.
   
https://github.com/apache/iceberg-python/blob/main/pyiceberg/expressions/visitors.py#L805-L810
   
   However, this implies that the `InclusiveProjection` operates on `Predicate` 
rather than `BoundPredicate`. This means we have to not only apply 
`rewrite_not` but also `bind` before we call `visit` here.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] add `InclusiveProjection` Visitor [iceberg-rust]

2024-04-16 Thread via GitHub


sdd commented on PR #335:
URL: https://github.com/apache/iceberg-rust/pull/335#issuecomment-2059682467

   @liurenjie1024 @Fokko @marvinlanhenke @ZENOTME PTAL, I've refactored this on 
top of the `BoundPredicateVisitor` that was merged a few hours ago and it is 
ready for review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] add `InclusiveProjection` Visitor [iceberg-rust]

2024-04-16 Thread via GitHub


sdd commented on PR #321:
URL: https://github.com/apache/iceberg-rust/pull/321#issuecomment-2059636629

   Closing this one in favour of the alternate PR based on the 
BoundPredicateVisitor design that got merged 
(https://github.com/apache/iceberg-rust/pull/335)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] add `InclusiveProjection` Visitor [iceberg-rust]

2024-04-16 Thread via GitHub


sdd closed pull request #321: add `InclusiveProjection` Visitor
URL: https://github.com/apache/iceberg-rust/pull/321


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] add `InclusiveProjection` Visitor [iceberg-rust]

2024-04-07 Thread via GitHub


sdd commented on PR #321:
URL: https://github.com/apache/iceberg-rust/pull/321#issuecomment-2041438110

   Updated to align with latest `BoundPredicateVisitor` iteration in 
https://github.com/apache/iceberg-rust/pull/320
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org