Re: [PR] feat: Project transform [iceberg-rust]

2024-04-04 Thread via GitHub


liurenjie1024 commented on PR #309:
URL: https://github.com/apache/iceberg-rust/pull/309#issuecomment-2039007641

   cc @Fokko Do you have other comments?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] feat: Project transform [iceberg-rust]

2024-04-04 Thread via GitHub


liurenjie1024 commented on code in PR #309:
URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1548808318


##
crates/iceberg/src/spec/transform.rs:
##
@@ -261,6 +269,323 @@ impl Transform {
 _ => self == other,
 }
 }
+
+/// Projects a given predicate according to the transformation
+/// specified by the `Transform` instance.
+///
+/// This allows predicates to be effectively applied to data
+/// that has undergone transformation, enabling efficient querying
+/// and filtering based on the original, untransformed data.
+///
+/// # Example
+/// Suppose, we have row filter `a = 10`, and a partition spec
+/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition
+/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to
+/// `bs = bucket(10, 37)`
+pub fn project(&self, name: String, predicate: &BoundPredicate) -> Result<Option<Predicate>> {
+let func = create_transform_function(self)?;
+
+match self {
+Transform::Identity => match predicate {
+BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name),
+BoundPredicate::Binary(expr) => Ok(Some(Predicate::Binary(BinaryExpression::new(
+expr.op(),
+Reference::new(name),
+expr.literal().to_owned(),
+)))),
+BoundPredicate::Set(expr) => Ok(Some(Predicate::Set(SetExpression::new(
+expr.op(),
+Reference::new(name),
+expr.literals().to_owned(),
+)))),
+_ => Ok(None),
+},
+Transform::Bucket(_) => match predicate {
+BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name),
+BoundPredicate::Binary(expr) => self.project_binary(name, expr, &func),
+BoundPredicate::Set(expr) => self.project_set(expr, name, &func),
+_ => Ok(None),
+},
+Transform::Truncate(width) => match predicate {
+BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name),
+BoundPredicate::Binary(expr) => {
+self.project_binary_with_adjusted_boundary(name, expr, &func, Some(*width))
+}
+BoundPredicate::Set(expr) => self.project_set(expr, name, &func),
+_ => Ok(None),
+},
+Transform::Year | Transform::Month | Transform::Day | Transform::Hour => {
+match predicate {
+BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name),
+BoundPredicate::Binary(expr) => {
+self.project_binary_with_adjusted_boundary(name, expr, &func, None)
+}
+BoundPredicate::Set(expr) => self.project_set(expr, name, &func),
+_ => Ok(None),
+}
+}
+_ => Ok(None),
+}
+}
+
+/// Check if `Transform` is applicable on datum's `PrimitiveType`
+fn can_transform(&self, datum: &Datum) -> bool {
+let input_type = datum.data_type().clone();
+self.result_type(&Type::Primitive(input_type)).is_ok()
+}
+
+/// Creates a unary predicate from a given operator and a reference name.
+fn project_unary(op: PredicateOperator, name: String) -> Result<Option<Predicate>> {
+Ok(Some(Predicate::Unary(UnaryExpression::new(
+op,
+Reference::new(name),
+))))
+}
+
+/// Attempts to create a binary predicate based on a binary expression,
+/// if applicable.
+///
+/// This method evaluates a given binary expression and, if the operation
+/// is equality (`Eq`) and the literal can be transformed, constructs a
+/// `Predicate::Binary`variant representing the binary operation.
+fn project_binary(
+&self,
+name: String,
+expr: &BinaryExpression<BoundReference>,
+func: &BoxedTransformFunction,
+) -> Result<Option<Predicate>> {
+if expr.op() != PredicateOperator::Eq || !self.can_transform(expr.literal()) {
+return Ok(None);
+}
+
+Ok(Some(Predicate::Binary(BinaryExpression::new(
+expr.op(),
+Reference::new(name),
+func.transform_literal_result(expr.literal())?,
+))))
+}
+
+/// Projects a binary expression to a predicate with an adjusted boundary.
+///
+/// Checks if the literal within the given binary expression is
+/// transformable. If transformable, it proceeds to potentially adjust
+/// the boundary of the expression based on the comparison operator (`op`).
+/// The potential adjustments involve incrementing or decrementing the
+/// literal value and changing the `PredicateOperator` itself to its
+/// inclusive variant.
+fn project_binary_with_adjusted_boundary(
+&self,
+name: String,
+expr: &BinaryExpression<BoundReference>,
+func: &BoxedTransformFunction,
+
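
The example in the doc comment above (`a = 10` projected to `bs = bucket(10, 37)`) can be sketched in isolation. This is a minimal illustration and not the code from the PR: `toy_bucket`, `RowPredicate`, `PartitionPredicate`, and `project_bucket` are hypothetical names, and `toy_bucket` uses a plain modulo instead of the hash a real bucket transform applies. It only shows the shape of the projection: an equality filter on the source column becomes an equality filter on the partition column, and any other operator yields no projected predicate, mirroring the `Eq` check in `project_binary`.

```rust
/// Hypothetical stand-in for a bucket transform. This is NOT the hash
/// used by Iceberg; it is a plain modulo chosen to keep the sketch small.
fn toy_bucket(value: i64, num_buckets: u32) -> u32 {
    value.rem_euclid(num_buckets as i64) as u32
}

/// Hypothetical row filter on a single integer column.
#[derive(Debug, Clone, PartialEq)]
enum RowPredicate {
    Eq(i64),
    Lt(i64),
}

/// Hypothetical projected predicate: `column = equals` on the partition value.
#[derive(Debug, Clone, PartialEq)]
struct PartitionPredicate {
    column: String,
    equals: u32,
}

/// Projects a row filter onto a bucketed partition column.
/// Only equality survives: bucketing does not preserve ordering, so a
/// range filter cannot be turned into a filter on bucket numbers.
fn project_bucket(
    name: &str,
    predicate: &RowPredicate,
    num_buckets: u32,
) -> Option<PartitionPredicate> {
    match predicate {
        RowPredicate::Eq(v) => Some(PartitionPredicate {
            column: name.to_string(),
            equals: toy_bucket(*v, num_buckets),
        }),
        RowPredicate::Lt(_) => None,
    }
}
```

Returning `None` here plays the same role as `Ok(None)` in the diff: the caller learns that no partition-level predicate could be derived and must not prune on that column.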

Re: [I] cannot insert value in hive command shell [iceberg]

2024-04-04 Thread via GitHub


github-actions[bot] commented on issue #2442:
URL: https://github.com/apache/iceberg/issues/2442#issuecomment-2038481456

   This issue has been automatically marked as stale because it has been open 
for 180 days with no activity. It will be closed in the next 14 days if no 
further activity occurs. To permanently prevent this issue from being 
considered stale, add the label 'not-stale', but commenting on the issue is 
preferred when possible.





Re: [I] Can snapshot has an optional name? [iceberg]

2024-04-04 Thread via GitHub


github-actions[bot] commented on issue #2231:
URL: https://github.com/apache/iceberg/issues/2231#issuecomment-2038481247

   This issue has been closed because it has not received any activity in the 
last 14 days since being marked as 'stale'





Re: [I] Can snapshot has an optional name? [iceberg]

2024-04-04 Thread via GitHub


github-actions[bot] closed issue #2231: Can snapshot has an optional name?
URL: https://github.com/apache/iceberg/issues/2231





Re: [I] flink 1.12.0 cannot run iceberg batch mode [iceberg]

2024-04-04 Thread via GitHub


github-actions[bot] closed issue #2225: flink 1.12.0 cannot run iceberg batch mode
URL: https://github.com/apache/iceberg/issues/2225





Re: [I] flink 1.12.0 cannot run iceberg batch mode [iceberg]

2024-04-04 Thread via GitHub


github-actions[bot] commented on issue #2225:
URL: https://github.com/apache/iceberg/issues/2225#issuecomment-2038481227

   This issue has been closed because it has not received any activity in the 
last 14 days since being marked as 'stale'





Re: [I] What is the meaning of `delete_rows_count` and `delete_data_count_file ` at manifest [iceberg]

2024-04-04 Thread via GitHub


github-actions[bot] commented on issue #2445:
URL: https://github.com/apache/iceberg/issues/2445#issuecomment-2038481477

   This issue has been automatically marked as stale because it has been open 
for 180 days with no activity. It will be closed in the next 14 days if no 
further activity occurs. To permanently prevent this issue from being 
considered stale, add the label 'not-stale', but commenting on the issue is 
preferred when possible.





Re: [PR] Add `AlwaysTrue` and `AlwaysFalse` to `Predicate` [iceberg-rust]

2024-04-04 Thread via GitHub


sdd commented on PR #319:
URL: https://github.com/apache/iceberg-rust/pull/319#issuecomment-2038369738

   merged this into https://github.com/apache/iceberg-rust/pull/320 as it is a 
bit pointless on its own





Re: [PR] Add `AlwaysTrue` and `AlwaysFalse` to `Predicate` [iceberg-rust]

2024-04-04 Thread via GitHub


sdd closed pull request #319: Add `AlwaysTrue` and `AlwaysFalse` to `Predicate`
URL: https://github.com/apache/iceberg-rust/pull/319





Re: [PR] Add Struct Accessors to BoundReferences [iceberg-rust]

2024-04-04 Thread via GitHub


sdd commented on PR #317:
URL: https://github.com/apache/iceberg-rust/pull/317#issuecomment-2038350682

   PTAL @liurenjie1024 and @marvinlanhenke - extracted from 
https://github.com/apache/iceberg-rust/pull/241 and added tests





Re: [I] Support partial overwrites [iceberg-python]

2024-04-04 Thread via GitHub


whynick1 commented on issue #268:
URL: https://github.com/apache/iceberg-python/issues/268#issuecomment-2038339351

   > @whynick1 This should be done once #554 is in. Would you like to provide 
one of the metadata tables? See #511
   
   @Fokko In that case, can I take 
https://github.com/apache/iceberg-python/issues/554? If not, I am happy to work 
on #511 too.
   





Re: [PR] Add a few getters that are required for the evaluating manifests in TableScan [iceberg-rust]

2024-04-04 Thread via GitHub


sdd closed pull request #318: Add a few getters that are required for the 
evaluating manifests in TableScan
URL: https://github.com/apache/iceberg-rust/pull/318





Re: [PR] OpenAPI: Fix additionalProperties for SnapshotSummary [iceberg]

2024-04-04 Thread via GitHub


Fokko merged PR #9838:
URL: https://github.com/apache/iceberg/pull/9838





Re: [PR] OpenAPI: Fix additionalProperties for SnapshotSummary [iceberg]

2024-04-04 Thread via GitHub


Fokko commented on code in PR #9838:
URL: https://github.com/apache/iceberg/pull/9838#discussion_r1552476063


##
open-api/rest-catalog-open-api.py:
##
@@ -171,7 +171,6 @@ class SortOrder(BaseModel):
 
 class Summary(BaseModel):
 operation: Literal['append', 'replace', 'overwrite', 'delete']
-additionalProperties: Optional[str] = None

Review Comment:
   So every generator is different. I don't think this is supported by the one 
we use right now. But I think the change in the yaml is correct  






Re: [PR] Add a few getters that are required for the evaluating manifests in TableScan [iceberg-rust]

2024-04-04 Thread via GitHub


sdd commented on PR #318:
URL: https://github.com/apache/iceberg-rust/pull/318#issuecomment-2038244514

   @liurenjie1024 and @marvinlanhenke - small PR split out from 
https://github.com/apache/iceberg-rust/pull/241 ready for review :-) 





Re: [PR] Add `AlwaysTrue` and `AlwaysFalse` to `Predicate` [iceberg-rust]

2024-04-04 Thread via GitHub


sdd commented on PR #319:
URL: https://github.com/apache/iceberg-rust/pull/319#issuecomment-2038240842

   @liurenjie1024 and @marvinlanhenke  - small PR split out from 
https://github.com/apache/iceberg-rust/pull/241 ready for review!





[PR] Implement manifest filtering in `TableScan` [iceberg-rust]

2024-04-04 Thread via GitHub


sdd opened a new pull request, #323:
URL: https://github.com/apache/iceberg-rust/pull/323

   This PR was broken out of https://github.com/apache/iceberg-rust/pull/241 as 
that PR was getting too large.
   
   It depends on https://github.com/apache/iceberg-rust/pull/322, and 
integrates the `ManifestEvaluator` from that PR into `TableScan`. The evaluator 
is applied to the manifest files in the manifest for the scan's snapshot, 
rejecting any manifests that don't match the table scan's filter predicate, if 
present.





[PR] Add `ManifestEvaluator`, used to filter manifests in table scans [iceberg-rust]

2024-04-04 Thread via GitHub


sdd opened a new pull request, #322:
URL: https://github.com/apache/iceberg-rust/pull/322

   This PR has been broken out of 
https://github.com/apache/iceberg-rust/pull/241 as that PR was getting too 
large.
   
   It depends on https://github.com/apache/iceberg-rust/pull/320 and 
https://github.com/apache/iceberg-rust/pull/321.
   
   It introduces `ManifestEvaluator`, which is used to apply the filter 
predicate from a table scan to a `ManifestFile` to see if it should be filtered 
out of the scan.
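
The idea of applying a scan filter to a manifest can be sketched as follows. This is an assumption-laden illustration, not the `ManifestEvaluator` from the PR: `FieldSummary`, `Filter`, `might_match`, and `prune` are hypothetical names, and the sketch assumes each manifest carries lower and upper bounds for a single integer partition field. A manifest is kept whenever the bounds cannot rule out a match.

```rust
/// Hypothetical per-manifest summary: bounds of one integer partition field.
struct FieldSummary {
    lower: i64,
    upper: i64,
}

/// Hypothetical scan filter on that same field.
enum Filter {
    Eq(i64),
    Lt(i64),
    Gt(i64),
}

/// Returns true when at least one value in [lower, upper] could satisfy
/// the filter, i.e. the manifest cannot be safely skipped.
fn might_match(summary: &FieldSummary, filter: &Filter) -> bool {
    match filter {
        Filter::Eq(v) => summary.lower <= *v && *v <= summary.upper,
        Filter::Lt(v) => summary.lower < *v,
        Filter::Gt(v) => summary.upper > *v,
    }
}

/// Keeps only the manifest paths whose summaries might match the filter.
fn prune<'a>(manifests: &'a [(&'a str, FieldSummary)], filter: &Filter) -> Vec<&'a str> {
    manifests
        .iter()
        .filter(|(_, summary)| might_match(summary, filter))
        .map(|(path, _)| *path)
        .collect()
}
```

The check is deliberately one-sided: it may keep a manifest that turns out to hold no matching rows, but it never drops one that could.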





Re: [PR] Build: Bump junit from 5.10.1 to 5.10.2 [iceberg]

2024-04-04 Thread via GitHub


Fokko commented on PR #9699:
URL: https://github.com/apache/iceberg/pull/9699#issuecomment-2038140183

   @dependabot rebase





Re: [PR] Build: Bump calcite from 1.10.0 to 1.35.0 [iceberg]

2024-04-04 Thread via GitHub


dependabot[bot] commented on PR #8239:
URL: https://github.com/apache/iceberg/pull/8239#issuecomment-2038141263

   OK, I won't notify you again about this release, but will get in touch when 
a new version is available. You can also ignore all major, minor, or patch 
releases for a dependency by adding an [`ignore` 
condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore)
 with the desired `update_types` to your config file.
   
   If you change your mind, just re-open this PR and I'll resolve any conflicts 
on it.





Re: [PR] Build: Bump calcite from 1.10.0 to 1.35.0 [iceberg]

2024-04-04 Thread via GitHub


Fokko commented on PR #8239:
URL: https://github.com/apache/iceberg/pull/8239#issuecomment-2038141211

   Duplicate of https://github.com/apache/iceberg/pull/9042





Re: [PR] Build: Bump calcite from 1.10.0 to 1.35.0 [iceberg]

2024-04-04 Thread via GitHub


Fokko closed pull request #8239: Build: Bump calcite from 1.10.0 to 1.35.0
URL: https://github.com/apache/iceberg/pull/8239





Re: [PR] Build: Bump org.testcontainers:testcontainers from 1.19.5 to 1.19.7 [iceberg]

2024-04-04 Thread via GitHub


Fokko merged PR #9912:
URL: https://github.com/apache/iceberg/pull/9912





[PR] add `InclusiveProjection` Visitor [iceberg-rust]

2024-04-04 Thread via GitHub


sdd opened a new pull request, #321:
URL: https://github.com/apache/iceberg-rust/pull/321

   This PR has been broken out of 
https://github.com/apache/iceberg-rust/pull/241 as it was getting too large.
   
   The `InclusiveProjection` is used in the process of filtering manifest files 
in table scans. It projects the `BoundPredicate` that is provided as the filter 
in a table scan into a `Predicate` that can be used for filtering the partition 
specs of manifests.
   
   This PR depends on https://github.com/apache/iceberg-rust/pull/319 and 
https://github.com/apache/iceberg-rust/pull/320.





[PR] Add `BoundPredicateEvaluator` trait [iceberg-rust]

2024-04-04 Thread via GitHub


sdd opened a new pull request, #320:
URL: https://github.com/apache/iceberg-rust/pull/320

   This trait is used for `BoundPredicate` visitors that evaluate the predicate 
to a boolean, and associated tests. It has been broken out of the 
https://github.com/apache/iceberg-rust/pull/241 PR as that was getting too 
large.
   
   This depends on https://github.com/apache/iceberg-rust/pull/319
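
A visitor that reduces a predicate tree to a boolean might look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the trait from the PR: `Pred`, `LeafEvaluator`, `evaluate`, and `RowEvaluator` are hypothetical names, and the predicate is restricted to a single integer column. The traversal handles the boolean structure, while the trait implementation decides how each leaf comparison is answered.

```rust
/// Hypothetical, simplified predicate tree over a single integer column.
enum Pred {
    AlwaysTrue,
    AlwaysFalse,
    Not(Box<Pred>),
    And(Box<Pred>, Box<Pred>),
    Or(Box<Pred>, Box<Pred>),
    Eq(i64),
    Lt(i64),
}

/// Hypothetical evaluator trait: implementors answer the leaf comparisons.
trait LeafEvaluator {
    fn visit_eq(&mut self, literal: i64) -> bool;
    fn visit_lt(&mut self, literal: i64) -> bool;
}

/// Walks the tree and combines leaf results with the boolean operators.
fn evaluate<E: LeafEvaluator>(evaluator: &mut E, pred: &Pred) -> bool {
    match pred {
        Pred::AlwaysTrue => true,
        Pred::AlwaysFalse => false,
        Pred::Not(inner) => !evaluate(evaluator, inner),
        Pred::And(left, right) => evaluate(evaluator, left) && evaluate(evaluator, right),
        Pred::Or(left, right) => evaluate(evaluator, left) || evaluate(evaluator, right),
        Pred::Eq(literal) => evaluator.visit_eq(*literal),
        Pred::Lt(literal) => evaluator.visit_lt(*literal),
    }
}

/// Example implementor: evaluates the predicate against one concrete value.
struct RowEvaluator {
    value: i64,
}

impl LeafEvaluator for RowEvaluator {
    fn visit_eq(&mut self, literal: i64) -> bool {
        self.value == literal
    }

    fn visit_lt(&mut self, literal: i64) -> bool {
        self.value < literal
    }
}
```

Keeping the traversal in one function means each new evaluator only has to define the leaf comparisons.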
   
   





[PR] Add a few getters that are required for the evaluating manifests in TableScan [iceberg-rust]

2024-04-04 Thread via GitHub


sdd opened a new pull request, #318:
URL: https://github.com/apache/iceberg-rust/pull/318

   This small PR has been broken out of 
https://github.com/apache/iceberg-rust/pull/241 as that PR was getting too 
large.
   
   It should be pretty uncontroversial, adding some getters that are required 
in the manifest evaluation process.
   
   Additionally the `chrono` dep in Cargo.toml was changed from being pinned to 
`0.4.34` to being `~0.4.34` to permit compatible updates as they arise.





Re: [PR] API: Fix default FileIO#newInputFile ManifestFile, DataFile and DeleteFile implementation to pass lengths [iceberg]

2024-04-04 Thread via GitHub


amogh-jahagirdar merged PR #9953:
URL: https://github.com/apache/iceberg/pull/9953





Re: [PR] API: Fix default FileIO#newInputFile ManifestFile, DataFile and DeleteFile implementation to pass lengths [iceberg]

2024-04-04 Thread via GitHub


amogh-jahagirdar commented on PR #9953:
URL: https://github.com/apache/iceberg/pull/9953#issuecomment-2038104819

   Thanks for the reviews @ajantha-bhat and @Fokko ! Merging
   
   





[PR] Add Struct Accessors to BoundReferences [iceberg-rust]

2024-04-04 Thread via GitHub


sdd opened a new pull request, #317:
URL: https://github.com/apache/iceberg-rust/pull/317

   First PR to come out of breaking up 
https://github.com/apache/iceberg-rust/pull/241.
   
   Adds `StructAccessor`, which is added to `BoundReference` as a means of 
retrieving a field's value from a (possibly nested) Struct.
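
Retrieving a value from a possibly nested struct by position can be sketched as below. This is a minimal illustration and not the `StructAccessor` added in the PR: `Value` and `Accessor` are hypothetical types, and the sketch assumes an accessor is a position within the current struct plus an optional inner accessor for the next nesting level.

```rust
/// Hypothetical value model: scalars and nested structs.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Text(String),
    Struct(Vec<Value>),
}

/// Hypothetical accessor: a field position in the current struct, plus an
/// optional accessor applied to the field found at that position.
struct Accessor {
    position: usize,
    inner: Option<Box<Accessor>>,
}

impl Accessor {
    /// Accessor for a top-level field.
    fn leaf(position: usize) -> Self {
        Accessor { position, inner: None }
    }

    /// Accessor that first selects `position`, then applies `inner`.
    fn nested(position: usize, inner: Accessor) -> Self {
        Accessor { position, inner: Some(Box::new(inner)) }
    }

    /// Follows the accessor chain; returns None if the container is not a
    /// struct or a position is out of range.
    fn get<'a>(&self, container: &'a Value) -> Option<&'a Value> {
        let fields = match container {
            Value::Struct(fields) => fields,
            _ => return None,
        };
        let field = fields.get(self.position)?;
        match &self.inner {
            Some(inner) => inner.get(field),
            None => Some(field),
        }
    }
}
```

Resolving the chain of positions once, when a reference is bound, would let later evaluation skip name lookups entirely; whether the PR does exactly that is not stated in the message above.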





Re: [PR] REST: fix spurious warning when shutting down refresh executor [iceberg]

2024-04-04 Thread via GitHub


amogh-jahagirdar commented on PR #10087:
URL: https://github.com/apache/iceberg/pull/10087#issuecomment-2038064291

   Thanks for fixing this @adutra! Merging





Re: [PR] REST: fix spurious warning when shutting down refresh executor [iceberg]

2024-04-04 Thread via GitHub


amogh-jahagirdar merged PR #10087:
URL: https://github.com/apache/iceberg/pull/10087





Re: [PR] Partitioned Append on Identity Transform [iceberg-python]

2024-04-04 Thread via GitHub


Fokko commented on code in PR #555:
URL: https://github.com/apache/iceberg-python/pull/555#discussion_r1552302617


##
pyiceberg/manifest.py:
##
@@ -283,31 +277,12 @@ def __repr__(self) -> str:
 }
 
 
-@singledispatch
-def partition_field_to_data_file_partition_field(partition_field_type: IcebergType) -> PrimitiveType:
-    raise TypeError(f"Unsupported partition field type: {partition_field_type}")
-
-
-@partition_field_to_data_file_partition_field.register(LongType)
-@partition_field_to_data_file_partition_field.register(DateType)
-@partition_field_to_data_file_partition_field.register(TimeType)
-@partition_field_to_data_file_partition_field.register(TimestampType)
-@partition_field_to_data_file_partition_field.register(TimestamptzType)
-def _(partition_field_type: PrimitiveType) -> IntegerType:
-    return IntegerType()
-
-
-@partition_field_to_data_file_partition_field.register(PrimitiveType)
-def _(partition_field_type: PrimitiveType) -> PrimitiveType:
-    return partition_field_type
-
-
-def data_file_with_partition(partition_type: StructType, format_version: TableVersion) -> StructType:
+def data_file_with_partition(partition_type: StructType, format_version: Literal[1, 2]) -> StructType:

Review Comment:
   Nit:
   ```suggestion
   def data_file_with_partition(partition_type: StructType, format_version: TableVersion) -> StructType:
   ```



##
pyiceberg/manifest.py:
##
@@ -289,10 +286,7 @@ def 
partition_field_to_data_file_partition_field(partition_field_type: IcebergTy
 
 
 @partition_field_to_data_file_partition_field.register(LongType)
-@partition_field_to_data_file_partition_field.register(DateType)

Review Comment:
   Beautiful, thanks  






Re: [PR] API: Fix default FileIO#newInputFile ManifestFile, DataFile and DeleteFile implementation to pass lengths [iceberg]

2024-04-04 Thread via GitHub


amogh-jahagirdar commented on code in PR #9953:
URL: https://github.com/apache/iceberg/pull/9953#discussion_r1552280909


##
aws/src/test/java/org/apache/iceberg/aws/s3/TestS3FileIO.java:
##
@@ -377,6 +384,50 @@ public void testResolvingFileIOLoad() {
     Assertions.assertThat(result).isInstanceOf(S3FileIO.class);
   }
 
+  @Test
+  public void testInputFileWithDataFile() throws IOException {
+    String location = "s3://bucket/path/to/data-file.parquet";
+    DataFile dataFile =
+        DataFiles.builder(PartitionSpec.unpartitioned())
+            .withPath(location)
+            .withFileSizeInBytes(123L)
+            .withFormat(FileFormat.PARQUET)
+            .withRecordCount(123L)
+            .build();
+    OutputStream outputStream = s3FileIO.newOutputFile(location).create();
+    byte[] data = "testing".getBytes();
+    outputStream.write(data);
+    outputStream.close();
+
+    InputFile inputFile = s3FileIO.newInputFile(dataFile);
+    Assertions.assertThat(inputFile.getLength())
+        .as("Data file length should be determined from the file size stats")
+        .isEqualTo(123L);
+  }
+
+  @Test
+  public void testInputFileWithManifest() throws IOException {
+    String dataFileLocation = "s3://bucket/path/to/data-file-2.parquet";
+    DataFile dataFile =
+        DataFiles.builder(PartitionSpec.unpartitioned())
+            .withPath(dataFileLocation)
+            .withFileSizeInBytes(123L)
+            .withFormat(FileFormat.PARQUET)
+            .withRecordCount(123L)
+            .build();
+    String manifestLocation = "s3://bucket/path/to/manifest.avro";
+    OutputFile outputFile = s3FileIO.newOutputFile(manifestLocation);
+    ManifestWriter writer =
+        ManifestFiles.write(PartitionSpec.unpartitioned(), outputFile);
+    writer.add(dataFile);
+    writer.close();
+    ManifestFile manifest = writer.toManifestFile();
+    InputFile inputFile = s3FileIO.newInputFile(manifest);
+    Assertions.assertThat(inputFile.getLength())
+        .as("Manifest file length should be determined from the file size stats")
+        .isEqualTo(manifest.length());

Review Comment:
   I just added a verification that the s3Mock.headObject is never called when 
determining the length. It fails before this change, and passes after the fix 
so I think it's a better test.
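
The idea behind that verification can be sketched in Python with `unittest.mock`. This is only an illustration under assumptions: `LazyLengthInputFile` and its `head_object` client method are hypothetical names, not the Iceberg or S3 SDK API.

```python
from typing import Optional
from unittest.mock import Mock


class LazyLengthInputFile:
    """Hypothetical input file that asks storage for its size only when needed."""

    def __init__(self, client, location: str, length: Optional[int] = None):
        self._client = client
        self._location = location
        self._length = length

    def get_length(self) -> int:
        if self._length is None:
            # No length was supplied: fall back to a metadata request.
            self._length = self._client.head_object(self._location)
        return self._length


def check_no_head_call() -> int:
    client = Mock()
    input_file = LazyLengthInputFile(
        client, "s3://bucket/path/to/data-file.parquet", length=123
    )
    length = input_file.get_length()
    # The known length must be used without touching the storage client.
    client.head_object.assert_not_called()
    return length
```

Asserting that the metadata call never happens is stricter than comparing lengths alone, because a length check can pass even when an extra request was made.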






Re: [PR] Add entries metadata table [iceberg-python]

2024-04-04 Thread via GitHub


Fokko merged PR #551:
URL: https://github.com/apache/iceberg-python/pull/551





Re: [PR] Add entries metadata table [iceberg-python]

2024-04-04 Thread via GitHub


Fokko commented on PR #551:
URL: https://github.com/apache/iceberg-python/pull/551#issuecomment-2037933309

   Good one @HonahX, just added it  





Re: [I] Add samples for listing table.files and table.partitions [iceberg-python]

2024-04-04 Thread via GitHub


moryachok commented on issue #578:
URL: https://github.com/apache/iceberg-python/issues/578#issuecomment-2037890575

   Hey @Fokko,
   that would be awesome. Let me know when it is merged and I will test it on my 
side.





Re: [I] Support partial overwrites [iceberg-python]

2024-04-04 Thread via GitHub


Fokko commented on issue #268:
URL: https://github.com/apache/iceberg-python/issues/268#issuecomment-2037867614

   Closing this since it is a duplicate of #511. A partial overwrite is 
essentially a delete + append.
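
As a rough illustration of "delete + append", here is a toy sketch over in-memory rows. The function `partial_overwrite` and its parameters are hypothetical and say nothing about how pyiceberg implements it.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def partial_overwrite(
    rows: List[Row], matches: Callable[[Row], bool], new_rows: List[Row]
) -> List[Row]:
    # Step 1 (delete): keep only rows that do not match the overwrite filter.
    kept = [row for row in rows if not matches(row)]
    # Step 2 (append): add the replacement rows.
    return kept + new_rows
```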





Re: [I] Support partial overwrites [iceberg-python]

2024-04-04 Thread via GitHub


Fokko closed issue #268: Support partial overwrites
URL: https://github.com/apache/iceberg-python/issues/268





Re: [I] Support partial overwrites [iceberg-python]

2024-04-04 Thread via GitHub


Fokko commented on issue #268:
URL: https://github.com/apache/iceberg-python/issues/268#issuecomment-2037866632

   @whynick1 This should be done once 
https://github.com/apache/iceberg-python/issues/554 is in. Would you like to 
provide one of the metadata tables? See 
https://github.com/apache/iceberg-python/issues/511





Re: [I] Add samples for listing table.files and table.partitions [iceberg-python]

2024-04-04 Thread via GitHub


Fokko commented on issue #578:
URL: https://github.com/apache/iceberg-python/issues/578#issuecomment-2037863395

   @moryachok It looks like someone else is already working on files, how about 
partitions? I'll merge https://github.com/apache/iceberg-python/pull/551 later 
today and then it should be relatively straightforward.





Re: [PR] Parquet: Make row-group filters cooperate to filter [iceberg]

2024-04-04 Thread via GitHub


cccs-jc commented on PR #6893:
URL: https://github.com/apache/iceberg/pull/6893#issuecomment-2037778990

   It would be great to revive your PR. I think it's the best approach and it's 
a major improvement over the current implementation. The query speed is much 
faster with this fix.
   
   I have updated the test case in my own branch; feel free to consult it if you 
need to. 
https://github.com/CybercentreCanada/iceberg/tree/iceberg-improve_parq_row_group_filter
   
   thanks @zhongyujiang 





Re: [PR] Docs: Add gc.enabled property [iceberg]

2024-04-04 Thread via GitHub


nk1506 closed pull request #9231: Docs: Add gc.enabled property
URL: https://github.com/apache/iceberg/pull/9231





Re: [PR] Docs: Add gc.enabled property [iceberg]

2024-04-04 Thread via GitHub


nk1506 commented on PR #9231:
URL: https://github.com/apache/iceberg/pull/9231#issuecomment-2037765782

   Closing this as it might not be required in the docs.





Re: [PR] Core: Common metadata for TableMetadata and ViewMetadata [iceberg]

2024-04-04 Thread via GitHub


nk1506 commented on PR #9682:
URL: https://github.com/apache/iceberg/pull/9682#issuecomment-2037762988

   Common code abstraction on Hive-Table/View has been parked for later. 
BaseMetadata for Table and View might not be required. We can achieve the same 
using a different strategy.





Re: [PR] Core: Common metadata for TableMetadata and ViewMetadata [iceberg]

2024-04-04 Thread via GitHub


nk1506 closed pull request #9682: Core: Common metadata for TableMetadata and 
ViewMetadata
URL: https://github.com/apache/iceberg/pull/9682





Re: [I] Support partial overwrites [iceberg-python]

2024-04-04 Thread via GitHub


whynick1 commented on issue #268:
URL: https://github.com/apache/iceberg-python/issues/268#issuecomment-2037755644

   @Fokko if nobody is already looking at this, I would love to take a stab at 
it.





Re: [PR] Hive: Add View support for HIVE catalog [iceberg]

2024-04-04 Thread via GitHub


nk1506 closed pull request #8907: Hive: Add View support for HIVE catalog
URL: https://github.com/apache/iceberg/pull/8907





Re: [PR] Hive: Add View support for HIVE catalog [iceberg]

2024-04-04 Thread via GitHub


nk1506 commented on PR #8907:
URL: https://github.com/apache/iceberg/pull/8907#issuecomment-2037749273

   Closing this PR as the same has been addressed in the ongoing 
[PR](https://github.com/apache/iceberg/pull/9852).





Re: [PR] Docs: Document support for binary in truncate transform [iceberg]

2024-04-04 Thread via GitHub


TheNeuralBit commented on code in PR #10079:
URL: https://github.com/apache/iceberg/pull/10079#discussion_r1552085818


##
format/spec.md:
##
@@ -245,19 +245,19 @@ For example, a file may be written with schema `1: a int, 2: b string, 3: c doub
 
 Tables may also define a property `schema.name-mapping.default` with a JSON name mapping containing a list of field mapping objects. These mappings provide fallback field ids to be used when a data file does not contain field id information. Each object should contain
 
-* `names`: A required list of 0 or more names for a field. 
+* `names`: A required list of 0 or more names for a field.
 * `field-id`: An optional Iceberg field ID used when a field's name is present in `names`
 * `fields`: An optional list of field mappings for child field of structs, maps, and lists.
 
 Field mapping fields are constrained by the following rules:
 
-* A name may contain `.` but this refers to a literal name, not a nested field. For example, `a.b` refers to a field named `a.b`, not child field `b` of field `a`. 
-* Each child field should be defined with their own field mapping under `fields`. 

Review Comment:
   Done



##
format/spec.md:
##
@@ -314,7 +314,7 @@ Partition field IDs must be reused if an existing partition spec contains an equ
 |-------------------|------------------------------------------|--------------|-------------|
 | **`identity`**    | Source value, unmodified                 | Any | Source type |
 | **`bucket[N]`**   | Hash of value, mod `N` (see below)       | `int`, `long`, `decimal`, `date`, `time`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`, `string`, `uuid`, `fixed`, `binary` | `int` |
-| **`truncate[W]`** | Value truncated to width `W` (see below) | `int`, `long`, `decimal`, `string` | Source type |
+| **`truncate[W]`** | Value truncated to width `W` (see below) | `int`, `long`, `decimal`, `string`, `binary` | Source type |

Review Comment:
   Done! Please check my language and example
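
To make the `truncate[W]` row concrete, here is a minimal Python sketch for three of the listed source types. The function names are hypothetical. The integer case follows the `v - (v % W)` rule with a non-negative remainder, and the string and binary cases keep the first `W` code points or bytes, which is my reading of the transform rather than text from this diff.

```python
def truncate_int(value: int, width: int) -> int:
    # Python's % is non-negative for a positive width, so negative values
    # round down toward the lower multiple of width.
    return value - (value % width)


def truncate_string(value: str, width: int) -> str:
    # Python strings index by code point, so this keeps the first `width` code points.
    return value[:width]


def truncate_binary(value: bytes, width: int) -> bytes:
    # Keep the first `width` bytes.
    return value[:width]
```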






[PR] Remove `record_fields` from the `Record` class [iceberg-python]

2024-04-04 Thread via GitHub


Fokko opened a new pull request, #580:
URL: https://github.com/apache/iceberg-python/pull/580

   First step towards https://github.com/apache/iceberg-python/issues/579





[I] Refactor the `Record` by removing the schema [iceberg-python]

2024-04-04 Thread via GitHub


Fokko opened a new issue, #579:
URL: https://github.com/apache/iceberg-python/issues/579

   ### Feature Request / Improvement
   
   The `Record` class in Typedef should not carry a schema as it does now. This 
is mostly for testing purposes, but it is very tempting to rely on this schema, 
while that shouldn't be the case.





Re: [PR] Fix `rewrite_position_delete_files` result file set [iceberg]

2024-04-04 Thread via GitHub


bk-mz commented on PR #9945:
URL: https://github.com/apache/iceberg/pull/9945#issuecomment-2036840825

   @nastra hey, yes, you are correct. Sorry for that.
   
   Unfortunately I can't set up the proper infrastructure to meet this test's 
requirements. 
   
   I found this bug in production under high load, in a setup that also 
involves setting custom spark.sql.partitions, and this PR fixes it.
   
   Can you help with setting up the tests? 





Re: [PR] Hive, JDBC: Add null check before matching the error message [iceberg]

2024-04-04 Thread via GitHub


nastra commented on code in PR #10082:
URL: https://github.com/apache/iceberg/pull/10082#discussion_r1551437288


##
core/src/main/java/org/apache/iceberg/jdbc/JdbcTableOperations.java:
##
@@ -138,7 +138,7 @@ public void doCommit(TableMetadata base, TableMetadata metadata) {
   throw new UncheckedSQLException(e, "Database warning");
 } catch (SQLException e) {
   // SQLite doesn't set SQLState or throw SQLIntegrityConstraintViolationException
-  if (e.getMessage().contains("constraint failed")) {
+  if (e.getMessage() != null && e.getMessage().contains("constraint failed")) {

Review Comment:
   please add tests that exercise each change here (similar to the test in 
https://github.com/apache/iceberg/pull/10069)
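
A Python analogue of the guard in the diff, together with the kind of checks such a test might make, could look like the sketch below. The helper `is_constraint_failure` is a hypothetical name and this is not the Java test being requested.

```python
from typing import Optional


def is_constraint_failure(message: Optional[str]) -> bool:
    # Check for a missing message first, so the substring test never runs on None.
    return message is not None and "constraint failed" in message
```

The three cases worth exercising are a missing message, a matching message, and an unrelated message.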






Re: [PR] Hive, JDBC: Add null check before matching the error message [iceberg]

2024-04-04 Thread via GitHub


nastra commented on code in PR #10082:
URL: https://github.com/apache/iceberg/pull/10082#discussion_r1551398196


##
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveClientPool.java:
##
@@ -59,29 +59,19 @@ public HiveClientPool(int poolSize, Configuration conf) {
   @Override
   protected IMetaStoreClient newClient() {
     try {
-      try {
-        return GET_CLIENT.invoke(
-            hiveConf, (HiveMetaHookLoader) tbl -> null, HiveMetaStoreClient.class.getName());
-      } catch (RuntimeException e) {
-        // any MetaException would be wrapped into RuntimeException during reflection, so let's
-        // double-check type here
-        if (e.getCause() instanceof MetaException) {

Review Comment:
   yes, please revert. The scope within the PR should reflect what's being 
proposed in the commit msg






Re: [PR] Hive, JDBC: Add null check before matching the error message [iceberg]

2024-04-04 Thread via GitHub


nk1506 commented on code in PR #10082:
URL: https://github.com/apache/iceberg/pull/10082#discussion_r1551350882


##
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveClientPool.java:
##
@@ -59,29 +59,19 @@ public HiveClientPool(int poolSize, Configuration conf) {
   @Override
   protected IMetaStoreClient newClient() {
     try {
-      try {
-        return GET_CLIENT.invoke(
-            hiveConf, (HiveMetaHookLoader) tbl -> null, HiveMetaStoreClient.class.getName());
-      } catch (RuntimeException e) {
-        // any MetaException would be wrapped into RuntimeException during reflection, so let's
-        // double-check type here
-        if (e.getCause() instanceof MetaException) {

Review Comment:
   The other try-catch block looks redundant. We were wrapping into 
[MetaException](https://github.com/apache/iceberg/blob/main/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveClientPool.java#L69) 
and throwing, and then again 
[wrapping](https://github.com/apache/iceberg/blob/main/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveClientPool.java#L74) 
into RuntimeMetaException.
   
   Should I revert and add only the null check?






Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-04-04 Thread via GitHub


carlosescura commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2036615576

   @rahil-c is there any possibility to continue working on this PR? Many of us 
would really appreciate it.





Re: [PR] Hive, JDBC: Add null check before matching the error message [iceberg]

2024-04-04 Thread via GitHub


nastra commented on code in PR #10082:
URL: https://github.com/apache/iceberg/pull/10082#discussion_r1551269821


##
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveClientPool.java:
##
@@ -59,29 +59,19 @@ public HiveClientPool(int poolSize, Configuration conf) {
   @Override
   protected IMetaStoreClient newClient() {
     try {
-      try {
-        return GET_CLIENT.invoke(
-            hiveConf, (HiveMetaHookLoader) tbl -> null, HiveMetaStoreClient.class.getName());
-      } catch (RuntimeException e) {
-        // any MetaException would be wrapped into RuntimeException during reflection, so let's
-        // double-check type here
-        if (e.getCause() instanceof MetaException) {

Review Comment:
   @nk1506 this isn't fixed. The logic in the try-catch blocks changed. The 
only thing that should be changing is adding a null check before calling 
`t.getMessage()`






Re: [I] Add samples for listing table.files and table.partitions [iceberg-python]

2024-04-04 Thread via GitHub


Fokko commented on issue #578:
URL: https://github.com/apache/iceberg-python/issues/578#issuecomment-2036498291

   Hey @moryachok Thanks for raising this. This is currently being worked on in 
https://github.com/apache/iceberg-python/issues/511 
   
   The files one should be rather straightforward since that's also supported 
through the CLI: `pyiceberg files database.table`





Re: [PR] Move writes to Transaction [iceberg-python]

2024-04-04 Thread via GitHub


Fokko commented on PR #571:
URL: https://github.com/apache/iceberg-python/pull/571#issuecomment-2036374729

   Thanks for working on this @syun64 and @HonahX for the review  





Re: [PR] Move writes to Transaction [iceberg-python]

2024-04-04 Thread via GitHub


Fokko merged PR #571:
URL: https://github.com/apache/iceberg-python/pull/571





Re: [PR] Move writes to Transaction [iceberg-python]

2024-04-04 Thread via GitHub


Fokko commented on PR #571:
URL: https://github.com/apache/iceberg-python/pull/571#issuecomment-2036373338

   The methods on the table were added as shorthands indeed. I'm open to 
removing those methods and letting everything go through a transaction. This 
way we have a single way to do things, and it also emphasizes that it is a 
transaction and done in an atomic way.
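
A toy sketch of the "everything goes through a transaction" idea: changes are staged and applied only if the block finishes without an error. `ToyTable` and `ToyTransaction` are hypothetical in-memory classes and make no claim about the pyiceberg `Transaction` API.

```python
from typing import List


class ToyTable:
    """Hypothetical in-memory table; not the pyiceberg Table."""

    def __init__(self) -> None:
        self.rows: List[dict] = []

    def transaction(self) -> "ToyTransaction":
        return ToyTransaction(self)


class ToyTransaction:
    def __init__(self, table: ToyTable) -> None:
        self._table = table
        self._staged: List[dict] = []

    def append(self, rows: List[dict]) -> "ToyTransaction":
        # Stage the rows; nothing is visible on the table yet.
        self._staged.extend(rows)
        return self

    def __enter__(self) -> "ToyTransaction":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            # Commit all staged changes in a single step.
            self._table.rows = self._table.rows + self._staged
        # Returning False re-raises any exception, leaving the table untouched.
        return False
```

With a single entry point like this, a shorthand such as a table-level append would simply open a transaction, stage one change, and commit.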





Re: [I] Inconsistent initial field ids with REST catalog [iceberg]

2024-04-04 Thread via GitHub


nastra closed issue #10084: Inconsistent initial field ids with REST catalog
URL: https://github.com/apache/iceberg/issues/10084

