Re: [PR] fix: null character not permitted in chr function [datafusion-comet]

2024-06-09 Thread via GitHub
viirya commented on code in PR #513: URL: https://github.com/apache/datafusion-comet/pull/513#discussion_r1632203352 ## core/src/execution/datafusion/expressions/scalar_funcs/chr.rs: ## @@ -0,0 +1,119 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

Re: [PR] fix: null character not permitted in chr function [datafusion-comet]

2024-06-09 Thread via GitHub
viirya commented on code in PR #513: URL: https://github.com/apache/datafusion-comet/pull/513#discussion_r1632203589 ## core/src/execution/datafusion/expressions/scalar_funcs/chr.rs: ## @@ -0,0 +1,119 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

Re: [I] Efficiently and correctly Extract Page Index statistics into `ArrayRef`s [datafusion]

2024-06-09 Thread via GitHub
marvinlanhenke commented on issue #10806: URL: https://github.com/apache/datafusion/issues/10806#issuecomment-2156415596 @alamb ...while prototyping I ran into an "issue" or some questions, perhaps you can help here. Since we could have more than one page per column per row-group,

[PR] Convert `approx_median` to UDAF [datafusion]

2024-06-09 Thread via GitHub
goldmedal opened a new pull request, #10840: URL: https://github.com/apache/datafusion/pull/10840 ## Which issue does this PR close? Closes #10838 . ## Rationale for this change ## What changes are included in this PR? - Convert `approx_median` to UDAF

Re: [PR] fix: null character not permitted in chr function [datafusion-comet]

2024-06-09 Thread via GitHub
vaibhawvipul commented on code in PR #513: URL: https://github.com/apache/datafusion-comet/pull/513#discussion_r1632221893 ## core/src/execution/datafusion/expressions/scalar_funcs/chr.rs: ## @@ -0,0 +1,119 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

Re: [I] Convert `approx_median` to UDAF [datafusion]

2024-06-09 Thread via GitHub
goldmedal commented on issue #10838: URL: https://github.com/apache/datafusion/issues/10838#issuecomment-2156419089 Hi @jayzhan211 I tried to convert `ApproxPercentileCont`, but I encountered some challenges. `ApproxPercentileCont` creates the accumulator based on the `percentile` parame

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
peter-toth commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632225209 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return Ok(Tr

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
peter-toth commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632225209 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return Ok(Tr

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
MohamedAbdeen21 commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632227588 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return

Re: [I] Efficiently and correctly Extract Page Index statistics into `ArrayRef`s [datafusion]

2024-06-09 Thread via GitHub
marvinlanhenke commented on issue #10806: URL: https://github.com/apache/datafusion/issues/10806#issuecomment-2156435325 @alamb ...just to confirm my current understanding. Since we can have more than one page per column per row group, I can get multiple statistics. Or put differe

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
MohamedAbdeen21 commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632227588 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return

Re: [PR] fix: Support double quotes in `date_part` [datafusion]

2024-06-09 Thread via GitHub
Jefffrey commented on code in PR #10833: URL: https://github.com/apache/datafusion/pull/10833#discussion_r1632234019 ## datafusion/functions/src/datetime/date_part.rs: ## @@ -127,7 +127,12 @@ impl ScalarUDFImpl for DatePartFunc { ColumnarValue::Scalar(scalar) => sca

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
MohamedAbdeen21 commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632227588 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
peter-toth commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632234343 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return Ok(Tr

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
peter-toth commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632225209 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return Ok(Tr

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
MohamedAbdeen21 commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632237470 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
MohamedAbdeen21 commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632237470 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
peter-toth commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632238045 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return Ok(Tr

Re: [PR] fix: Support double quotes in `date_part` [datafusion]

2024-06-09 Thread via GitHub
Weijun-H commented on code in PR #10833: URL: https://github.com/apache/datafusion/pull/10833#discussion_r1632239784 ## datafusion/functions/src/datetime/date_part.rs: ## @@ -127,7 +127,12 @@ impl ScalarUDFImpl for DatePartFunc { ColumnarValue::Scalar(scalar) => sca

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
MohamedAbdeen21 commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632243037 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return

Re: [I] Convert `approx_median` to UDAF [datafusion]

2024-06-09 Thread via GitHub
jayzhan211 commented on issue #10838: URL: https://github.com/apache/datafusion/issues/10838#issuecomment-2156459652 > Hi @jayzhan211 I tried to convert `ApproxPercentileCont`, but I encountered some challenges. `ApproxPercentileCont` creates the accumulator based on the `percentile` parame

Re: [I] Feedback request for providing configurable UDF functions [datafusion]

2024-06-09 Thread via GitHub
alamb commented on issue #10744: URL: https://github.com/apache/datafusion/issues/10744#issuecomment-2156497086 We just merged the aggregate builder in https://github.com/apache/datafusion/pull/10560 -- I am quite happy with how it turned out, in case you want to take a friendly look --

Re: [PR] Stop copying LogicalPlan and Exprs in `CommonSubexprEliminate` [datafusion]

2024-06-09 Thread via GitHub
alamb commented on code in PR #10835: URL: https://github.com/apache/datafusion/pull/10835#discussion_r1632235419 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -184,90 +189,138 @@ impl CommonSubexprEliminate { Ok((rewrite_exprs, new_input)) } -

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
peter-toth commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632287690 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -801,15 +814,19 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { let expr_

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
peter-toth commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632287838 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -962,6 +980,26 @@ mod test { Ok(()) } +#[test] Review Comment: Thanks for

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
peter-toth commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632289500 ## datafusion/expr/src/expr.rs: ## @@ -1133,14 +1133,23 @@ impl Expr { } /// Return `self AS name` alias expression +/// Removes existing Alias

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
peter-toth commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632290625 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,9 +786,11 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return Ok(Tr

[PR] chore: Improve change log generator [datafusion]

2024-06-09 Thread via GitHub
andygrove opened a new pull request, #10841: URL: https://github.com/apache/datafusion/pull/10841 ## Which issue does this PR close? N/A ## Rationale for this change - Improve the quality of generated changelogs - Reduce manual effort required when gene

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
peter-toth commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632295422 ## datafusion/sqllogictest/test_files/tpch/q1.slt.part: ## @@ -41,8 +41,8 @@ explain select logical_plan 01)Sort: lineitem.l_returnflag ASC NULLS LAST,

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
MohamedAbdeen21 commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632296297 ## datafusion/sqllogictest/test_files/tpch/q1.slt.part: ## @@ -41,8 +41,8 @@ explain select logical_plan 01)Sort: lineitem.l_returnflag ASC NULLS L

Re: [I] Convert `approx_median` to UDAF [datafusion]

2024-06-09 Thread via GitHub
goldmedal commented on issue #10838: URL: https://github.com/apache/datafusion/issues/10838#issuecomment-2156610120 > We can add `self.args` to `AccumulatorArgs` and compute `percentile` and `tdigest max size` in `accumulator`. > Thanks for the suggestion. I can make a PR after #10

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
MohamedAbdeen21 commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632298786 ## datafusion/expr/src/expr.rs: ## @@ -1133,14 +1133,23 @@ impl Expr { } /// Return `self AS name` alias expression +/// Removes existing A

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
MohamedAbdeen21 commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632296297 ## datafusion/sqllogictest/test_files/tpch/q1.slt.part: ## @@ -41,8 +41,8 @@ explain select logical_plan 01)Sort: lineitem.l_returnflag ASC NULLS L

Re: [I] chore: Make Comet compatible with Rust 1.78 [datafusion-comet]

2024-06-09 Thread via GitHub
andygrove commented on issue #507: URL: https://github.com/apache/datafusion-comet/issues/507#issuecomment-2156614044 These are the failing tests: ``` test execution::datafusion::spark_hash::tests::test_str ... thread caused non-unwinding panic. aborting. test execution::sort::

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
peter-toth commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632307253 ## datafusion/expr/src/expr.rs: ## @@ -1133,14 +1133,23 @@ impl Expr { } /// Return `self AS name` alias expression +/// Removes existing Alias

[PR] Upgrade to Rust 1.78 and fix issues in unsafe code [datafusion-comet]

2024-06-09 Thread via GitHub
andygrove opened a new pull request, #546: URL: https://github.com/apache/datafusion-comet/pull/546 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/507 ## Rationale for this change Rust 1.78 has better debug asse

Re: [PR] chore: Improve change log generator [datafusion]

2024-06-09 Thread via GitHub
alamb commented on code in PR #10841: URL: https://github.com/apache/datafusion/pull/10841#discussion_r1632309893 ## dev/changelog/39.0.0.md: ## @@ -1,23 +1,25 @@ - -## [39.0.0](https://github.com/apache/datafusion/tree/39.0.0) (2024-06-07) +# Apache DataFusion 39.0.0 Changelo

Re: [PR] Make Logical Plans more readable by removing extra aliases [datafusion]

2024-06-09 Thread via GitHub
MohamedAbdeen21 commented on code in PR #10832: URL: https://github.com/apache/datafusion/pull/10832#discussion_r1632311422 ## datafusion/expr/src/expr.rs: ## @@ -1133,14 +1133,23 @@ impl Expr { } /// Return `self AS name` alias expression +/// Removes existing A

Re: [PR] Upgrade to Rust 1.78 and fix issues in unsafe code [datafusion-comet]

2024-06-09 Thread via GitHub
andygrove commented on code in PR #546: URL: https://github.com/apache/datafusion-comet/pull/546#discussion_r1632319699 ## core/src/parquet/util/hash_util.rs: ## @@ -106,16 +108,12 @@ unsafe fn crc32_hash(bytes: &[u8], seed: u32) -> u32 { let num_words = num_bytes / u32_num

Re: [PR] Upgrade to Rust 1.78 and fix issues in unsafe code [datafusion-comet]

2024-06-09 Thread via GitHub
andygrove commented on code in PR #546: URL: https://github.com/apache/datafusion-comet/pull/546#discussion_r1632321751 ## core/src/parquet/util/hash_util.rs: ## @@ -45,17 +47,19 @@ const MURMUR_R: i32 = 47; unsafe fn murmur_hash2_64a(data_bytes: &[u8], seed: u64) -> u64 {

Re: [PR] Upgrade to Rust 1.78 and fix issues in unsafe code [datafusion-comet]

2024-06-09 Thread via GitHub
andygrove commented on code in PR #546: URL: https://github.com/apache/datafusion-comet/pull/546#discussion_r1632321812 ## core/src/execution/datafusion/spark_hash.rs: ## @@ -78,8 +78,12 @@ pub(crate) fn spark_compatible_murmur3_hash>(data: T, seed: u32) } h1

Re: [PR] Upgrade to Rust 1.78 and fix issues in unsafe code [datafusion-comet]

2024-06-09 Thread via GitHub
andygrove commented on code in PR #546: URL: https://github.com/apache/datafusion-comet/pull/546#discussion_r1632322840 ## core/src/execution/sort.rs: ## @@ -159,12 +159,14 @@ where pos += 1; } } else { -unsafe {

Re: [PR] chore: Improve change log generator [datafusion]

2024-06-09 Thread via GitHub
alamb commented on PR #10841: URL: https://github.com/apache/datafusion/pull/10841#issuecomment-2156671355 Let's merge this and iterate on the main branch. Thanks again @andygrove -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

Re: [PR] chore: Improve change log generator [datafusion]

2024-06-09 Thread via GitHub
alamb merged PR #10841: URL: https://github.com/apache/datafusion/pull/10841 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Support user defined `ParquetAccessPlan` in `ParquetExec`, validation to `ParquetAccessPlan::select` [datafusion]

2024-06-09 Thread via GitHub
alamb commented on code in PR #10813: URL: https://github.com/apache/datafusion/pull/10813#discussion_r1632342300 ## datafusion/core/src/datasource/physical_plan/parquet/opener.rs: ## @@ -212,3 +213,34 @@ impl FileOpener for ParquetOpener { })) } } + +/// Return t

Re: [I] Add `read_parquet` SQL UDF [datafusion]

2024-06-09 Thread via GitHub
edmondop commented on issue #3773: URL: https://github.com/apache/datafusion/issues/3773#issuecomment-2156689753 In Databricks SQL dialect, they support ```sql SELECT * from json.`s3:/foo/bar/table/ ``` -- This is an automated message from the Apache Git Service. To respond to

Re: [I] Support `select .. from 'data.parquet'` files in SQL from any `SessionContext` (optionally) [datafusion]

2024-06-09 Thread via GitHub
edmondop commented on issue #4850: URL: https://github.com/apache/datafusion/issues/4850#issuecomment-2156694846 I wanted to confirm I understood correctly what options we are picking. It seems to me the following are viable: 1. Support "implicit" tables via select from (moving the change

Re: [PR] fix: null character not permitted in chr function [datafusion-comet]

2024-06-09 Thread via GitHub
codecov-commenter commented on PR #513: URL: https://github.com/apache/datafusion-comet/pull/513#issuecomment-2156709695 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/513?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

Re: [I] API in ParquetExec to pass in RowSelections to `ParquetExec` (enable custom indexes, finer grained pushdown) [datafusion]

2024-06-09 Thread via GitHub
alamb closed issue #9929: API in ParquetExec to pass in RowSelections to `ParquetExec` (enable custom indexes, finer grained pushdown) URL: https://github.com/apache/datafusion/issues/9929 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] Support user defined `ParquetAccessPlan` in `ParquetExec`, validation to `ParquetAccessPlan::select` [datafusion]

2024-06-09 Thread via GitHub
alamb merged PR #10813: URL: https://github.com/apache/datafusion/pull/10813 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Support user defined `ParquetAccessPlan` in `ParquetExec`, validation to `ParquetAccessPlan::select` [datafusion]

2024-06-09 Thread via GitHub
alamb commented on PR #10813: URL: https://github.com/apache/datafusion/pull/10813#issuecomment-2156767428 BTW I am happy to make additional corrections as follow on PRs if anyone has additional notes cc @advancedxy @thinkharderdev @crepererum @NGA-TRAN and @Ted-Jiang @xinlifoobar

Re: [PR] Add support for reading CSV files with comments [datafusion]

2024-06-09 Thread via GitHub
bbannier commented on PR #10467: URL: https://github.com/apache/datafusion/pull/10467#issuecomment-2156768164 This is now rebased on `main` which recently bumped arrow. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

Re: [I] Integrate with the substrait integration test [datafusion]

2024-06-09 Thread via GitHub
alamb commented on issue #10710: URL: https://github.com/apache/datafusion/issues/10710#issuecomment-2156778295 Thank you @Lordworms -- indeed it is a great job to find issues when testing I wonder if there is some way to get the integration test partly working and checked in somewhe

Re: [I] Integrate with the substrait integration test [datafusion]

2024-06-09 Thread via GitHub
Lordworms commented on issue #10710: URL: https://github.com/apache/datafusion/issues/10710#issuecomment-2156778859 > Thank you @Lordworms -- indeed it is a great job to find issues when testing > > I wonder if there is some way to get the integration test partly working and checked

Re: [I] Projects require unique expressions names error in substrait producer/consumer [datafusion]

2024-06-09 Thread via GitHub
alamb commented on issue #10815: URL: https://github.com/apache/datafusion/issues/10815#issuecomment-2156778917 I added to the substrait support epic: https://github.com/apache/datafusion/issues/5173 -- This is an automated message from the Apache Git Service. To respond to the message, p

Re: [I] Efficiently and correctly Extract Page Index statistics into `ArrayRef`s [datafusion]

2024-06-09 Thread via GitHub
marvinlanhenke commented on issue #10806: URL: https://github.com/apache/datafusion/issues/10806#issuecomment-2156778662 @alamb I'm currently thinking about how to integrate `StatisticsConverter` with the existing code `prune_pages_in_one_row_group`. This is what I originally had i

Re: [I] Substrait consumer doesn't respect final output column names [datafusion]

2024-06-09 Thread via GitHub
alamb commented on issue #10817: URL: https://github.com/apache/datafusion/issues/10817#issuecomment-2156778930 I added to the substrait support epic: https://github.com/apache/datafusion/issues/5173 -- This is an automated message from the Apache Git Service. To respond to the message, p

Re: [I] Substrait serializer clippy error: not calling `truncate` [datafusion]

2024-06-09 Thread via GitHub
alamb commented on issue #9727: URL: https://github.com/apache/datafusion/issues/9727#issuecomment-2156779002 I added to the substrait support epic: https://github.com/apache/datafusion/issues/5173 -- This is an automated message from the Apache Git Service. To respond to the message, ple

Re: [I] Complete substrait support for ParquetExec round trip [datafusion]

2024-06-09 Thread via GitHub
alamb commented on issue #9347: URL: https://github.com/apache/datafusion/issues/9347#issuecomment-2156779039 I added to the substrait support epic: https://github.com/apache/datafusion/issues/5173 -- This is an automated message from the Apache Git Service. To respond to the message, ple

Re: [I] Can't serialize example `ExecutionPlan` to substrait [datafusion]

2024-06-09 Thread via GitHub
alamb commented on issue #9299: URL: https://github.com/apache/datafusion/issues/9299#issuecomment-2156779111 I added to the substrait support epic: https://github.com/apache/datafusion/issues/5173 -- This is an automated message from the Apache Git Service. To respond to the message, ple

Re: [I] Integrate with the substrait integration test [datafusion]

2024-06-09 Thread via GitHub
alamb commented on issue #10710: URL: https://github.com/apache/datafusion/issues/10710#issuecomment-2156779183 I added to the substrait support epic: https://github.com/apache/datafusion/issues/5173 -- This is an automated message from the Apache Git Service. To respond to the message, p

Re: [I] Integrate with the substrait integration test [datafusion]

2024-06-09 Thread via GitHub
alamb commented on issue #10710: URL: https://github.com/apache/datafusion/issues/10710#issuecomment-2156779263 > actually, I have made tpch_1 work just now, I'll draft a PR, and hope to get a review from you, Thanks a lot. Wow -- nice work! -- This is an automated message from the

Re: [I] Integrate with the substrait integration test [datafusion]

2024-06-09 Thread via GitHub
richtia commented on issue #10710: URL: https://github.com/apache/datafusion/issues/10710#issuecomment-2156781036 Thanks for looking into these issues @Lordworms! If there are any changes you want to make in the substrait integration test repo itself, I'll gladly do those reviews for you as

[PR] support tpch_1 consumer_producer_test [datafusion]

2024-06-09 Thread via GitHub
Lordworms opened a new pull request, #10842: URL: https://github.com/apache/datafusion/pull/10842 ## Which issue does this PR close? part of #10710 Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these

Re: [I] Convert `approx_distinct` to UDAF [datafusion]

2024-06-09 Thread via GitHub
Lordworms commented on issue #10837: URL: https://github.com/apache/datafusion/issues/10837#issuecomment-2156781960 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] support tpch_1 consumer_producer_test [datafusion]

2024-06-09 Thread via GitHub
Lordworms commented on code in PR #10842: URL: https://github.com/apache/datafusion/pull/10842#discussion_r1632389768 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1253,6 +1335,8 @@ fn from_substrait_type( r#type::Kind::Struct(s) => Ok(DataType::Stru

Re: [I] Efficiently and correctly Extract Page Index statistics into `ArrayRef`s [datafusion]

2024-06-09 Thread via GitHub
alamb commented on issue #10806: URL: https://github.com/apache/datafusion/issues/10806#issuecomment-2156787656 Thank you @marvinlanhenke -- excellent analysis. > * a lot of work the StatisticConverter does is already done [here](https://github.com/apache/datafusion/blob/main/datafus

Re: [PR] support tpch_1 consumer_producer_test [datafusion]

2024-06-09 Thread via GitHub
Lordworms commented on PR #10842: URL: https://github.com/apache/datafusion/pull/10842#issuecomment-2156797223 another problem here is that the json file seems to do projection after aggregation https://github.com/apache/datafusion/assets/48054792/42213070-9f6d-4293-a0c1-5e3f54312391";>

[PR] WIP Prototype DataPage extraction API [datafusion]

2024-06-09 Thread via GitHub
alamb opened a new pull request, #10843: URL: https://github.com/apache/datafusion/pull/10843 ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/10806 ## Rationale for this change @marvinlanhenke asked some excellent questions on https://

Re: [PR] WIP Prototype DataPage extraction API [datafusion]

2024-06-09 Thread via GitHub
alamb commented on code in PR #10843: URL: https://github.com/apache/datafusion/pull/10843#discussion_r1632395830 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -766,10 +769,108 @@ impl<'a> StatisticsConverter<'a> { Ok(Arc::new(UInt64Array:

Re: [I] Efficiently and correctly Extract Page Index statistics into `ArrayRef`s [datafusion]

2024-06-09 Thread via GitHub
alamb commented on issue #10806: URL: https://github.com/apache/datafusion/issues/10806#issuecomment-2156798952 @marvinlanhenke -- I whipped up something (actually I had been playing with it yesterday) https://github.com/apache/datafusion/pull/10843 -- This is an automated message from t

Re: [PR] WIP Prototype DataPage extraction API [datafusion]

2024-06-09 Thread via GitHub
alamb commented on code in PR #10843: URL: https://github.com/apache/datafusion/pull/10843#discussion_r1632396117 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -1724,7 +1808,7 @@ async fn test_boolean() { .build() .await; -Test { +TestBoth { Re

Re: [I] Integrate with the substrait integration test [datafusion]

2024-06-09 Thread via GitHub
Lordworms commented on issue #10710: URL: https://github.com/apache/datafusion/issues/10710#issuecomment-2156804973 > Thanks for looking into these issues @Lordworms! If there are any changes you want to make in the substrait integration test repo itself, I'll gladly do those reviews for yo

Re: [I] Integrate with the substrait integration test [datafusion]

2024-06-09 Thread via GitHub
richtia commented on issue #10710: URL: https://github.com/apache/datafusion/issues/10710#issuecomment-2156807602 > > Thanks for looking into these issues @Lordworms! If there are any changes you want to make in the substrait integration test repo itself, I'll gladly do those reviews for yo

Re: [PR] support tpch_1 consumer_producer_test [datafusion]

2024-06-09 Thread via GitHub
Lordworms commented on code in PR #10842: URL: https://github.com/apache/datafusion/pull/10842#discussion_r1632401139 ## datafusion/substrait/tests/testdata/tpch/lineitem.csv: ## @@ -0,0 +1,2 @@ +l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l

[PR] docs(variance): Correct typos in comments [datafusion]

2024-06-09 Thread via GitHub
pingsutw opened a new pull request, #10844: URL: https://github.com/apache/datafusion/pull/10844 ## Which issue does this PR close? Closes #. ## Rationale for this change Typo ## What changes are included in this PR? Correct typos in comments ## Are t

Re: [I] Efficiently and correctly Extract Page Index statistics into `ArrayRef`s [datafusion]

2024-06-09 Thread via GitHub
marvinlanhenke commented on issue #10806: URL: https://github.com/apache/datafusion/issues/10806#issuecomment-2156816494 Thank you so much - I quickly skimmed the draft you uploaded (will take a closer look tomorrow). However, the main question should be answered, for now we are iterating o

Re: [PR] build(deps): update Arrow/Parquet to `52.0`, object-store to `0.10` [datafusion]

2024-06-09 Thread via GitHub
abhiaagarwal commented on PR #10765: URL: https://github.com/apache/datafusion/pull/10765#issuecomment-2156823540 @waynexia I was inspecting the dependency tree for delta-rs and it seems the dependency didn't get bumped for object-store in `proto-common` fyi, so 0.9.1 gets pulled. ``

[I] [Proposal] DataFusion monthly online meetup [datafusion]

2024-06-09 Thread via GitHub
edmondop opened a new issue, #10845: URL: https://github.com/apache/datafusion/issues/10845 ### Is your feature request related to a problem or challenge? DataFusion is such an exciting project, with so many great contributors and so many users that are building useful systems on top

[I] API to add analyzer_rules [datafusion]

2024-06-09 Thread via GitHub
alamb opened a new issue, #10846: URL: https://github.com/apache/datafusion/issues/10846 ### Is your feature request related to a problem or challenge? @goldmedal reports on discord: https://discord.com/channels/885562378132000778/1166447479609376850/1249052209388326922 Hi, I

[PR] Minor: Clarify `SessionContext::state` docs [datafusion]

2024-06-09 Thread via GitHub
alamb opened a new pull request, #10847: URL: https://github.com/apache/datafusion/pull/10847 ## Which issue does this PR close? N/A ## Rationale for this change @goldmedal reports on discord: https://discord.com/channels/885562378132000778/1166447479609376850/1249052209388

Re: [PR] Convert `approx_median` to UDAF [datafusion]

2024-06-09 Thread via GitHub
jayzhan211 commented on code in PR #10840: URL: https://github.com/apache/datafusion/pull/10840#discussion_r1632437906 ## datafusion/physical-expr-common/Cargo.toml: ## @@ -37,6 +37,7 @@ path = "src/lib.rs" [dependencies] arrow = { workspace = true } +arrow-array = { workspa

Re: [PR] Convert `approx_median` to UDAF [datafusion]

2024-06-09 Thread via GitHub
jayzhan211 commented on code in PR #10840: URL: https://github.com/apache/datafusion/pull/10840#discussion_r1632439168 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -509,6 +509,10 @@ SELECT approx_median(c12) FROM aggregate_test_100 0.555006541052 +# csv_qu

Re: [PR] Convert `approx_median` to UDAF [datafusion]

2024-06-09 Thread via GitHub
jayzhan211 commented on code in PR #10840: URL: https://github.com/apache/datafusion/pull/10840#discussion_r1632439168 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -509,6 +509,10 @@ SELECT approx_median(c12) FROM aggregate_test_100 0.555006541052 +# csv_qu

Re: [PR] Convert `approx_median` to UDAF [datafusion]

2024-06-09 Thread via GitHub
jayzhan211 commented on code in PR #10840: URL: https://github.com/apache/datafusion/pull/10840#discussion_r1632439168 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -509,6 +509,10 @@ SELECT approx_median(c12) FROM aggregate_test_100 0.555006541052 +# csv_qu

Re: [I] API to add analyzer_rules [datafusion]

2024-06-09 Thread via GitHub
pingsutw commented on issue #10846: URL: https://github.com/apache/datafusion/issues/10846#issuecomment-2156924635 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

[PR] MINOR: use workspace deps in proto-common [datafusion]

2024-06-09 Thread via GitHub
waynexia opened a new pull request, #10848: URL: https://github.com/apache/datafusion/pull/10848 ## Which issue does this PR close? Resolve https://github.com/apache/datafusion/pull/10765#issuecomment-2156823540 ## Rationale for this change Some deps in `

Re: [PR] build(deps): update Arrow/Parquet to `52.0`, object-store to `0.10` [datafusion]

2024-06-09 Thread via GitHub
waynexia commented on PR #10765: URL: https://github.com/apache/datafusion/pull/10765#issuecomment-2156967694 Sorry for causing this! Opened https://github.com/apache/datafusion/pull/10848 to fix it. -- This is an automated message from the Apache Git Service. To respond to the message, p

[PR] feat: Add method to add analyzer rules to SessionContext [datafusion]

2024-06-09 Thread via GitHub
pingsutw opened a new pull request, #10849: URL: https://github.com/apache/datafusion/pull/10849 ## Which issue does this PR close? Closes #10846 ## Rationale for this change The new session context works fine with the rules I added, but there is no way to register the rules to

Re: [PR] feat: RewriteCycle API for short-circuiting optimizer loops [datafusion]

2024-06-09 Thread via GitHub
erratic-pattern commented on code in PR #10386: URL: https://github.com/apache/datafusion/pull/10386#discussion_r1632476923 ## datafusion/optimizer/src/rewrite_cycle.rs: ## @@ -0,0 +1,420 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

Re: [PR] feat: RewriteCycle API for short-circuiting optimizer loops [datafusion]

2024-06-09 Thread via GitHub
erratic-pattern commented on code in PR #10386: URL: https://github.com/apache/datafusion/pull/10386#discussion_r1632477484 ## datafusion/optimizer/src/rewrite_cycle.rs: ## @@ -0,0 +1,262 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

[PR] Minor: add `Window::try_new_with_schema` constructor [datafusion]

2024-06-09 Thread via GitHub
sadboy opened a new pull request, #10850: URL: https://github.com/apache/datafusion/pull/10850 ## Which issue does this PR close? Closes N/A ## Rationale for this change This change makes the `Window` node API more uniform with other `LogicalPlan` nodes,

Re: [PR] Convert `approx_median` to UDAF [datafusion]

2024-06-09 Thread via GitHub
goldmedal commented on code in PR #10840: URL: https://github.com/apache/datafusion/pull/10840#discussion_r1632500900 ## datafusion/physical-expr-common/Cargo.toml: ## @@ -37,6 +37,7 @@ path = "src/lib.rs" [dependencies] arrow = { workspace = true } +arrow-array = { workspac

Re: [PR] feat: RewriteCycle API for short-circuiting optimizer loops [datafusion]

2024-06-09 Thread via GitHub
jayzhan211 commented on code in PR #10386: URL: https://github.com/apache/datafusion/pull/10386#discussion_r1632502092 ## datafusion/optimizer/src/rewrite_cycle.rs: ## @@ -0,0 +1,262 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

Re: [PR] Convert `VariancePopulation` to UDAF [datafusion]

2024-06-09 Thread via GitHub
jayzhan211 commented on PR #10836: URL: https://github.com/apache/datafusion/pull/10836#issuecomment-2157080515 Thanks @mknaw and @goldmedal -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

Re: [PR] Convert `VariancePopulation` to UDAF [datafusion]

2024-06-09 Thread via GitHub
jayzhan211 merged PR #10836: URL: https://github.com/apache/datafusion/pull/10836 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dat

Re: [I] Convert Variance Population to UDAF [datafusion]

2024-06-09 Thread via GitHub
jayzhan211 closed issue #10668: Convert Variance Population to UDAF URL: https://github.com/apache/datafusion/issues/10668 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

[PR] Convert approx_distinct to UDAF [datafusion]

2024-06-09 Thread via GitHub
Lordworms opened a new pull request, #10851: URL: https://github.com/apache/datafusion/pull/10851 ## Which issue does this PR close? Closes #10837 ## Rationale for this change ## What changes are included in this PR? ## Are these changes te

Re: [I] Implement `ScalarValue::TimestampMillisecond` -> String Support [datafusion]

2024-06-09 Thread via GitHub
pingsutw commented on issue #10797: URL: https://github.com/apache/datafusion/issues/10797#issuecomment-2157134523 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

Re: [PR] build(deps): update Arrow/Parquet to `52.0`, object-store to `0.10` [datafusion]

2024-06-09 Thread via GitHub
abhiaagarwal commented on PR #10765: URL: https://github.com/apache/datafusion/pull/10765#issuecomment-2157141004 All good, just wanted to bring it to your attention before 39 formally drops :) -- This is an automated message from the Apache Git Service. To respond to the message, please

  1   2   >