Re: [I] Implement equality `=` and inequality `<>` support for `StringView` [datafusion]

2024-06-15 Thread via GitHub
Weijun-H commented on issue #10919: URL: https://github.com/apache/datafusion/issues/10919#issuecomment-2169307386 This issue must wait until #10920 because there is currently no convenient way to create a `StringViewArray` in Datafusion. If I am mistaken, please correct me. -- This is a

Re: [PR] [RFC] Register scalars with boxed fn impl [datafusion]

2024-06-15 Thread via GitHub
jayzhan211 closed pull request #9980: [RFC] Register scalars with boxed fn impl URL: https://github.com/apache/datafusion/pull/9980

Re: [PR] CSE shorthand alias [datafusion]

2024-06-15 Thread via GitHub
peter-toth commented on PR #10868: URL: https://github.com/apache/datafusion/pull/10868#issuecomment-2169716461 > The failing CI is a simple `clippy` warning, I'd appreciate if it can be fixed before merging. > > The last thread between me and @peter-toth mentions some possible impro

Re: [PR] CSE shorthand alias [datafusion]

2024-06-15 Thread via GitHub
alamb commented on PR #10868: URL: https://github.com/apache/datafusion/pull/10868#issuecomment-2170001838 > cc @alamb, as this PR might conflict with your #10835 Thanks for the heads up @peter-toth -- I can handle any conflicts if they arise. I am still trying to get improved perfor

Re: [PR] Convert ApproxPercentileCont and ApproxPercentileContWithWeight to UDAF [datafusion]

2024-06-15 Thread via GitHub
goldmedal commented on code in PR #10917: URL: https://github.com/apache/datafusion/pull/10917#discussion_r1641265848 ## datafusion/proto/src/physical_plan/mod.rs: ## @@ -496,11 +496,14 @@ impl AsExecutionPlan for protobuf::PhysicalPlanNode {

Re: [PR] Initial Extract parquet data page statistics API [datafusion]

2024-06-15 Thread via GitHub
alamb commented on code in PR #10852: URL: https://github.com/apache/datafusion/pull/10852#discussion_r1641267493 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -39,102 +40,102 @@ use arrow_array::{ use arrow_schema::{DataType, Field, Schema}; use datafusion::data

Re: [PR] Initial Extract parquet data page statistics API [datafusion]

2024-06-15 Thread via GitHub
alamb commented on PR #10852: URL: https://github.com/apache/datafusion/pull/10852#issuecomment-2170040731 > Thanks for your help on the test. I think it's nearly ready, left a question about the test setup. Unfortunately I'm pretty busy this weekend, so it might take till Monday before I ca

Re: [PR] Initial Extract parquet data page statistics API [datafusion]

2024-06-15 Thread via GitHub
alamb commented on PR #10852: URL: https://github.com/apache/datafusion/pull/10852#issuecomment-2170041452 🚀

Re: [PR] Initial Extract parquet data page statistics API [datafusion]

2024-06-15 Thread via GitHub
alamb merged PR #10852: URL: https://github.com/apache/datafusion/pull/10852

Re: [I] Efficiently and correctly Extract Page Index statistics into `ArrayRef`s [datafusion]

2024-06-15 Thread via GitHub
alamb closed issue #10806: Efficiently and correctly Extract Page Index statistics into `ArrayRef`s URL: https://github.com/apache/datafusion/issues/10806

[PR] Add initial support for view types [datafusion]

2024-06-15 Thread via GitHub
XiangpengHao opened a new pull request, #10925: URL: https://github.com/apache/datafusion/pull/10925 ## Which issue does this PR close? Closes #10920. But we need to wait for https://github.com/apache/arrow-rs/pull/5894 to merge and propagate to DataFusion, otherwise we ge

Re: [PR] Add `advanced_parquet_index.rs` example of index in into parquet files [datafusion]

2024-06-15 Thread via GitHub
alamb commented on code in PR #10701: URL: https://github.com/apache/datafusion/pull/10701#discussion_r1641270399 ## datafusion-examples/examples/advanced_parquet_index.rs: ## @@ -0,0 +1,656 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributo

Re: [I] use StringViewArray when reading String columns from Parquet [datafusion]

2024-06-15 Thread via GitHub
XiangpengHao commented on issue #10921: URL: https://github.com/apache/datafusion/issues/10921#issuecomment-2170079350 I'll take this one, can you assign me @alamb ?

Re: [PR] Update ListingTable to use StatisticsConverter [datafusion]

2024-06-15 Thread via GitHub
alamb commented on code in PR #10924: URL: https://github.com/apache/datafusion/pull/10924#discussion_r1641271813 ## datafusion/sqllogictest/test_files/explain.slt: ## @@ -287,20 +287,20 @@ query TT EXPLAIN SELECT * FROM alltypes_plain limit 10; physical_plan -01)GlobalL

[I] `StatisticsConverter::row_group_null_counts` incorrect for missing column [datafusion]

2024-06-15 Thread via GitHub
alamb opened a new issue, #10926: URL: https://github.com/apache/datafusion/issues/10926 ### Describe the bug I noticed this while working on https://github.com/apache/datafusion/pull/10852 with @marvinlanhenke. Basically, when generating statistics for a non-existent column, th

Re: [PR] Stop copying LogicalPlan and Exprs in `CommonSubexprEliminate` [datafusion]

2024-06-15 Thread via GitHub
alamb commented on PR #10835: URL: https://github.com/apache/datafusion/pull/10835#issuecomment-2170122881 Update -- q1 gets significantly worse -- I'll try and profile it

[PR] Minor: Improve `arrow_statistics` tests [datafusion]

2024-06-15 Thread via GitHub
alamb opened a new pull request, #10927: URL: https://github.com/apache/datafusion/pull/10927 ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/10922 ## Rationale for this change I thought of a way to make the tests easier to write

Re: [PR] Minor: Improve `arrow_statistics` tests [datafusion]

2024-06-15 Thread via GitHub
alamb commented on code in PR #10927: URL: https://github.com/apache/datafusion/pull/10927#discussion_r1641278953 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -164,6 +164,36 @@ impl TestReader { } } +/// Which statistics should we check? Review Comment:

[I] Support extracting `Int8`, `Int16`, `Int32` statistics from Parquet Datapages [datafusion]

2024-06-15 Thread via GitHub
alamb opened a new issue, #10928: URL: https://github.com/apache/datafusion/issues/10928 ### Is your feature request related to a problem or challenge? Part of https://github.com/apache/datafusion/issues/10922 We are adding APIs to efficiently convert the data stored in Parquet'
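Parquet has no 8- or 16-bit physical integer type: `Int8` and `Int16` columns are written with the INT32 physical type plus an integer logical-type annotation, so the min/max values recorded for each data page decode as `i32` and need narrowing. The following is a minimal sketch of that narrowing step only, assuming page statistics have already been decoded into `Option<i32>` values (`None` for a page without statistics); `narrow_page_stats` is a hypothetical helper, not part of DataFusion's `StatisticsConverter` API.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: narrow per-page `i32` statistics (Parquet INT32
/// physical type) to a narrower integer type such as `i8` or `i16`.
///
/// `None` marks a page without statistics. A value that does not fit the
/// target type is also mapped to `None` instead of being truncated.
fn narrow_page_stats<T: TryFrom<i32>>(page_stats: &[Option<i32>]) -> Vec<Option<T>> {
    page_stats
        .iter()
        .copied()
        .map(|stat| stat.and_then(|v| T::try_from(v).ok()))
        .collect()
}
```

Mapping out-of-range values to `None` (unknown) rather than truncating them is a deliberately conservative choice in this sketch: an unknown statistic only disables pruning for that page, whereas a wrongly truncated one could cause pages to be pruned incorrectly.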

Re: [I] Support extracting `Int8`, `Int16`, `Int32` statistics from Parquet Data Pages [datafusion]

2024-06-15 Thread via GitHub
alamb commented on issue #10928: URL: https://github.com/apache/datafusion/issues/10928#issuecomment-2170207810 FYI @marvinlanhenke

Re: [I] Implement equality `=` and inequality `<>` support for `StringView` [datafusion]

2024-06-15 Thread via GitHub
alamb commented on issue #10919: URL: https://github.com/apache/datafusion/issues/10919#issuecomment-2170257967 > This issue must wait until #10920 because there is currently no convenient way to create a `StringViewArray` in Datafusion. If I am mistaken, please correct me. I think y

Re: [I] use StringViewArray when reading String columns from Parquet [datafusion]

2024-06-15 Thread via GitHub
alamb commented on issue #10921: URL: https://github.com/apache/datafusion/issues/10921#issuecomment-2170282429 > I'll take this one, can you assign me @alamb ? BTW you can assign yourself (single word comment `take`): https://datafusion.apache.org/contributor-guide/index.html#findin

[I] Do we need to escape search string as it's used in regexp? Wondering what's the result of `contains("abcdefg", ".*")` [datafusion]

2024-06-15 Thread via GitHub
alamb opened a new issue, #10929: URL: https://github.com/apache/datafusion/issues/10929 Do we need to escape search string as it's used in regexp? Wondering what's the result of `contains("abcdefg", ".*")` _Originally posted by @waynexia in https://github.com/apache/da
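Under regular-expression semantics `.*` matches any input, so `contains("abcdefg", ".*")` would return true if the search string were handed to a regex engine unescaped, while plain substring semantics give false. The sketch below illustrates the difference using only the Rust standard library; `contains_literal` and `escape_regex` are hypothetical helpers, the metacharacter set is an assumption, and real code would more likely call the `regex` crate's `regex::escape`.

```rust
/// Literal substring semantics: `.` and `*` are ordinary characters here.
fn contains_literal(haystack: &str, needle: &str) -> bool {
    haystack.contains(needle)
}

/// Hypothetical helper: prefix each regex metacharacter with a backslash so
/// the search string matches only literally if embedded in a regex.
fn escape_regex(needle: &str) -> String {
    // Assumed set of metacharacters to escape.
    const META: &str = r"\.+*?()|[]{}^$#&-~";
    let mut out = String::with_capacity(needle.len());
    for c in needle.chars() {
        if META.contains(c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}
```

With literal semantics, `".*"` is found only in strings that actually contain a dot followed by an asterisk; escaping gives the same behavior when an implementation needs to route the search string through a regex.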

Re: [I] Do we need to escape search string as it's used in regexp? Wondering what's the result of `contains("abcdefg", ".*")` [datafusion]

2024-06-15 Thread via GitHub
alamb commented on issue #10929: URL: https://github.com/apache/datafusion/issues/10929#issuecomment-2170299510 cc @Lordworms

Re: [PR] Add contains function, and support in datafusion substrait consumer [datafusion]

2024-06-15 Thread via GitHub
alamb commented on code in PR #10879: URL: https://github.com/apache/datafusion/pull/10879#discussion_r1641295837 ## datafusion/functions/src/string/contains.rs: ## @@ -0,0 +1,143 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

Re: [PR] Add contains function, and support in datafusion substrait consumer [datafusion]

2024-06-15 Thread via GitHub
alamb commented on PR #10879: URL: https://github.com/apache/datafusion/pull/10879#issuecomment-2170301722 Thanks again @waynexia @Lordworms and @Weijun-H !

Re: [PR] Add contains function, and support in datafusion substrait consumer [datafusion]

2024-06-15 Thread via GitHub
alamb merged PR #10879: URL: https://github.com/apache/datafusion/pull/10879

Re: [I] Support for contains function in datafusion substrait consumer [datafusion]

2024-06-15 Thread via GitHub
alamb closed issue #10861: Support for contains function in datafusion substrait consumer URL: https://github.com/apache/datafusion/issues/10861

Re: [PR] Feat: Implement hf:// / "hugging face" integration in datafusion-cli [datafusion]

2024-06-15 Thread via GitHub
alamb commented on PR #10792: URL: https://github.com/apache/datafusion/pull/10792#issuecomment-2170304471 > I would pause updating on this PR since it is extremely large IMO and difficult for reviewers. Let me know your thoughts on it and I could do an update in the following iterations.

Re: [PR] Feat: Implement hf:// / "hugging face" integration in datafusion-cli [datafusion]

2024-06-15 Thread via GitHub
alamb commented on code in PR #10792: URL: https://github.com/apache/datafusion/pull/10792#discussion_r1641296642 ## datafusion-cli/tests/data/hf_store_sql.txt: ## @@ -0,0 +1,9 @@ +select count(*) from "hf://datasets/cais/mmlu/astronomy/dev-0-of-1.parquet"; Review Comm
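The test query above reads a file through an `hf://` URL. Below is a minimal sketch of splitting such a URL into its parts, assuming a `hf://<repo_type>/<owner>/<name>/<path>` layout inferred from the example path; `HfPath` and `parse_hf_url` are hypothetical names, not the PR's implementation, and revisions or branches (which a real implementation would need to handle) are ignored.

```rust
/// Components of a Hugging Face Hub path such as
/// `hf://datasets/cais/mmlu/astronomy/dev-0-of-1.parquet`.
#[derive(Debug, PartialEq)]
struct HfPath<'a> {
    repo_type: &'a str, // e.g. "datasets"
    repo_id: String,    // "<owner>/<name>", e.g. "cais/mmlu"
    file_path: &'a str, // path of the file inside the repository
}

/// Hypothetical parser for `hf://<repo_type>/<owner>/<name>/<path>`.
/// Returns `None` if the scheme is not `hf://` or a component is missing.
fn parse_hf_url(url: &str) -> Option<HfPath<'_>> {
    let rest = url.strip_prefix("hf://")?;
    // At most 4 pieces: everything after the repo name stays in `file_path`.
    let mut parts = rest.splitn(4, '/');
    let repo_type = parts.next()?;
    let owner = parts.next()?;
    let name = parts.next()?;
    let file_path = parts.next()?;
    if [repo_type, owner, name, file_path].iter().any(|s| s.is_empty()) {
        return None;
    }
    Some(HfPath {
        repo_type,
        repo_id: format!("{}/{}", owner, name),
        file_path,
    })
}
```

Keeping the file path as a single trailing component means nested directories inside a dataset (such as `astronomy/`) survive unchanged.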

Re: [I] Add example for writing an `AnalyzerRule` [datafusion]

2024-06-15 Thread via GitHub
alamb commented on issue #10855: URL: https://github.com/apache/datafusion/issues/10855#issuecomment-2170328911 I have some time on a plane today that I may use to try and write up this example. I was inspired by some discussion I had this week

Re: [PR] feat: add CliSessionContext trait for cli [datafusion]

2024-06-15 Thread via GitHub
comphead commented on code in PR #10890: URL: https://github.com/apache/datafusion/pull/10890#discussion_r1641301596 ## datafusion-cli/src/cli_context.rs: ## @@ -0,0 +1,93 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreement

Re: [PR] feat: add CliSessionContext trait for cli [datafusion]

2024-06-15 Thread via GitHub
comphead commented on code in PR #10890: URL: https://github.com/apache/datafusion/pull/10890#discussion_r1641301735 ## datafusion-cli/src/cli_context.rs: ## @@ -0,0 +1,93 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreement

Re: [I] Add example for writing an SQL analysis pass [datafusion]

2024-06-15 Thread via GitHub
alamb commented on issue #10871: URL: https://github.com/apache/datafusion/issues/10871#issuecomment-2170336713 Thank you @LorrensP-2158466 -- sorry for the delay . I have been traveling > I do have a question. > How can we show these results? Because Analyzer rules only return tr

Re: [PR] Only recompute schema in `TypeCoercion` when necessary [datafusion]

2024-06-15 Thread via GitHub
alamb commented on PR #10369: URL: https://github.com/apache/datafusion/pull/10369#issuecomment-2170341253 I realistically don't plan to spend any more time on this :(

Re: [PR] Only recompute schema in `TypeCoercion` when necessary [datafusion]

2024-06-15 Thread via GitHub
alamb closed pull request #10369: Only recompute schema in `TypeCoercion` when necessary URL: https://github.com/apache/datafusion/pull/10369

Re: [PR] feat: add CliSessionContext trait for cli [datafusion]

2024-06-15 Thread via GitHub
comphead commented on code in PR #10890: URL: https://github.com/apache/datafusion/pull/10890#discussion_r1641303837 ## datafusion-cli/src/cli_context.rs: ## @@ -0,0 +1,93 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreement

Re: [I] Row groups are read out of order or with completely different values [datafusion]

2024-06-15 Thread via GitHub
alamb commented on issue #10572: URL: https://github.com/apache/datafusion/issues/10572#issuecomment-2170404296 Hi @twitu -- I am very sorry for the delay in responding -- I have been traveling for sever > You'll see that the queries with ORDER BY have a Sort expression in the plan.

Re: [PR] Relax combine partial final rule [datafusion]

2024-06-15 Thread via GitHub
alamb commented on code in PR #10913: URL: https://github.com/apache/datafusion/pull/10913#discussion_r1641319524 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -1382,18 +1382,17 @@ physical_plan 02)--AggregateExec: mode=Final, gby=[], aggr=[COUNT(alias1)] 03)Coale

Re: [I] Convert `BitAnd`, `BitOr`, `BitXor` to UDAF [datafusion]

2024-06-15 Thread via GitHub
dharanad commented on issue #10907: URL: https://github.com/apache/datafusion/issues/10907#issuecomment-2170413695 @jayzhan211 I'm really excited about getting involved with this project. This particular issue! Even though it's not marked as a 'good first issue,' I'm feeling confident and

Re: [PR] Fix `FormatOptions::CSV` propagation [datafusion]

2024-06-15 Thread via GitHub
alamb commented on PR #10912: URL: https://github.com/apache/datafusion/pull/10912#issuecomment-2170415873 Thank you so much for this contribution @svranesevic Can you perhaps add a test for this feature? Maybe in https://github.com/apache/datafusion/blob/main/datafusion/sqllogi

Re: [I] Add example for writing an SQL analysis pass [datafusion]

2024-06-15 Thread via GitHub
LorrensP-2158466 commented on issue #10871: URL: https://github.com/apache/datafusion/issues/10871#issuecomment-2170427006 Thanks for the reply! That's exactly what I have made, I'll open up a PR later today or tomorrow.

[PR] remove bit and or xor from expr [datafusion]

2024-06-15 Thread via GitHub
dharanad opened a new pull request, #10930: URL: https://github.com/apache/datafusion/pull/10930 ## Which issue does this PR close? Closes #10907 and is part of #8708 ## Rationale for this change ## What changes are included in this PR? ## Are these chang
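Converting `BitAnd`, `BitOr` and `BitXor` to UDAFs means expressing each one as an accumulator that can run in two phases: folding raw rows within each partition, then merging the partial results. The sketch below models that split with plain Rust types; `BitOp` and `BitAccumulator` are hypothetical and do not implement DataFusion's `Accumulator` trait (whose methods operate on Arrow arrays rather than slices), and `DISTINCT` variants are not covered.

```rust
/// Which bitwise operation the aggregate applies.
#[derive(Clone, Copy)]
enum BitOp {
    And,
    Or,
    Xor,
}

/// Hypothetical accumulator with an update / merge / evaluate split:
/// `update` folds raw input rows, `merge` folds another partition's state.
struct BitAccumulator {
    op: BitOp,
    value: Option<i64>, // None until the first non-null input is seen
}

impl BitAccumulator {
    fn new(op: BitOp) -> Self {
        Self { op, value: None }
    }

    fn combine(&mut self, v: i64) {
        self.value = Some(match (self.value, self.op) {
            (None, _) => v,
            (Some(acc), BitOp::And) => acc & v,
            (Some(acc), BitOp::Or) => acc | v,
            (Some(acc), BitOp::Xor) => acc ^ v,
        });
    }

    /// Fold a batch of input values; NULLs are skipped, as in SQL aggregates.
    fn update(&mut self, values: &[Option<i64>]) {
        for v in values.iter().flatten() {
            self.combine(*v);
        }
    }

    /// Fold a partial state produced by another accumulator.
    fn merge(&mut self, other: &BitAccumulator) {
        if let Some(v) = other.value {
            self.combine(v);
        }
    }

    /// Final result; `None` (SQL NULL) if no non-null input was seen.
    fn evaluate(&self) -> Option<i64> {
        self.value
    }
}
```

Because AND, OR and XOR are associative and commutative, a partial result can be merged exactly like one more input value, which is what makes a partial/final two-phase plan valid for these aggregates.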

Re: [PR] Convert ApproxPercentileCont and ApproxPercentileContWithWeight to UDAF [datafusion]

2024-06-15 Thread via GitHub
goldmedal commented on PR #10917: URL: https://github.com/apache/datafusion/pull/10917#issuecomment-2170471336 It's really weird. I can't reproduce the CI failed in my local environment. https://github.com/apache/datafusion/actions/runs/9529829557/job/26268998329?pr=10917 ``` Extern

Re: [PR] feat: add CliSessionContext trait for cli [datafusion]

2024-06-15 Thread via GitHub
tshauck commented on code in PR #10890: URL: https://github.com/apache/datafusion/pull/10890#discussion_r1641365652 ## datafusion-cli/src/cli_context.rs: ## @@ -0,0 +1,93 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements

Re: [PR] feat: add CliSessionContext trait for cli [datafusion]

2024-06-15 Thread via GitHub
tshauck commented on code in PR #10890: URL: https://github.com/apache/datafusion/pull/10890#discussion_r1641375124 ## datafusion-cli/src/cli_context.rs: ## @@ -0,0 +1,93 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements

Re: [PR] feat: add CliSessionContext trait for cli [datafusion]

2024-06-15 Thread via GitHub
tshauck commented on code in PR #10890: URL: https://github.com/apache/datafusion/pull/10890#discussion_r1641404999 ## datafusion-cli/src/cli_context.rs: ## @@ -0,0 +1,98 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements

[PR] build(deps): bump regex-syntax from 0.8.3 to 0.8.4 [datafusion-python]

2024-06-15 Thread via GitHub
dependabot[bot] opened a new pull request, #732: URL: https://github.com/apache/datafusion-python/pull/732 Bumps [regex-syntax](https://github.com/rust-lang/regex) from 0.8.3 to 0.8.4. Commits https://github.com/rust-lang/regex/commit/4757b5f01a7b9b6c8d89bd63b3d1500f7e0efa9e

Re: [PR] build(deps): bump regex-syntax from 0.8.2 to 0.8.3 [datafusion-python]

2024-06-15 Thread via GitHub
dependabot[bot] closed pull request #622: build(deps): bump regex-syntax from 0.8.2 to 0.8.3 URL: https://github.com/apache/datafusion-python/pull/622

Re: [PR] build(deps): bump regex-syntax from 0.8.2 to 0.8.3 [datafusion-python]

2024-06-15 Thread via GitHub
dependabot[bot] commented on PR #622: URL: https://github.com/apache/datafusion-python/pull/622#issuecomment-2170537477 Superseded by #732.

[PR] build(deps): bump url from 2.5.0 to 2.5.1 [datafusion-python]

2024-06-15 Thread via GitHub
dependabot[bot] opened a new pull request, #733: URL: https://github.com/apache/datafusion-python/pull/733 Bumps [url](https://github.com/servo/rust-url) from 2.5.0 to 2.5.1. Commits https://github.com/servo/rust-url/commit/3d6dbbb1dfc64c597745d5d6b97f2a8dd543c42b

Re: [PR] feat: add CliSessionContext trait for cli [datafusion]

2024-06-15 Thread via GitHub
tshauck commented on PR #10890: URL: https://github.com/apache/datafusion/pull/10890#issuecomment-2170541525 @comphead, thanks for the feedback! -- Re-requesting a review.

Re: [PR] Relax combine partial final rule [datafusion]

2024-06-15 Thread via GitHub
ozankabak commented on code in PR #10913: URL: https://github.com/apache/datafusion/pull/10913#discussion_r1641427524 ## datafusion/core/src/physical_optimizer/combine_partial_final_agg.rs: ## @@ -144,8 +144,12 @@ fn can_combine(final_agg: GroupExprsRef, partial_agg: GroupExprs

Re: [PR] feat: add CliSessionContext trait for cli [datafusion]

2024-06-15 Thread via GitHub
tshauck commented on code in PR #10890: URL: https://github.com/apache/datafusion/pull/10890#discussion_r1641437900 ## datafusion-cli/src/cli_context.rs: ## @@ -0,0 +1,93 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements

Re: [PR] feat: add CliSessionContext trait for cli [datafusion]

2024-06-15 Thread via GitHub
comphead commented on code in PR #10890: URL: https://github.com/apache/datafusion/pull/10890#discussion_r1641550367 ## datafusion-cli/src/cli_context.rs: ## @@ -0,0 +1,98 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreement

Re: [I] Do we need to escape search string as it's used in regexp? Wondering what's the result of `contains("abcdefg", ".*")` [datafusion]

2024-06-15 Thread via GitHub
Lordworms commented on issue #10929: URL: https://github.com/apache/datafusion/issues/10929#issuecomment-2170946187 Sorry for the late review since I was busy this week. In the beginning, I was just trying to keep the same format as other ScalarUDF which utilize arrow-rs methods to implemen

Re: [I] Convert `BitAnd`, `BitOr`, `BitXor` to UDAF [datafusion]

2024-06-15 Thread via GitHub
dharanad commented on issue #10907: URL: https://github.com/apache/datafusion/issues/10907#issuecomment-2170956140 take

Re: [I] Implement equality `=` and inequality `<>` support for `StringView` [datafusion]

2024-06-15 Thread via GitHub
XiangpengHao commented on issue #10919: URL: https://github.com/apache/datafusion/issues/10919#issuecomment-2170965292 Hi @Weijun-H , great to know you are working on this! I believe implementing this feature will eventually require https://github.com/apache/arrow-rs/issues/5897 to be so
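Equality on `StringView` can often be decided without touching the data buffers. In the Arrow view layout each value is a 16-byte view whose first 4 bytes hold the length; strings of up to 12 bytes are stored inline (zero padded), and longer ones keep a 4-byte prefix inline followed by a buffer index and an offset. The sketch below shows this fast path on a simplified model of that layout; `View`, `make_view`, `view_bytes` and `view_eq` are hypothetical, a single data buffer is assumed, and this is not arrow-rs's `StringViewArray` API (which stores views as `u128`). For non-null values, `<>` would be the negation of the same check.

```rust
const INLINE_MAX: usize = 12;

/// 16-byte view: bytes 0..4 = length (little endian). If length <= 12 the
/// bytes follow inline, zero padded; otherwise 4..8 = prefix,
/// 8..12 = buffer index, 12..16 = offset into that buffer.
type View = [u8; 16];

/// Build a view, appending long strings to a single data buffer (index 0).
fn make_view(s: &[u8], buffers: &mut Vec<Vec<u8>>) -> View {
    let mut v = [0u8; 16];
    v[0..4].copy_from_slice(&(s.len() as u32).to_le_bytes());
    if s.len() <= INLINE_MAX {
        v[4..4 + s.len()].copy_from_slice(s);
    } else {
        if buffers.is_empty() {
            buffers.push(Vec::new());
        }
        let offset = buffers[0].len() as u32;
        buffers[0].extend_from_slice(s);
        v[4..8].copy_from_slice(&s[..4]);
        v[8..12].copy_from_slice(&0u32.to_le_bytes());
        v[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    v
}

fn read_u32(v: &View, at: usize) -> u32 {
    u32::from_le_bytes([v[at], v[at + 1], v[at + 2], v[at + 3]])
}

/// Full bytes of a long (non-inline) string.
fn view_bytes<'b>(v: &View, len: usize, buffers: &'b [Vec<u8>]) -> &'b [u8] {
    let buf = &buffers[read_u32(v, 8) as usize];
    let off = read_u32(v, 12) as usize;
    &buf[off..off + len]
}

fn view_eq(a: &View, b: &View, buffers: &[Vec<u8>]) -> bool {
    // Length plus the first 4 bytes sit in bytes 0..8 in both layouts, so a
    // mismatch here rules out equality without reading any buffer.
    if a[0..8] != b[0..8] {
        return false;
    }
    let len = read_u32(a, 0) as usize;
    if len <= INLINE_MAX {
        // Inline strings are zero padded: comparing whole views suffices.
        return a == b;
    }
    // Same length and prefix: fall back to comparing the full bytes.
    view_bytes(a, len, buffers) == view_bytes(b, len, buffers)
}
```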

Re: [PR] Convert ApproxPercentileCont and ApproxPercentileContWithWeight to UDAF [datafusion]

2024-06-15 Thread via GitHub
jayzhan211 commented on PR #10917: URL: https://github.com/apache/datafusion/pull/10917#issuecomment-2170996403 > Yes, it will be correct for the logical plan. However, I think it will cause some issues when processing the physical plan. Curiously, should we expect the physical roundtrip to

Re: [PR] Convert ApproxPercentileCont and ApproxPercentileContWithWeight to UDAF [datafusion]

2024-06-15 Thread via GitHub
jayzhan211 commented on PR #10917: URL: https://github.com/apache/datafusion/pull/10917#issuecomment-2170996602 > It's really weird. I can't reproduce the CI failed in my local environment. https://github.com/apache/datafusion/actions/runs/9529829557/job/26268998329?pr=10917 > > ```

Re: [PR] Convert ApproxPercentileCont and ApproxPercentileContWithWeight to UDAF [datafusion]

2024-06-15 Thread via GitHub
jayzhan211 commented on code in PR #10917: URL: https://github.com/apache/datafusion/pull/10917#discussion_r1640559776 ## datafusion/functions-aggregate/src/approx_percentile_cont.rs: ## @@ -15,19 +15,254 @@ // specific language governing permissions and limitations // under t

Re: [PR] Convert ApproxPercentileCont and ApproxPercentileContWithWeight to UDAF [datafusion]

2024-06-15 Thread via GitHub
jayzhan211 commented on code in PR #10917: URL: https://github.com/apache/datafusion/pull/10917#discussion_r1641592218 ## datafusion/functions-aggregate/src/approx_percentile_cont_with_weight.rs: ## @@ -108,10 +156,8 @@ impl PartialEq for ApproxPercentileContWithWeight {

Re: [I] Support extracting `Int8`, `Int16`, `Int32` statistics from Parquet Data Pages [datafusion]

2024-06-15 Thread via GitHub
Weijun-H commented on issue #10928: URL: https://github.com/apache/datafusion/issues/10928#issuecomment-2171016238 take

[PR] feat: Add support for Int8 and Int16 data types in data page statistics [datafusion]

2024-06-15 Thread via GitHub
Weijun-H opened a new pull request, #10931: URL: https://github.com/apache/datafusion/pull/10931 ## Which issue does this PR close? Closes #10928 ## Rationale for this change ## What changes are included in this PR? ## Are these changes tes

Re: [I] `StatisticsConverter::row_group_null_counts` incorrect for missing column [datafusion]

2024-06-15 Thread via GitHub
marvinlanhenke commented on issue #10926: URL: https://github.com/apache/datafusion/issues/10926#issuecomment-2171042437 take

[PR] chore: Improve performance of Parquet statistics conversion [datafusion]

2024-06-15 Thread via GitHub
Weijun-H opened a new pull request, #10932: URL: https://github.com/apache/datafusion/pull/10932 ## Which issue does this PR close? Closes #. ## Rationale for this change Refactor the code `get_statistics!` in `statistics.rs` to improve the performanc

Re: [I] `StatisticsConverter::row_group_null_counts` incorrect for missing column [datafusion]

2024-06-15 Thread via GitHub
marvinlanhenke commented on issue #10926: URL: https://github.com/apache/datafusion/issues/10926#issuecomment-2171055990 @alamb ...should we also change the signature and return `Result`? This, however, would require us to `downcast_ref` the result from `new_null_array` and clone the resul
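One behavior consistent with the mention of `new_null_array` is to report the null count of every row group as unknown (null) when the column is missing, rather than a concrete number. A minimal sketch of that contract, assuming a nullable `UInt64Array` is modelled as `Vec<Option<u64>>` and per-row-group statistics as a nested `Vec`; `row_group_null_counts` here is a hypothetical free function, not the `StatisticsConverter` method.

```rust
/// Hypothetical model: `row_group_stats[rg][col]` is the recorded null count
/// of column `col` in row group `rg`, or `None` if it was not recorded.
/// The result has one entry per row group; `None` means "unknown".
fn row_group_null_counts(
    column_index: Option<usize>, // None if the column is absent from the file
    row_group_stats: &[Vec<Option<u64>>],
) -> Vec<Option<u64>> {
    match column_index {
        // Missing column: unknown for every row group, not a count of 0.
        None => vec![None; row_group_stats.len()],
        Some(idx) => row_group_stats
            .iter()
            .map(|rg| rg.get(idx).copied().flatten())
            .collect(),
    }
}
```

Returning unknown rather than `0` matters for pruning: a null count of 0 would let an `IS NULL` predicate skip row groups that, for a column absent from the file, actually contain only nulls.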