Re: [PR] chore: Enable Comet shuffle with AQE coalesce partitions [datafusion-comet]

2024-08-15 Thread via GitHub
codecov-commenter commented on PR #834: URL: https://github.com/apache/datafusion-comet/pull/834#issuecomment-2292926512 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/834?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

[PR] Fix: support NULL input for like operations [datafusion]

2024-08-15 Thread via GitHub
HuSen8891 opened a new pull request, #12025: URL: https://github.com/apache/datafusion/pull/12025 ## Which issue does this PR close? Closes #11872 ## Rationale for this change Support NULL input for like operations, such as select null like null; select null ilike null;

Re: [PR] chore: Enable Comet shuffle with AQE coalesce partitions [datafusion-comet]

2024-08-15 Thread via GitHub
viirya commented on PR #834: URL: https://github.com/apache/datafusion-comet/pull/834#issuecomment-2292870515 This is a copy of #651 with some changes removed. I'd like to see whether the "os cannot spawn new native thread" error goes away. And I also want to check if core-1 can pass wit

[PR] chore: Enable Comet shuffle with AQE coalesce partitions [datafusion-comet]

2024-08-15 Thread via GitHub
viirya opened a new pull request, #834: URL: https://github.com/apache/datafusion-comet/pull/834 ## Which issue does this PR close? Closes #387. ## Rationale for this change ## What changes are included in this PR? ## How are these changes t

Re: [PR] Improve performance of REPEAT functions [datafusion]

2024-08-15 Thread via GitHub
tlm365 commented on code in PR #12015: URL: https://github.com/apache/datafusion/pull/12015#discussion_r1719296602 ## datafusion/functions/benches/repeat.rs: ## @@ -0,0 +1,129 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agree

Re: [I] Implement cast from struct type to string [datafusion-comet]

2024-08-15 Thread via GitHub
andygrove commented on issue #814: URL: https://github.com/apache/datafusion-comet/issues/814#issuecomment-2292789745 > we're talking about this [PR](https://github.com/apache/datafusion-comet/pull/805/files#diff-1330a628b9d525dd0f5f24e21add2864a1680dfa018592d56d48e37516bba7af) right?

Re: [PR] Minor: Use execution error in ScalarValue::iter_to_array for incorrect usage [datafusion]

2024-08-15 Thread via GitHub
jayzhan211 commented on PR #11999: URL: https://github.com/apache/datafusion/pull/11999#issuecomment-2292638992 Thanks @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

Re: [PR] Minor: Use execution error in ScalarValue::iter_to_array for incorrect usage [datafusion]

2024-08-15 Thread via GitHub
jayzhan211 merged PR #11999: URL: https://github.com/apache/datafusion/pull/11999

Re: [PR] feat: Add map_extract module and function [datafusion]

2024-08-15 Thread via GitHub
Weijun-H commented on PR #11969: URL: https://github.com/apache/datafusion/pull/11969#issuecomment-2292636632 Changed to follow DuckDB now

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-15 Thread via GitHub
jayzhan211 commented on code in PR #11943: URL: https://github.com/apache/datafusion/pull/11943#discussion_r1719221322 ## datafusion/expr-common/src/groups_accumulator.rs: ## @@ -31,6 +31,13 @@ pub enum EmitTo { /// For example, if `n=10`, group_index `0, 1, ... 9` are emit

Re: [PR] Minor: Remove wrong comment on `Accumulator::evaluate` and `Accumulator::state` [datafusion]

2024-08-15 Thread via GitHub
lewiszlw commented on code in PR #12001: URL: https://github.com/apache/datafusion/pull/12001#discussion_r1719213051 ## datafusion/expr-common/src/accumulator.rs: ## @@ -64,9 +64,6 @@ pub trait Accumulator: Send + Sync + Debug { /// For example, the `SUM` accumulator mainta

Re: [PR] Improve performance of REPEAT functions [datafusion]

2024-08-15 Thread via GitHub
Omega359 commented on code in PR #12015: URL: https://github.com/apache/datafusion/pull/12015#discussion_r1719199153 ## datafusion/functions/benches/repeat.rs: ## @@ -0,0 +1,129 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [I] COUNT(expr) always returns the COUNT(colname) [datafusion]

2024-08-15 Thread via GitHub
alamb commented on issue #12023: URL: https://github.com/apache/datafusion/issues/12023#issuecomment-2292516325 Another classic trick if you are trying to find the number of rows where `a > 0` is to use SUM/CASE Something like ```sql SELECT SUM(CASE WHEN a > 10 THEN
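
The SUM/CASE trick mentioned in the comment above can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module rather than DataFusion, so it only shows the general SQL behaviour; the table `t`, the function name `count_matching`, and the sample values are assumptions made for illustration and do not come from the thread.

```python
import sqlite3


def count_matching(values, threshold):
    """Return (sum_case, count_expr) for the predicate a > threshold.

    sum_case   uses SUM(CASE WHEN ... THEN 1 ELSE 0 END), which counts
               only the rows where the predicate is true.
    count_expr uses COUNT(a > threshold), which counts every row where
               the expression is non-NULL, whether it is true or false.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER)")  # hypothetical table
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
    row = conn.execute(
        "SELECT SUM(CASE WHEN a > ? THEN 1 ELSE 0 END), COUNT(a > ?) FROM t",
        (threshold, threshold),
    ).fetchone()
    conn.close()
    return row


if __name__ == "__main__":
    # Two of the five rows exceed 10; one row is NULL.
    print(count_matching([1, 5, 11, 20, None], 10))
```

In this sketch the two aggregates differ because `COUNT(expr)` only skips rows where the expression evaluates to NULL, while the CASE expression maps both false and NULL to 0 before summing.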

Re: [I] Proposal: Create `dfdb`, a new CLI different than `datafusion-cli` with pre-built integrations [datafusion]

2024-08-15 Thread via GitHub
zeroshade commented on issue #11979: URL: https://github.com/apache/datafusion/issues/11979#issuecomment-2292510485 > Surely this should just be a CLI that speaks arrow flight SQL? I've tried to persuade ClickHouse to adopt arrow flight SQL to get around their woeful Python client. A

Re: [PR] fix: Optimize not to call getNullCount as much as possible [datafusion-comet]

2024-08-15 Thread via GitHub
andygrove commented on PR #820: URL: https://github.com/apache/datafusion-comet/pull/820#issuecomment-2292508905 > Thanks @andygrove hmmm how many iterations were used for your local benchmark? > > I used q27. I will try to run some microbenchmarks to showcase... This was the ave

Re: [I] Implement native version of ColumnarToRow [datafusion-comet]

2024-08-15 Thread via GitHub
andygrove commented on issue #708: URL: https://github.com/apache/datafusion-comet/issues/708#issuecomment-2292468613 I created a Google document to discuss the design. https://docs.google.com/document/d/1zNuavf_WT3IcpeTVAEC8IjMGloi1MeAeQSg4F0eEivs/edit?usp=sharing

Re: [I] COUNT(expr) always returns the COUNT(colname) [datafusion]

2024-08-15 Thread via GitHub
jayzhan211 commented on issue #12023: URL: https://github.com/apache/datafusion/issues/12023#issuecomment-2292443315 > Perhaps just implementing [count_if](https://spark.apache.org/docs/latest/api/sql/index.html#count_if) would do the trick? It really makes sense. DuckDB has `count_if

Re: [PR] Remove physical sort parameters on aggregate window functions [datafusion]

2024-08-15 Thread via GitHub
jayzhan211 merged PR #12009: URL: https://github.com/apache/datafusion/pull/12009

Re: [I] order_by not respected for window functions using udaf [datafusion]

2024-08-15 Thread via GitHub
jayzhan211 closed issue #11981: order_by not respected for window functions using udaf URL: https://github.com/apache/datafusion/issues/11981

Re: [PR] Remove physical sort parameters on aggregate window functions [datafusion]

2024-08-15 Thread via GitHub
jayzhan211 commented on PR #12009: URL: https://github.com/apache/datafusion/pull/12009#issuecomment-2292430449 Thanks @timsaucer

Re: [PR] fix: Optimize not to call getNullCount as much as possible [datafusion-comet]

2024-08-15 Thread via GitHub
kazuyukitanimura commented on PR #820: URL: https://github.com/apache/datafusion-comet/pull/820#issuecomment-2292426015 Thanks @andygrove hmmm how many iterations were used for your local benchmark? I used q27. I will try to run some microbenchmarks to showcase...

Re: [I] COUNT(expr) always returns the COUNT(colname) [datafusion]

2024-08-15 Thread via GitHub
rtyler commented on issue #12023: URL: https://github.com/apache/datafusion/issues/12023#issuecomment-2292420531 Perhaps just implementing [count_if](https://spark.apache.org/docs/latest/api/sql/index.html#count_if) would do the trick?

Re: [PR] fix: incorrect aggregation result of `bool_and` [datafusion]

2024-08-15 Thread via GitHub
alamb commented on code in PR #12017: URL: https://github.com/apache/datafusion/pull/12017#discussion_r1719087314 ## datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/bool_op.rs: ## @@ -77,7 +82,9 @@ where if self.values.len() < total_num_groups {

Re: [I] COUNT(expr) always returns the COUNT(colname) [datafusion]

2024-08-15 Thread via GitHub
jayzhan211 commented on issue #12023: URL: https://github.com/apache/datafusion/issues/12023#issuecomment-2292414939 I checked the behaviour of DuckDB and Postgres; it is not the same as Spark SQL's ``` D create table t(a int); D insert into t values(1); D insert into t

Re: [PR] Fix: sqllogictest - generate_series function invalid argument type [datafusion]

2024-08-15 Thread via GitHub
alamb commented on PR #12002: URL: https://github.com/apache/datafusion/pull/12002#issuecomment-2292413964 Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look

Re: [PR] Fix: support NULL input for regular expression comparison operations [datafusion]

2024-08-15 Thread via GitHub
alamb commented on PR #11985: URL: https://github.com/apache/datafusion/pull/11985#issuecomment-2292413109 Thanks again @HuSen8891

Re: [PR] Fix: support NULL input for regular expression comparison operations [datafusion]

2024-08-15 Thread via GitHub
alamb merged PR #11985: URL: https://github.com/apache/datafusion/pull/11985

Re: [PR] Minor: make some physical-plan properties public [datafusion]

2024-08-15 Thread via GitHub
alamb commented on code in PR #12022: URL: https://github.com/apache/datafusion/pull/12022#discussion_r1719085598 ## datafusion/physical-plan/src/aggregates/mod.rs: ## @@ -128,14 +128,14 @@ impl AggregateMode { #[derive(Clone, Debug, Default)] pub struct PhysicalGroupBy {

Re: [PR] Handle arguments checking of `min`/`max` function to avoid crashes [datafusion]

2024-08-15 Thread via GitHub
alamb commented on PR #12016: URL: https://github.com/apache/datafusion/pull/12016#issuecomment-2292411394 I couldn't help myself -- we should have a regression test for this code so I added one in https://github.com/apache/datafusion/pull/12024

Re: [PR] Minor: Add error tests for min/max with 2 arguments [datafusion]

2024-08-15 Thread via GitHub
alamb commented on code in PR #12024: URL: https://github.com/apache/datafusion/pull/12024#discussion_r1719085014 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -1881,6 +1881,12 @@ SELECT MIN(c1), MIN(c2) FROM test 0 1 +query error min/max was called with 2

[PR] Minor: Add error tests for min/max with 2 arguments [datafusion]

2024-08-15 Thread via GitHub
alamb opened a new pull request, #12024: URL: https://github.com/apache/datafusion/pull/12024 ## Which issue does this PR close? Related to https://github.com/apache/datafusion/issues/12011 ## Rationale for this change Add test for check in https://github.com/apache/dataf

Re: [PR] Handle arguments checking of `min`/`max` function to avoid crashes [datafusion]

2024-08-15 Thread via GitHub
alamb merged PR #12016: URL: https://github.com/apache/datafusion/pull/12016

Re: [PR] Update SPLIT_PART scalar function to support Utf8View [datafusion]

2024-08-15 Thread via GitHub
alamb commented on PR #11975: URL: https://github.com/apache/datafusion/pull/11975#issuecomment-2292405602 Thanks again @Lordworms

Re: [I] Panics in `MIN()/MAX()` aggregate functions (SQLancer) [datafusion]

2024-08-15 Thread via GitHub
alamb closed issue #12011: Panics in `MIN()/MAX()` aggregate functions (SQLancer) URL: https://github.com/apache/datafusion/issues/12011

Re: [I] Update `SPLIT_PART` scalar function to support Utf8View [datafusion]

2024-08-15 Thread via GitHub
alamb closed issue #11950: Update `SPLIT_PART` scalar function to support Utf8View URL: https://github.com/apache/datafusion/issues/11950

Re: [PR] Update SPLIT_PART scalar function to support Utf8View [datafusion]

2024-08-15 Thread via GitHub
alamb merged PR #11975: URL: https://github.com/apache/datafusion/pull/11975

Re: [I] Update `TRANSLATE` scalar function to support Utf8View [datafusion]

2024-08-15 Thread via GitHub
alamb closed issue #11953: Update `TRANSLATE` scalar function to support Utf8View URL: https://github.com/apache/datafusion/issues/11953

Re: [PR] feat/11953: Support StringView for TRANSLATE() fn [datafusion]

2024-08-15 Thread via GitHub
alamb merged PR #11967: URL: https://github.com/apache/datafusion/pull/11967

Re: [PR] Support partial aggregation skip for boolean functions [datafusion]

2024-08-15 Thread via GitHub
alamb commented on PR #11847: URL: https://github.com/apache/datafusion/pull/11847#issuecomment-2292405317 🚀

Re: [I] Improve performance of boolean aggregates: implement `convert_to_state` [datafusion]

2024-08-15 Thread via GitHub
alamb closed issue #11818: Improve performance of boolean aggregates: implement `convert_to_state` URL: https://github.com/apache/datafusion/issues/11818

Re: [PR] Support partial aggregation skip for boolean functions [datafusion]

2024-08-15 Thread via GitHub
alamb merged PR #11847: URL: https://github.com/apache/datafusion/pull/11847

Re: [PR] Update REVERSE scalar function to support Utf8View [datafusion]

2024-08-15 Thread via GitHub
alamb merged PR #11973: URL: https://github.com/apache/datafusion/pull/11973

Re: [I] COUNT(expr) always returns the COUNT(colname) [datafusion]

2024-08-15 Thread via GitHub
devanbenz commented on issue #12023: URL: https://github.com/apache/datafusion/issues/12023#issuecomment-2292402801 take

Re: [I] Update `REVERSE` scalar function to support Utf8View [datafusion]

2024-08-15 Thread via GitHub
alamb closed issue #11915: Update `REVERSE` scalar function to support Utf8View URL: https://github.com/apache/datafusion/issues/11915

Re: [PR] Update REVERSE scalar function to support Utf8View [datafusion]

2024-08-15 Thread via GitHub
alamb commented on PR #11973: URL: https://github.com/apache/datafusion/pull/11973#issuecomment-2292402846 Thanks again @Omega359 and @comphead

Re: [I] Proposal: Create `dfdb`, a new CLI different than `datafusion-cli` with pre-built integrations [datafusion]

2024-08-15 Thread via GitHub
alamb commented on issue #11979: URL: https://github.com/apache/datafusion/issues/11979#issuecomment-2292401779 > Where I disagree (I think) with @alamb is around what it lacks — my main problem with datafusion-cli is the "UI" — all the small parts of the best CLIs which make the difference

Re: [PR] fix: Optimize not to call getNullCount as much as possible [datafusion-comet]

2024-08-15 Thread via GitHub
andygrove commented on PR #820: URL: https://github.com/apache/datafusion-comet/pull/820#issuecomment-2292389872 I ran my local TPC-DS benchmark and it doesn't show any improvement for that benchmark. ![tpcds_allqueries (4)](https://github.com/user-attachments/assets/99ff1cdc-2b2d-4

Re: [PR] Generic FlightTableFactory with a default FlightSqlDriver [datafusion]

2024-08-15 Thread via GitHub
alamb commented on code in PR #11938: URL: https://github.com/apache/datafusion/pull/11938#discussion_r1719066241 ## datafusion/core/src/datasource/flight/sql.rs: ## @@ -0,0 +1,475 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [I] COUNT(expr) always returns the COUNT(colname) [datafusion]

2024-08-15 Thread via GitHub
rtyler commented on issue #12023: URL: https://github.com/apache/datafusion/issues/12023#issuecomment-2292339780 Related to: * #5619 * #5908 * #11303

Re: [PR] fix: Optimize not to call getNullCount as much as possible [datafusion-comet]

2024-08-15 Thread via GitHub
andygrove commented on PR #820: URL: https://github.com/apache/datafusion-comet/pull/820#issuecomment-2292277206 I am now running benchmarks with this PR

Re: [I] Proposal: Create `dfdb`, a new CLI different than `datafusion-cli` with pre-built integrations [datafusion]

2024-08-15 Thread via GitHub
findepi commented on issue #11979: URL: https://github.com/apache/datafusion/issues/11979#issuecomment-2292268818 > One last thing I'll say — virtually all of the above features would be independent of the query engine being used, so there is a good argument to build one great CLI user expe

[PR] Minor: make some physical-plan properties public [datafusion]

2024-08-15 Thread via GitHub
emgeee opened a new pull request, #12022: URL: https://github.com/apache/datafusion/pull/12022 ## Which issue does this PR close? Closes #12021 ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

[I] Make properties in phyiscal-plan/aggregates public [datafusion]

2024-08-15 Thread via GitHub
emgeee opened a new issue, #12021: URL: https://github.com/apache/datafusion/issues/12021 ### Is your feature request related to a problem or challenge? We're working on building out streaming functionality datafusion via the [Denormalized](https://github.com/probably-nothing-labs/den

[I] Unable to access LogicalPlanBuilder.plan inside a new trail implementation [datafusion]

2024-08-15 Thread via GitHub
emgeee opened a new issue, #12020: URL: https://github.com/apache/datafusion/issues/12020 ### Is your feature request related to a problem or challenge? We're looking to extend the `LogicalPlanBuilder` via a trait, however we need to be able to access `LogicalPlanBuilder.plan` which i

Re: [I] Proposal: Create `dfdb`, a new CLI different than `datafusion-cli` with pre-built integrations [datafusion]

2024-08-15 Thread via GitHub
samuelcolvin commented on issue #11979: URL: https://github.com/apache/datafusion/issues/11979#issuecomment-2292143867 > I like the idea to have a cli frontend that is query engine agnostic (datafusion, duckdb), table agnostic (iceberg, delta), file agnostic (parquet, lance), and even more.

Re: [PR] Update REVERSE scalar function to support Utf8View [datafusion]

2024-08-15 Thread via GitHub
Omega359 commented on code in PR #11973: URL: https://github.com/apache/datafusion/pull/11973#discussion_r1718923386 ## datafusion/functions/src/unicode/reverse.rs: ## @@ -95,59 +104,58 @@ pub fn reverse(args: &[ArrayRef]) -> Result { #[cfg(test)] mod tests { -use arrow

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-15 Thread via GitHub
Rachelint commented on PR #11943: URL: https://github.com/apache/datafusion/pull/11943#issuecomment-2292090614 > Hi @Rachelint -- please let me know if/when this PR is ready for another look. I think your plan as I understand it is to get this idea working enough to show performance improve

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-15 Thread via GitHub
alamb commented on PR #11943: URL: https://github.com/apache/datafusion/pull/11943#issuecomment-2292059477 Hi @Rachelint -- please let me know if/when this PR is ready for another look. I think your plan as I understand it is to get this idea working enough to show performance improvements

[PR] Update SUBSTR scalar function to support Utf8View [datafusion]

2024-08-15 Thread via GitHub
dmitrybugakov opened a new pull request, #12019: URL: https://github.com/apache/datafusion/pull/12019 ## Which issue does this PR close? Closes #11952. ## Rationale for this change ## What changes are included in this PR? ## Are these change

Re: [PR] chore: Enable Comet shuffle with AQE coalesce partitions [datafusion-comet]

2024-08-15 Thread via GitHub
viirya commented on PR #651: URL: https://github.com/apache/datafusion-comet/pull/651#issuecomment-2292024903 For the negative refcount issue reported by Java Arrow, I cannot reproduce it locally or even on internal CI (different platform other than ubuntu). I'm not sure if it is rela

Re: [PR] Improve documentation about `ParquetExec` / Parquet predicate pushdown [datafusion]

2024-08-15 Thread via GitHub
itsjunetime commented on code in PR #11994: URL: https://github.com/apache/datafusion/pull/11994#discussion_r1718844856 ## datafusion/core/src/datasource/physical_plan/parquet/row_filter.rs: ## @@ -243,13 +295,17 @@ impl<'a> TreeNodeRewriter for FilterCandidateBuilder<'a> {

Re: [PR] Skeleton `concat_elements` implementation for StringViewArray [datafusion]

2024-08-15 Thread via GitHub
dharanad commented on PR #11995: URL: https://github.com/apache/datafusion/pull/11995#issuecomment-2291929699 Thank you for this @alamb, I will use it to implement my solution.

Re: [I] Window functions create unwanted projection [datafusion]

2024-08-15 Thread via GitHub
devanbenz commented on issue #11982: URL: https://github.com/apache/datafusion/issues/11982#issuecomment-2291900342 @timsaucer I have a fix out for this specific issue: https://github.com/apache/datafusion/pull/12000 To be perfectly honest I don't know if this will cause any downstream a

Re: [PR] Keep the existing default catalog for `SessionStateBuilder::new_from_existing` [datafusion]

2024-08-15 Thread via GitHub
goldmedal commented on PR #11991: URL: https://github.com/apache/datafusion/pull/11991#issuecomment-2291842114 Thanks @alamb ❤️

[PR] Minor: Remove warning when building `datafusion-cli` from `Dockerfile` [datafusion]

2024-08-15 Thread via GitHub
tlm365 opened a new pull request, #12018: URL: https://github.com/apache/datafusion/pull/12018 ## Which issue does this PR close? Closes #. ## Rationale for this change Remove the WARNING log when building `datafusion-cli` > 1 warning found (use docker --debug to

Re: [PR] feat: Add specific configs for converting Spark Parquet and JSON data to Arrow [datafusion-comet]

2024-08-15 Thread via GitHub
andygrove commented on code in PR #832: URL: https://github.com/apache/datafusion-comet/pull/832#discussion_r1718747032 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -441,20 +459,20 @@ object CometConf extends ShimCometConf { .booleanConf .createWith

Re: [PR] [draft] implement utf8_view for replace [datafusion]

2024-08-15 Thread via GitHub
thinh2 commented on code in PR #12004: URL: https://github.com/apache/datafusion/pull/12004#discussion_r1718736487 ## datafusion/sqllogictest/test_files/string_view.slt: ## @@ -718,9 +718,8 @@ EXPLAIN SELECT FROM test; logical_plan -01)Projection: replace(__common_expr_1

Re: [PR] feat: Add specific configs for converting Spark Parquet and JSON data to Arrow [datafusion-comet]

2024-08-15 Thread via GitHub
andygrove commented on code in PR #832: URL: https://github.com/apache/datafusion-comet/pull/832#discussion_r1718733137 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -84,15 +84,33 @@ object CometConf extends ShimCometConf { .booleanConf .createWithDe

Re: [PR] Use tracked-consumers memory pool be the default. [datafusion]

2024-08-15 Thread via GitHub
alamb commented on PR #11949: URL: https://github.com/apache/datafusion/pull/11949#issuecomment-2291717421 Thanks again @wiedld and @comphead

Re: [I] Resources exhausted errors are confusing return the biggest memory consumers. [datafusion]

2024-08-15 Thread via GitHub
alamb closed issue #11523: Resources exhausted errors are confusing return the biggest memory consumers. URL: https://github.com/apache/datafusion/issues/11523

Re: [PR] Use tracked-consumers memory pool be the default. [datafusion]

2024-08-15 Thread via GitHub
alamb merged PR #11949: URL: https://github.com/apache/datafusion/pull/11949

Re: [PR] Improve performance of REPEAT functions [datafusion]

2024-08-15 Thread via GitHub
tlm365 commented on PR #12015: URL: https://github.com/apache/datafusion/pull/12015#issuecomment-2291695743 > Thanks @tlm365 this PR looks promising. Would you mind adding bench results to the PR description? @comphead Thanks for reviewing, I have updated it.

Re: [I] Intermittent failures in `fuzz_cases::join_fuzz::test_anti_join_1k_filtered` [datafusion]

2024-08-15 Thread via GitHub
korowa commented on issue #11555: URL: https://github.com/apache/datafusion/issues/11555#issuecomment-2291675108 From what I remember -- doesn't SMJ already fetch the buffered side until it meets the first key which is non-equal to the current streamed side value (`PollingRest` buffered state

[PR] fix: incorrect aggregation result of `bool_and` [datafusion]

2024-08-15 Thread via GitHub
jonahgao opened a new pull request, #12017: URL: https://github.com/apache/datafusion/pull/12017 ## Which issue does this PR close? Closes #11846. ## Rationale for this change The initial value of each group was set to `false`. When encountering the first non-null va
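
The rationale quoted above concerns the starting value of a per-group AND accumulator. As a minimal sketch of why that starting value matters (plain Python, not DataFusion's accumulator code; the function `grouped_bool_and` and its `seed` parameter are hypothetical names introduced here), note that `False AND x` is always `False`, so a group seeded with `False` can never report `True`. `True` is the identity element of AND.

```python
from typing import List, Optional


def grouped_bool_and(
    group_indices: List[int],
    values: List[Optional[bool]],
    num_groups: int,
    seed: bool = True,
) -> List[Optional[bool]]:
    """AND together the non-null values of each group.

    `seed` is the starting value of every group's accumulator. It is a
    parameter here only so the effect of a wrong seed can be observed.
    Groups that never see a non-null value yield None.
    """
    acc = [seed] * num_groups
    seen = [False] * num_groups
    for group, value in zip(group_indices, values):
        if value is None:
            continue  # nulls do not affect the result
        acc[group] = acc[group] and value
        seen[group] = True
    return [a if s else None for a, s in zip(acc, seen)]


if __name__ == "__main__":
    groups = [0, 0, 1, 2]
    vals = [True, True, False, None]
    print(grouped_bool_and(groups, vals, 3))              # seeded with True
    print(grouped_bool_and(groups, vals, 3, seed=False))  # seeded with False
```

With the `False` seed, group 0 comes out `False` even though all of its inputs are `True`, which is the kind of incorrect result the PR title describes.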

Re: [I] Support different aggregate expression (outside SELECT list) in ORDER BY list [datafusion]

2024-08-15 Thread via GitHub
Rachelint commented on issue #12007: URL: https://github.com/apache/datafusion/issues/12007#issuecomment-2291539904 Interesting, I want to try it.

Re: [I] Support different aggregate expression (outside SELECT list) in ORDER BY list [datafusion]

2024-08-15 Thread via GitHub
Rachelint commented on issue #12007: URL: https://github.com/apache/datafusion/issues/12007#issuecomment-2291540042 take

Re: [I] Test CometDriverPlugin in Kube/Yarn [datafusion-comet]

2024-08-15 Thread via GitHub
comphead commented on issue #826: URL: https://github.com/apache/datafusion-comet/issues/826#issuecomment-2291528914 Historical server has a slightly different implementation. Thanks for the help @orthoxerox, I'm planning to document this today

Re: [PR] Improve documentation about `ParquetExec` / Parquet predicate pushdown [datafusion]

2024-08-15 Thread via GitHub
comphead commented on code in PR #11994: URL: https://github.com/apache/datafusion/pull/11994#discussion_r1718558273 ## datafusion/core/src/datasource/physical_plan/parquet/mod.rs: ## @@ -144,6 +142,29 @@ pub use writer::plan_to_parquet; /// * User provided [`ParquetAccessPlan

Re: [PR] Improve documentation about `ParquetExec` / Parquet predicate pushdown [datafusion]

2024-08-15 Thread via GitHub
comphead commented on code in PR #11994: URL: https://github.com/apache/datafusion/pull/11994#discussion_r1718555045 ## datafusion/core/src/datasource/physical_plan/parquet/mod.rs: ## @@ -116,13 +116,12 @@ pub use writer::plan_to_parquet; /// /// Supports the following optimiz

Re: [PR] Improve documentation about `ParquetExec` / Parquet predicate pushdown [datafusion]

2024-08-15 Thread via GitHub
comphead commented on code in PR #11994: URL: https://github.com/apache/datafusion/pull/11994#discussion_r1718554136 ## datafusion/common/src/tree_node.rs: ## @@ -486,6 +486,9 @@ pub trait TreeNodeVisitor<'n>: Sized { /// A [Visitor](https://en.wikipedia.org/wiki/Visitor_patter

[PR] Handle arguments checking of `min`/`max` function to avoid crashes [datafusion]

2024-08-15 Thread via GitHub
tlm365 opened a new pull request, #12016: URL: https://github.com/apache/datafusion/pull/12016 ## Which issue does this PR close? Closes #12011. ## Rationale for this change Handle `min`/`max` argument checking and throw an error instead of crashing. ## What changes

Re: [PR] Minor: Remove wrong comment on `Accumulator::evaluate` and `Accumulator::state` [datafusion]

2024-08-15 Thread via GitHub
comphead commented on code in PR #12001: URL: https://github.com/apache/datafusion/pull/12001#discussion_r1718552128 ## datafusion/expr-common/src/accumulator.rs: ## @@ -64,9 +64,6 @@ pub trait Accumulator: Send + Sync + Debug { /// For example, the `SUM` accumulator mainta

Re: [PR] Update REVERSE scalar function to support Utf8View [datafusion]

2024-08-15 Thread via GitHub
comphead commented on code in PR #11973: URL: https://github.com/apache/datafusion/pull/11973#discussion_r1718547342 ## datafusion/functions/src/unicode/reverse.rs: ## @@ -95,59 +104,58 @@ pub fn reverse(args: &[ArrayRef]) -> Result { #[cfg(test)] mod tests { -use arrow

Re: [PR] Use `schema_name` to create the `physical_name` [datafusion]

2024-08-15 Thread via GitHub
joroKr21 commented on code in PR #11977: URL: https://github.com/apache/datafusion/pull/11977#discussion_r1718539993 ## datafusion/expr/src/expr.rs: ## @@ -2386,263 +2386,13 @@ fn fmt_function( write!(f, "{}({}{})", fun, distinct_str, args.join(", ")) } -pub fn create_fu

Re: [PR] Use `schema_name` to create the `physical_name` [datafusion]

2024-08-15 Thread via GitHub
viirya commented on PR #11977: URL: https://github.com/apache/datafusion/pull/11977#issuecomment-2291484894 > Maybe @andygrove / @viirya could also provide some feedback -- I don't understand the complexities of the naming requirements in comet > > If there are some other non obvious

Re: [PR] Use `schema_name` to create the `physical_name` [datafusion]

2024-08-15 Thread via GitHub
viirya commented on PR #11977: URL: https://github.com/apache/datafusion/pull/11977#issuecomment-2291480960 > This was previously the purpose of the is_first_expression flag in create_physical_name. I don't see the appearance of `is_first_expression` flag in the diff. So the descript

Re: [PR] Use `schema_name` to create the `physical_name` [datafusion]

2024-08-15 Thread via GitHub
viirya commented on code in PR #11977: URL: https://github.com/apache/datafusion/pull/11977#discussion_r1718531323 ## datafusion/expr/src/expr.rs: ## @@ -2386,263 +2386,13 @@ fn fmt_function( write!(f, "{}({}{})", fun, distinct_str, args.join(", ")) } -pub fn create_func

Re: [PR] feat: Add specific configs for converting Spark Parquet and JSON data to Arrow [datafusion-comet]

2024-08-15 Thread via GitHub
Kimahriman commented on code in PR #832: URL: https://github.com/apache/datafusion-comet/pull/832#discussion_r1718513726 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -441,20 +459,20 @@ object CometConf extends ShimCometConf { .booleanConf .createWit

Re: [PR] feat: Add specific configs for converting Spark Parquet and JSON data to Arrow [datafusion-comet]

2024-08-15 Thread via GitHub
viirya commented on code in PR #832: URL: https://github.com/apache/datafusion-comet/pull/832#discussion_r1718510724 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -1131,12 +1133,39 @@ object CometSparkSessionExtensions extends Logging { /

Re: [PR] feat: Add specific configs for converting Spark Parquet and JSON data to Arrow [datafusion-comet]

2024-08-15 Thread via GitHub
viirya commented on code in PR #832: URL: https://github.com/apache/datafusion-comet/pull/832#discussion_r1718509119 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -1131,12 +1133,39 @@ object CometSparkSessionExtensions extends Logging { /

Re: [PR] feat: Add specific configs for converting Spark Parquet and JSON data to Arrow [datafusion-comet]

2024-08-15 Thread via GitHub
viirya commented on code in PR #832: URL: https://github.com/apache/datafusion-comet/pull/832#discussion_r1718508825 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -1131,12 +1133,39 @@ object CometSparkSessionExtensions extends Logging { /

Re: [PR] feat: Add specific configs for converting Spark Parquet and JSON data to Arrow [datafusion-comet]

2024-08-15 Thread via GitHub
viirya commented on code in PR #832: URL: https://github.com/apache/datafusion-comet/pull/832#discussion_r1718506539 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -84,15 +84,33 @@ object CometConf extends ShimCometConf { .booleanConf .createWithDefau

Re: [PR] feat: Add specific configs for converting Spark Parquet and JSON data to Arrow [datafusion-comet]

2024-08-15 Thread via GitHub
Kimahriman commented on PR #832: URL: https://github.com/apache/datafusion-comet/pull/832#issuecomment-2291414466 The other alternative in that case would be similar to the useV1SourceList, have a single config with a list of formats to do it for, but don't have a strong opinion either way

Re: [PR] Minor: Use execution error in ScalarValue::iter_to_array for incorrect usage [datafusion]

2024-08-15 Thread via GitHub
jayzhan211 commented on PR #11999: URL: https://github.com/apache/datafusion/pull/11999#issuecomment-2291402746 > I personally think the internal error makes more sense but I don't have a problem changing this I think the idea about returning internal error was that if these errors get h

Re: [PR] feat: Add specific configs for converting Spark Parquet and JSON data to Arrow [datafusion-comet]

2024-08-15 Thread via GitHub
andygrove commented on PR #832: URL: https://github.com/apache/datafusion-comet/pull/832#issuecomment-2291373197 > Hmmm do you need separate configs per type? Or should there just be a dedicated like "convert from non-supported scan" config that covers all input formats? That would p

Re: [I] Implement more efficient conversion from Spark column to Comet/Arrow column [datafusion-comet]

2024-08-15 Thread via GitHub
andygrove commented on issue #798: URL: https://github.com/apache/datafusion-comet/issues/798#issuecomment-2291353585 I am going to take a first pass at this to see what effort is involved.

Re: [PR] feat: Add specific configs for converting Spark Parquet and JSON data to Arrow [datafusion-comet]

2024-08-15 Thread via GitHub
Kimahriman commented on PR #832: URL: https://github.com/apache/datafusion-comet/pull/832#issuecomment-2291352973 Hmmm do you need separate configs per type? Or should there just be a dedicated like "convert from non-supported scan" config that covers all input formats?

Re: [I] Allow suppling a table schema to ParquetExec [datafusion]

2024-08-15 Thread via GitHub
samuelcolvin commented on issue #12010: URL: https://github.com/apache/datafusion/issues/12010#issuecomment-2291344987 It's worth noting that allowing the position of partition columns to be controlled would be useful beyond this problem: If I have `industry, company, department, employe

Re: [I] Panics in `MIN()/MAX()` aggregate functions (SQLancer) [datafusion]

2024-08-15 Thread via GitHub
2010YOUY01 commented on issue #12011: URL: https://github.com/apache/datafusion/issues/12011#issuecomment-2291318848 > Both postgres and duckdb does not support min/max more than one argument. I suggest we leave it alone too. I have updated the issue for clarification (expecting an er

Re: [PR] feat: Add map_extract module and function [datafusion]

2024-08-15 Thread via GitHub
alamb commented on code in PR #11969: URL: https://github.com/apache/datafusion/pull/11969#discussion_r1718440041 ## datafusion/sqllogictest/test_files/map.slt: ## @@ -493,3 +509,68 @@ select cardinality(map([1, 2, 3], ['a', 'b', 'c'])), cardinality(MAP {'a': 1, 'b card
