[I] List available functions (`SHOW FUNCTIONS`) [datafusion]

2024-08-23 Thread via GitHub
findepi opened a new issue, #12144: URL: https://github.com/apache/datafusion/issues/12144 ### Is your feature request related to a problem or challenge? I as a user would want to see a list of available functions. ### Describe the solution you'd like ```sql SHOW FUNCT

Re: [PR] Check for overflow in substring with negative start [datafusion]

2024-08-23 Thread via GitHub
findepi commented on PR #12141: URL: https://github.com/apache/datafusion/pull/12141#issuecomment-2308147838 cc @2010YOUY01 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] Check for overflow in substring with negative start [datafusion]

2024-08-23 Thread via GitHub
findepi commented on code in PR #12141: URL: https://github.com/apache/datafusion/pull/12141#discussion_r1729779836 ## datafusion/functions/src/unicode/substr.rs: ## @@ -144,19 +144,25 @@ where let result = iter .zip(start_array.iter())

Re: [I] Improve the hash join performance by replacing the RawTable to a simple Vec for JoinHashMap [datafusion]

2024-08-23 Thread via GitHub
Dandandan commented on issue #6910: URL: https://github.com/apache/datafusion/issues/6910#issuecomment-2308145924 I am going to experiment with a slight variation on this this weekend to greatly reduce the number of collisions while still using `Vec`-based indexing.

Re: [PR] Remove LargeUtf8|Binary, Utf8|BinaryView, and Dictionary from ScalarValue [datafusion]

2024-08-23 Thread via GitHub
jayzhan211 commented on PR #11978: URL: https://github.com/apache/datafusion/pull/11978#issuecomment-2308143053 Will `evaluate_as_scalar` replace `evaluate` and `state_as_scalars` replace `state`? If so, then it looks good to me.

Re: [PR] Remove LargeUtf8|Binary, Utf8|BinaryView, and Dictionary from ScalarValue [datafusion]

2024-08-23 Thread via GitHub
jayzhan211 commented on PR #11978: URL: https://github.com/apache/datafusion/pull/11978#issuecomment-2308124082 > > Upd: It seems there is only one place that calls Accumulator::evaluate and transforms it to ArrayRef, I think we can just change the return value to ArrayRef for Accumulator::

Re: [I] Internal error in `approx_percentile_cont()` aggregate function (SQLancer) [datafusion]

2024-08-23 Thread via GitHub
jayzhan211 closed issue #12012: Internal error in `approx_percentile_cont()` aggregate function (SQLancer) URL: https://github.com/apache/datafusion/issues/12012

Re: [PR] Throw `not_impl_error` for `approx_percentile_cont` parameters validation [datafusion]

2024-08-23 Thread via GitHub
jayzhan211 merged PR #12133: URL: https://github.com/apache/datafusion/pull/12133

Re: [PR] Throw `not_impl_error` for `approx_percentile_cont` parameters validation [datafusion]

2024-08-23 Thread via GitHub
jayzhan211 commented on PR #12133: URL: https://github.com/apache/datafusion/pull/12133#issuecomment-2308113865 Thanks @goldmedal @alamb

Re: [PR] Throw `not_impl_error` for `approx_percentile_cont` parameters validation [datafusion]

2024-08-23 Thread via GitHub
goldmedal commented on code in PR #12133: URL: https://github.com/apache/datafusion/pull/12133#discussion_r1729695268 ## datafusion/functions-aggregate/src/approx_percentile_cont.rs: ## @@ -154,19 +154,20 @@ fn get_scalar_value(expr: &Arc) -> Result { } fn validate_input_pe

Re: [I] Document "how to read an explain plan" [datafusion]

2024-08-23 Thread via GitHub
2010YOUY01 commented on issue #12088: URL: https://github.com/apache/datafusion/issues/12088#issuecomment-2308014959 > @2010YOUY01 I'm personally not very familiar with exchange based parallelism. Could you point me in the direction of a good paper/resource on the topic. Assuming `How Query

Re: [PR] minor: Add comments for `GroupedHashAggregateStream` struct [datafusion]

2024-08-23 Thread via GitHub
2010YOUY01 commented on code in PR #12127: URL: https://github.com/apache/datafusion/pull/12127#discussion_r1729673683 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -76,35 +76,43 @@ use super::AggregateExec; /// This encapsulates the spilling state struct Spi

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-23 Thread via GitHub
Rachelint commented on code in PR #11943: URL: https://github.com/apache/datafusion/pull/11943#discussion_r1729662155 ## datafusion/functions-aggregate-common/src/aggregate/groups_accumulator.rs: ## @@ -367,6 +379,289 @@ impl VecAllocExt for Vec { } } +pub trait EmitToEx

Re: [PR] WIP: add documentation on `EXPLAIN PLAN` [datafusion]

2024-08-23 Thread via GitHub
devanbenz commented on code in PR #12122: URL: https://github.com/apache/datafusion/pull/12122#discussion_r1729658937 ## docs/source/user-guide/explain-usage.md: ## @@ -0,0 +1,423 @@ + + +# Reading Explain Plans + +## Introduction + +This section describes of how to read a Data

Re: [PR] Add input_nullable for UDAF args StateField and Accumulator [datafusion]

2024-08-23 Thread via GitHub
github-actions[bot] commented on PR #11063: URL: https://github.com/apache/datafusion/pull/11063#issuecomment-2307985987 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] `array_has` avoid row converter for string type [datafusion]

2024-08-23 Thread via GitHub
jayzhan211 merged PR #12097: URL: https://github.com/apache/datafusion/pull/12097

Re: [PR] `array_has` avoid row converter for string type [datafusion]

2024-08-23 Thread via GitHub
jayzhan211 commented on PR #12097: URL: https://github.com/apache/datafusion/pull/12097#issuecomment-2307975760 Thanks @alamb, I will try the eq kernel and Scalar optimization as a follow-up

Re: [PR] `array_has` avoid row converter for string type [datafusion]

2024-08-23 Thread via GitHub
jayzhan211 commented on code in PR #12097: URL: https://github.com/apache/datafusion/pull/12097#discussion_r1729645104 ## datafusion/functions-nested/src/array_has.rs: ## @@ -251,75 +237,176 @@ impl ScalarUDFImpl for ArrayHasAny { } /// Represents the type of comparison for

Re: [PR] Use `LexRequirement` alias as much as possible [datafusion]

2024-08-23 Thread via GitHub
lewiszlw merged PR #12130: URL: https://github.com/apache/datafusion/pull/12130

Re: [PR] Use `LexRequirement` alias as much as possible [datafusion]

2024-08-23 Thread via GitHub
lewiszlw commented on PR #12130: URL: https://github.com/apache/datafusion/pull/12130#issuecomment-2307961677 Making LexRequirement an actual struct looks like a good idea. I'll try it in my free time.

Re: [PR] Improve `CombinePartialFinalAggregate` code readability [datafusion]

2024-08-23 Thread via GitHub
lewiszlw merged PR #12128: URL: https://github.com/apache/datafusion/pull/12128

Re: [PR] Add ability to return `LogicalPlan` by value from `TableProvider` [datafusion]

2024-08-23 Thread via GitHub
jayzhan211 commented on code in PR #12113: URL: https://github.com/apache/datafusion/pull/12113#discussion_r1729630675 ## datafusion/optimizer/src/analyzer/inline_table_scan.rs: ## @@ -56,24 +56,22 @@ fn analyze_internal(plan: LogicalPlan) -> Result> { match plan {

Re: [I] Deterministic IDs for ExecutionPlan [datafusion]

2024-08-23 Thread via GitHub
ameyc commented on issue #11364: URL: https://github.com/apache/datafusion/issues/11364#issuecomment-2307954266 @ozankabak @alamb taking a look at this again as we are working on snapshotting and UI for Denormalized. It seems to me that the ideal place to add these is in the `PlanProperties

[I] Deterministic IDs for ExecutionPlan [datafusion]

2024-08-23 Thread via GitHub
ameyc opened a new issue, #11364: URL: https://github.com/apache/datafusion/issues/11364 ### Is your feature request related to a problem or challenge? Currently execution plans do not have an id associated with them; this makes comparison of metrics across runs difficult. Additionally we wo

[PR] fix concat dictionary(int32, utf8) bug [datafusion]

2024-08-23 Thread via GitHub
thinh2 opened a new pull request, #12143: URL: https://github.com/apache/datafusion/pull/12143 ## Which issue does this PR close? Closes #12101 . ## Rationale for this change ## What changes are included in this PR? ## Are these changes test

Re: [I] error: failed to determine package fingerprint for build script [datafusion-comet]

2024-08-23 Thread via GitHub
andygrove commented on issue #867: URL: https://github.com/apache/datafusion-comet/issues/867#issuecomment-2307929742 @radhikabajaj123 What happens if you run the following command from the terminal? ```shell ls -l /Users/radhika.bajaj/Documents/datafusion-comet/native/proto ``

Re: [PR] Add maintenance status note [datafusion-ballista]

2024-08-23 Thread via GitHub
andygrove merged PR #1043: URL: https://github.com/apache/datafusion-ballista/pull/1043

Re: [PR] chore: Prepare 0.2.0 release [datafusion-comet]

2024-08-23 Thread via GitHub
andygrove commented on code in PR #866: URL: https://github.com/apache/datafusion-comet/pull/866#discussion_r1729597656 ## README.md: ## @@ -44,25 +44,25 @@ The following chart shows the time it takes to run the 22 TPC-H queries against using a single executor with 8 cores. Se

Re: [PR] chore: Prepare 0.2.0 release [datafusion-comet]

2024-08-23 Thread via GitHub
andygrove commented on PR #866: URL: https://github.com/apache/datafusion-comet/pull/866#issuecomment-2307916393 > Should we put TPCDS graphs as well? Yes, makes sense. I'll add them in the docs and link to them from the README to keep the README concise

Re: [I] Intermittent failures in `fuzz_cases::join_fuzz::test_anti_join_1k_filtered` [datafusion]

2024-08-23 Thread via GitHub
comphead commented on issue #11555: URL: https://github.com/apache/datafusion/issues/11555#issuecomment-2307863276 I think this local test may cover lots of cases ``` #[tokio::test] async fn test_cross_1() { let left: Vec = make_staggered_batches(1); let left =

Re: [PR] WIP: add documentation on `EXPLAIN PLAN` [datafusion]

2024-08-23 Thread via GitHub
alamb commented on code in PR #12122: URL: https://github.com/apache/datafusion/pull/12122#discussion_r1729548903 ## docs/source/user-guide/explain-usage.md: ## @@ -0,0 +1,423 @@ + + +# Reading Explain Plans + +## Introduction + +This section describes of how to read a DataFusi

Re: [PR] chore: Prepare 0.2.0 release [datafusion-comet]

2024-08-23 Thread via GitHub
kazuyukitanimura commented on PR #866: URL: https://github.com/apache/datafusion-comet/pull/866#issuecomment-2307852429 Should we put TPCDS graphs as well?

Re: [PR] chore: Prepare 0.2.0 release [datafusion-comet]

2024-08-23 Thread via GitHub
comphead commented on code in PR #866: URL: https://github.com/apache/datafusion-comet/pull/866#discussion_r1729541597 ## docs/source/user-guide/configs.md: ## @@ -40,7 +40,7 @@ Comet provides the following configuration settings. | spark.comet.exec.broadcastHashJoin.enabled |

Re: [PR] chore: Prepare 0.2.0 release [datafusion-comet]

2024-08-23 Thread via GitHub
comphead commented on code in PR #866: URL: https://github.com/apache/datafusion-comet/pull/866#discussion_r1729541396 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -101,7 +101,7 @@ object CometConf extends ShimCometConf { "config and this need to be

Re: [PR] chore: Prepare 0.2.0 release [datafusion-comet]

2024-08-23 Thread via GitHub
comphead commented on code in PR #866: URL: https://github.com/apache/datafusion-comet/pull/866#discussion_r1729539256 ## README.md: ## @@ -44,25 +44,25 @@ The following chart shows the time it takes to run the 22 TPC-H queries against using a single executor with 8 cores. See

Re: [PR] WIP: add documentation on `EXPLAIN PLAN` [datafusion]

2024-08-23 Thread via GitHub
alamb commented on PR #12122: URL: https://github.com/apache/datafusion/pull/12122#issuecomment-2307818424 I am going to try and document the analyze portion as well

Re: [I] Handle downstream impacts to union's behavioral changes. [datafusion]

2024-08-23 Thread via GitHub
wiedld commented on issue #12105: URL: https://github.com/apache/datafusion/issues/12105#issuecomment-2307796449 take

Re: [PR] chore: Prepare 0.2.0 release [datafusion-comet]

2024-08-23 Thread via GitHub
andygrove commented on code in PR #866: URL: https://github.com/apache/datafusion-comet/pull/866#discussion_r1729506073 ## README.md: ## @@ -44,25 +44,25 @@ The following chart shows the time it takes to run the 22 TPC-H queries against using a single executor with 8 cores. Se

Re: [PR] Provide documentation of expose APIs to enable handling of type coercion at UNION plan construction. [datafusion]

2024-08-23 Thread via GitHub
wiedld commented on PR #12142: URL: https://github.com/apache/datafusion/pull/12142#issuecomment-2307792093 Note to @alamb -- exposing the rewriter api ([as suggested here](https://github.com/apache/datafusion/blob/b8b76bc225a9b0c51407261cc7b55770db1a958b/datafusion/optimizer/src/analyzer/ty

[PR] Provide documentation of expose APIs to enable handling of type coercion at UNION plan construction. [datafusion]

2024-08-23 Thread via GitHub
wiedld opened a new pull request, #12142: URL: https://github.com/apache/datafusion/pull/12142 ## Which issue does this PR close? Closes #12105 ## Rationale for this change We construct our own logical plans for SQL-derivative languages (e.g. InfluxQL). The construction

Re: [PR] feat: support upper and lower for stringview [datafusion]

2024-08-23 Thread via GitHub
tshauck commented on code in PR #12138: URL: https://github.com/apache/datafusion/pull/12138#discussion_r1729485111 ## datafusion/functions/src/string/common.rs: ## @@ -214,6 +214,23 @@ where i64, _, >(array, op)?)), +Da

Re: [I] Implement `hf://` / "hugging face" integration in datafusion-cli [datafusion]

2024-08-23 Thread via GitHub
findepi commented on issue #10720: URL: https://github.com/apache/datafusion/issues/10720#issuecomment-2307768193 https://github.com/apache/datafusion/issues/11979 is probably related

Re: [PR] Fix VALUES tuples type casts [datafusion]

2024-08-23 Thread via GitHub
findepi commented on code in PR #12104: URL: https://github.com/apache/datafusion/pull/12104#discussion_r1729479517 ## datafusion/sql/src/values.rs: ## @@ -41,6 +41,11 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { .collect::>>() })

Re: [I] Add ANSI support for Divide and IntegralDivide [datafusion-comet]

2024-08-23 Thread via GitHub
andygrove commented on issue #533: URL: https://github.com/apache/datafusion-comet/issues/533#issuecomment-2307761861 Thanks @hycsam

Re: [I] Bug detecting datatype in VALUES tuples [datafusion]

2024-08-23 Thread via GitHub
findepi commented on issue #12103: URL: https://github.com/apache/datafusion/issues/12103#issuecomment-2307760145 > This logic coerces all value tuples to cast to the type of the value in the first row. it would be better to coerce all values to their common super type (order insens

Re: [I] Add ANSI support for Divide and IntegralDivide [datafusion-comet]

2024-08-23 Thread via GitHub
hycsam commented on issue #533: URL: https://github.com/apache/datafusion-comet/issues/533#issuecomment-2307759370 I want to take this one!

Re: [I] Add config flag to convert `Utf8View`/`BinaryView` --> `Utf8` / `Binary` at output [datafusion]

2024-08-23 Thread via GitHub
findepi commented on issue #12119: URL: https://github.com/apache/datafusion/issues/12119#issuecomment-2307754089 > I recommend a config flag that makes it possible to convert `Utf8View`/`BinaryView` --> `Utf8` / `Binary` at the query output and I think this conversion should be done by def

Re: [I] handling overflow for the integer types [datafusion]

2024-08-23 Thread via GitHub
findepi commented on issue #3520: URL: https://github.com/apache/datafusion/issues/3520#issuecomment-2307751780 @kmitchener thanks for creating this issue! btw can we perhaps consider labelling it as a `bug` too?

Re: [I] Unchecked overflow in integer number addition [datafusion]

2024-08-23 Thread via GitHub
findepi commented on issue #12140: URL: https://github.com/apache/datafusion/issues/12140#issuecomment-2307749817 I think this issue duplicates https://github.com/apache/datafusion/issues/3520. will close. can we tag the other one as a `bug`?

Re: [I] Unchecked overflow in integer number addition [datafusion]

2024-08-23 Thread via GitHub
findepi closed issue #12140: Unchecked overflow in integer number addition URL: https://github.com/apache/datafusion/issues/12140

Re: [PR] Check for overflow in substring with negative start [datafusion]

2024-08-23 Thread via GitHub
findepi commented on code in PR #12141: URL: https://github.com/apache/datafusion/pull/12141#discussion_r1729460747 ## datafusion/functions/src/unicode/substr.rs: ## @@ -144,19 +144,25 @@ where let result = iter .zip(start_array.iter())

Re: [PR] WIP: add documentation on `EXPLAIN PLAN` [datafusion]

2024-08-23 Thread via GitHub
alamb commented on PR #12122: URL: https://github.com/apache/datafusion/pull/12122#issuecomment-2307723945 This is awesome -- thank you @devanbenz I took the liberty of pushing some commits to the branch that updated the first example and some wording -- let me know what you think. M

Re: [I] Panic in `substring()` scalar function (SQLancer) [datafusion]

2024-08-23 Thread via GitHub
findepi commented on issue #12129: URL: https://github.com/apache/datafusion/issues/12129#issuecomment-2307710118 There are two bugs actually. 1. `9223372036854775807 + 1` overflows https://github.com/apache/datafusion/issues/12140 2. `select substring('foo', -9223372036854775808,

[I] Unchecked overflow in integer number addition [datafusion]

2024-08-23 Thread via GitHub
findepi opened a new issue, #12140: URL: https://github.com/apache/datafusion/issues/12140 ### Describe the bug ``` > SELECT 9223372036854775807 + 1; +---+ | Int64(9223372036854775807) + Int64(1) | +---

[PR] Add maintenance status note [datafusion-ballista]

2024-08-23 Thread via GitHub
andygrove opened a new pull request, #1043: URL: https://github.com/apache/datafusion-ballista/pull/1043 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing cha

Re: [PR] build(deps): upgrade actions/{upload,download}-artifact@v3 to v4 [datafusion-python]

2024-08-23 Thread via GitHub
andygrove merged PR #829: URL: https://github.com/apache/datafusion-python/pull/829

Re: [I] Cannot create a `SessionContext` with only a `SessionConfig` [datafusion-python]

2024-08-23 Thread via GitHub
andygrove closed issue #826: Cannot create a `SessionContext` with only a `SessionConfig` URL: https://github.com/apache/datafusion-python/issues/826

Re: [PR] Fix SessionContext init with only SessionConfig [datafusion-python]

2024-08-23 Thread via GitHub
andygrove merged PR #827: URL: https://github.com/apache/datafusion-python/pull/827

Re: [PR] Add example for configuring SessionContext [datafusion]

2024-08-23 Thread via GitHub
Omega359 commented on PR #12139: URL: https://github.com/apache/datafusion/pull/12139#issuecomment-2307687578 lgtm, thanks for adding this

Re: [PR] Add example for configuring SessionContext [datafusion]

2024-08-23 Thread via GitHub
alamb commented on code in PR #12139: URL: https://github.com/apache/datafusion/pull/12139#discussion_r1729397480 ## datafusion/core/src/execution/mod.rs: ## @@ -19,6 +19,8 @@ pub mod context; pub mod session_state; +pub use session_state::{SessionState, SessionStateBuilder}

Re: [PR] Add example for configuring SessionContext [datafusion]

2024-08-23 Thread via GitHub
alamb commented on code in PR #12139: URL: https://github.com/apache/datafusion/pull/12139#discussion_r1729397743 ## datafusion/core/src/execution/context/mod.rs: ## @@ -1427,6 +1461,12 @@ impl From<&SessionContext> for TaskContext { } } +impl From for SessionContext {

[PR] Add example for configuring SessionContext [datafusion]

2024-08-23 Thread via GitHub
alamb opened a new pull request, #12139: URL: https://github.com/apache/datafusion/pull/12139 ## Which issue does this PR close? N/A ## Rationale for this change While working on the reproducer for https://github.com/apache/datafusion/issues/12136 I knew there was a nic

Re: [PR] Add union_extract scalar function [datafusion]

2024-08-23 Thread via GitHub
gstvg commented on code in PR #12116: URL: https://github.com/apache/datafusion/pull/12116#discussion_r1729378926 ## datafusion/functions/src/core/union_extract.rs: ## @@ -0,0 +1,722 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

[PR] feat: support upper and lower for stringview [datafusion]

2024-08-23 Thread via GitHub
tshauck opened a new pull request, #12138: URL: https://github.com/apache/datafusion/pull/12138 ## Which issue does this PR close? Closes #11855 ## Rationale for this change The Utf8View type are currently cast into Utf8 types. This PR updates the `lower` (and `upper`) f

Re: [I] Support substrait serialization for `ScalarValue::Utf8View` and `ScalarValue::BinaryView` [datafusion]

2024-08-23 Thread via GitHub
wiedld commented on issue #12118: URL: https://github.com/apache/datafusion/issues/12118#issuecomment-2307608225 take

Re: [I] Builder style API for creating `RuntimeEnv` [datafusion]

2024-08-23 Thread via GitHub
alamb commented on issue #12137: URL: https://github.com/apache/datafusion/issues/12137#issuecomment-2307606449 I think this would be fairly straightforward to add and learn how to use the DataFusion APIs

[I] Builder style API for creating `RuntimeEnv` [datafusion]

2024-08-23 Thread via GitHub
alamb opened a new issue, #12137: URL: https://github.com/apache/datafusion/issues/12137 ### Is your feature request related to a problem or challenge? While making a reproducer https://github.com/apache/datafusion/issues/12136, I found configuring the SessionContext to use a memory l

Re: [I] Improve the hash join performance by replacing the RawTable to a simple Vec for JoinHashMap [datafusion]

2024-08-23 Thread via GitHub
Dandandan commented on issue #6910: URL: https://github.com/apache/datafusion/issues/6910#issuecomment-2307592973 Current results ``` Benchmark tpch_mem_sf1.json ┏━━┳━━┳━━━┳━━━┓ ┃ Query┃ ma

[I] External sorting not working for string columns [datafusion]

2024-08-23 Thread via GitHub
alamb opened a new issue, #12136: URL: https://github.com/apache/datafusion/issues/12136 ### Describe the bug Filing a ticket based on a conversation in discord: https://discord.com/channels/885562378132000778/1166447479609376850/127572864932959 Basically, I expect that whe

[PR] Return TableProviderFilterPushDown::Exact when Parquet Pushdown Enabled [datafusion]

2024-08-23 Thread via GitHub
itsjunetime opened a new pull request, #12135: URL: https://github.com/apache/datafusion/pull/12135 ## Which issue does this PR close? I think this should close #4028 ## Rationale for this change As far as I can tell, this follows the recommendations stated in #4028. The

Re: [I] Improve the hash join performance by replacing the RawTable to a simple Vec for JoinHashMap [datafusion]

2024-08-23 Thread via GitHub
Dandandan commented on issue #6910: URL: https://github.com/apache/datafusion/issues/6910#issuecomment-2307581675 I am testing this currently to see how this approach is running. It seems https://github.com/apache/datafusion/pull/6724 and https://github.com/apache/datafusion/pull/667

[I] error: failed to determine package fingerprint for build script [datafusion-comet]

2024-08-23 Thread via GitHub
radhikabajaj123 opened a new issue, #867: URL: https://github.com/apache/datafusion-comet/issues/867 Hello Team, I am getting `error: failed to determine package fingerprint for build script` when I run `make release PROFILES="-Pspark-3.4 -Pscala-2.13"` while following the instructio

Re: [I] Improve the hash join performance by replacing the RawTable to a simple Vec for JoinHashMap [datafusion]

2024-08-23 Thread via GitHub
Dandandan commented on issue #6910: URL: https://github.com/apache/datafusion/issues/6910#issuecomment-2307558452 I think it makes sense to first work on making equality faster, then testing this approach again https://github.com/apache/datafusion/issues/12131

Re: [PR] fix: Support type coercion for ScalarUDFs [datafusion-comet]

2024-08-23 Thread via GitHub
Kimahriman commented on code in PR #865: URL: https://github.com/apache/datafusion-comet/pull/865#discussion_r1729332805 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2003,10 +2003,17 @@ class CometExpressionSuite extends CometTestBase with Adaptiv

[I] Improve the hash join performance by replacing the RawTable to a simple Vec for JoinHashMap [datafusion]

2024-08-23 Thread via GitHub
yahoNanJing opened a new issue, #6910: URL: https://github.com/apache/datafusion/issues/6910 ### Is your feature request related to a problem or challenge? When testing the TPCH q17 on my PC, based on #6800, it costs around 2.4s. Among them, it costs around 800ms for constructing the

Re: [PR] fix: Support type coercion for ScalarUDFs [datafusion-comet]

2024-08-23 Thread via GitHub
Kimahriman commented on code in PR #865: URL: https://github.com/apache/datafusion-comet/pull/865#discussion_r1729322534 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2003,10 +2003,17 @@ class CometExpressionSuite extends CometTestBase with Adaptiv

[PR] Add changelog for 0.2.0-rc1 [datafusion-comet]

2024-08-23 Thread via GitHub
andygrove opened a new pull request, #866: URL: https://github.com/apache/datafusion-comet/pull/866 ## Which issue does this PR close? N/A ## Rationale for this change Preparing for 0.2.0-rc1 ## What changes are included in this PR? #

Re: [PR] minor: Add comments for `GroupedHashAggregateStream` struct [datafusion]

2024-08-23 Thread via GitHub
comphead commented on code in PR #12127: URL: https://github.com/apache/datafusion/pull/12127#discussion_r1729181100 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -76,35 +76,43 @@ use super::AggregateExec; /// This encapsulates the spilling state struct Spill

Re: [PR] Enable StringView by default by passing CI [datafusion]

2024-08-23 Thread via GitHub
alamb commented on code in PR #11862: URL: https://github.com/apache/datafusion/pull/11862#discussion_r1729228100 ## datafusion/core/src/datasource/file_format/parquet.rs: ## @@ -514,10 +521,13 @@ pub fn statistics_from_parquet_meta_calc( statistics.total_byte_size = Precis

Re: [PR] perf: Add benchmarks for Spark Scan + Comet Exec [datafusion-comet]

2024-08-23 Thread via GitHub
andygrove merged PR #863: URL: https://github.com/apache/datafusion-comet/pull/863

Re: [PR] Throw `not_impl_error` for `approx_percentile_cont` parameters validation [datafusion]

2024-08-23 Thread via GitHub
alamb commented on code in PR #12133: URL: https://github.com/apache/datafusion/pull/12133#discussion_r1729223545 ## datafusion/functions-aggregate/src/approx_percentile_cont.rs: ## @@ -154,19 +154,20 @@ fn get_scalar_value(expr: &Arc) -> Result { } fn validate_input_percen

Re: [PR] Add union_extract scalar function [datafusion]

2024-08-23 Thread via GitHub
samuelcolvin commented on code in PR #12116: URL: https://github.com/apache/datafusion/pull/12116#discussion_r1729195750 ## datafusion/functions/src/core/union_extract.rs: ## @@ -0,0 +1,722 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] fix: single partition in SortPreservingMergeExec don't take fetch [datafusion]

2024-08-23 Thread via GitHub
alamb merged PR #12109: URL: https://github.com/apache/datafusion/pull/12109 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] fix: single partition in SortPreservingMergeExec don't take fetch [datafusion]

2024-08-23 Thread via GitHub
alamb commented on PR #12109: URL: https://github.com/apache/datafusion/pull/12109#issuecomment-2307390075 Thank you @haohuaijin and @Dandandan for the review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

Re: [PR] Fix thread panic when "unreachable" SpawnedTask code is reachable. [datafusion]

2024-08-23 Thread via GitHub
alamb commented on code in PR #12086: URL: https://github.com/apache/datafusion/pull/12086#discussion_r1729206001 ## datafusion/common-runtime/src/common.rs: ## @@ -60,18 +60,52 @@ impl SpawnedTask { } /// Joins the task and unwinds the panic if it happens. -pub

Re: [PR] Fix thread panic when "unreachable" SpawnedTask code is reachable. [datafusion]

2024-08-23 Thread via GitHub
alamb merged PR #12086: URL: https://github.com/apache/datafusion/pull/12086 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Fix thread panic when "unreachable" SpawnedTask code is reachable. [datafusion]

2024-08-23 Thread via GitHub
alamb commented on code in PR #12086: URL: https://github.com/apache/datafusion/pull/12086#discussion_r1729203623 ## datafusion/common-runtime/src/common.rs: ## @@ -60,18 +60,52 @@ impl SpawnedTask { } /// Joins the task and unwinds the panic if it happens. -pub

Re: [I] Thread panics in SpawnedTask during shutdown. [datafusion]

2024-08-23 Thread via GitHub
alamb closed issue #12089: Thread panics in SpawnedTask during shutdown. URL: https://github.com/apache/datafusion/issues/12089 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] `array_has` avoid row converter for string type [datafusion]

2024-08-23 Thread via GitHub
alamb commented on PR #12097: URL: https://github.com/apache/datafusion/pull/12097#issuecomment-2307374193 BTW I am ok with merging this PR if you would like @jayzhan211 -- it just strikes me as it will make the code more complex and it just "feels" wrong to me. -- This is an automated

Re: [PR] `array_has` avoid row converter for string type [datafusion]

2024-08-23 Thread via GitHub
alamb commented on code in PR #12097: URL: https://github.com/apache/datafusion/pull/12097#discussion_r1729194704 ## datafusion/functions-nested/src/array_has.rs: ## @@ -251,75 +237,176 @@ impl ScalarUDFImpl for ArrayHasAny { } /// Represents the type of comparison for array

Re: [PR] Add ability to return `LogicalPlan` by value from `TableProvider` [datafusion]

2024-08-23 Thread via GitHub
alamb commented on code in PR #12113: URL: https://github.com/apache/datafusion/pull/12113#discussion_r1729185572 ## datafusion/optimizer/src/analyzer/inline_table_scan.rs: ## @@ -56,24 +56,22 @@ fn analyze_internal(plan: LogicalPlan) -> Result> { match plan {

Re: [I] Proposal: Create `dfdb`, a new CLI different than `datafusion-cli` with pre-built integrations [datafusion]

2024-08-23 Thread via GitHub
alamb commented on issue #11979: URL: https://github.com/apache/datafusion/issues/11979#issuecomment-2307346685 > I'd like to highlight the idea of having table providers in datafusion-table-providers and integrating them in datafusion-tui from there. This way the providers are also readily

Re: [I] Document "how to read an explain plan" [datafusion]

2024-08-23 Thread via GitHub
devanbenz commented on issue #12088: URL: https://github.com/apache/datafusion/issues/12088#issuecomment-2307291595 > A logical plan is relatively easy to understand, physical plans are definitely hard to understand, because they include the execution detail for exchange-based parallelism

Re: [PR] Add ability to return `LogicalPlan` by value from `TableProvider` [datafusion]

2024-08-23 Thread via GitHub
askalt commented on code in PR #12113: URL: https://github.com/apache/datafusion/pull/12113#discussion_r1729113076 ## datafusion/optimizer/src/analyzer/inline_table_scan.rs: ## @@ -56,24 +56,22 @@ fn analyze_internal(plan: LogicalPlan) -> Result> { match plan {

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-23 Thread via GitHub
Rachelint commented on code in PR #11943: URL: https://github.com/apache/datafusion/pull/11943#discussion_r1729106454 ## datafusion/functions-aggregate-common/src/aggregate/groups_accumulator.rs: ## @@ -367,6 +379,289 @@ impl VecAllocExt for Vec { } } +pub trait EmitToEx

Re: [PR] build(deps): bump datafusion-sql from 40.0.0 to 41.0.0 [datafusion-python]

2024-08-23 Thread via GitHub
dependabot[bot] closed pull request #819: build(deps): bump datafusion-sql from 40.0.0 to 41.0.0 URL: https://github.com/apache/datafusion-python/pull/819 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

Re: [PR] build(deps): bump datafusion-optimizer from 40.0.0 to 41.0.0 [datafusion-python]

2024-08-23 Thread via GitHub
dependabot[bot] commented on PR #818: URL: https://github.com/apache/datafusion-python/pull/818#issuecomment-2307243264 Looks like datafusion-optimizer is up-to-date now, so this is no longer needed. -- This is an automated message from the Apache Git Service. To respond to the message,
