Re: [PR] Chunk based iteration in `accumulate_indices` [datafusion]

2024-11-17 Thread via GitHub
jayzhan211 commented on code in PR #13451: URL: https://github.com/apache/datafusion/pull/13451#discussion_r1845965171 ## datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/accumulate.rs: ## @@ -395,19 +395,38 @@ pub fn accumulate_indices( }

[PR] Remove unreachable filter logic in final grouping stage [datafusion]

2024-11-17 Thread via GitHub
jayzhan211 opened a new pull request, #13463: URL: https://github.com/apache/datafusion/pull/13463 ## Which issue does this PR close? Closes #. ## Rationale for this change Aggregate Filter could be applied in partial stage, at the point we reach final stage we d

Re: [PR] Support unparsing Array plan to SQL string [datafusion]

2024-11-17 Thread via GitHub
goldmedal commented on PR #13418: URL: https://github.com/apache/datafusion/pull/13418#issuecomment-2482053239 Thanks @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] added a BallistaContext to ballista to allow for Remote or standalone [datafusion-ballista]

2024-11-17 Thread via GitHub
tbar4 commented on code in PR #1100: URL: https://github.com/apache/datafusion-ballista/pull/1100#discussion_r1845819720 ## python/README.md: ## @@ -29,8 +29,8 @@ part of the default Cargo workspace so that it doesn't cause overhead for mainta Creates a new context and connec

[PR] Support custom field metadata in UDF [datafusion]

2024-11-17 Thread via GitHub
lewiszlw opened a new pull request, #13458: URL: https://github.com/apache/datafusion/pull/13458 ## Which issue does this PR close? Closes #. ## Rationale for this change In our case, we need put type data in schema metadata for supporting logical type. But there

Re: [PR] Chunk based iteration in `accumulate_indices` [datafusion]

2024-11-17 Thread via GitHub
jayzhan211 commented on PR #13451: URL: https://github.com/apache/datafusion/pull/13451#issuecomment-2482009689 > Can we maybe find / create a query with filter? Maybe add one in `clickbench_extended`? aggregate filter required postgres dialect, it might be easier to run benchmark wi

Re: [PR] Remove unreachable filter logic in final grouping stage [datafusion]

2024-11-17 Thread via GitHub
jayzhan211 commented on code in PR #13463: URL: https://github.com/apache/datafusion/pull/13463#discussion_r1845898229 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -859,14 +859,13 @@ impl GroupedHashAggregateStream { )?;

[I] Unparse inner join with no conditions as a cross join [datafusion]

2024-11-17 Thread via GitHub
phillipleblanc opened a new issue, #13459: URL: https://github.com/apache/datafusion/issues/13459 ### Describe the bug As part of the upgrade to DataFusion v43, we found that the CrossJoin logical plan node was removed in DataFusion (https://github.com/apache/datafusion/pull/12985),

Re: [PR] added a BallistaContext to ballista to allow for Remote or standalone [datafusion-ballista]

2024-11-17 Thread via GitHub
tbar4 commented on code in PR #1100: URL: https://github.com/apache/datafusion-ballista/pull/1100#discussion_r1845843592 ## python/ballista/context.py: ## Review Comment: nothing actually, deleting

Re: [PR] Coerce Array inner types [datafusion]

2024-11-17 Thread via GitHub
jayzhan211 commented on code in PR #13452: URL: https://github.com/apache/datafusion/pull/13452#discussion_r1845825232 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -1138,27 +1138,44 @@ fn numeric_string_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Optio

Re: [PR] added a BallistaContext to ballista to allow for Remote or standalone [datafusion-ballista]

2024-11-17 Thread via GitHub
tbar4 commented on code in PR #1100: URL: https://github.com/apache/datafusion-ballista/pull/1100#discussion_r1845821674 ## docs/source/user-guide/python.md: ## @@ -28,9 +28,25 @@ popular file formats files, run it in a distributed environment, and obtain the The following

Re: [PR] added a BallistaContext to ballista to allow for Remote or standalone [datafusion-ballista]

2024-11-17 Thread via GitHub
tbar4 commented on code in PR #1100: URL: https://github.com/apache/datafusion-ballista/pull/1100#discussion_r1845821542 ## docs/source/user-guide/python.md: ## @@ -28,9 +28,25 @@ popular file formats files, run it in a distributed environment, and obtain the The following

Re: [PR] added a BallistaContext to ballista to allow for Remote or standalone [datafusion-ballista]

2024-11-17 Thread via GitHub
tbar4 commented on code in PR #1100: URL: https://github.com/apache/datafusion-ballista/pull/1100#discussion_r1845821392 ## docs/source/user-guide/python.md: ## @@ -28,9 +28,25 @@ popular file formats files, run it in a distributed environment, and obtain the The following

Re: [PR] added a BallistaContext to ballista to allow for Remote or standalone [datafusion-ballista]

2024-11-17 Thread via GitHub
tbar4 commented on code in PR #1100: URL: https://github.com/apache/datafusion-ballista/pull/1100#discussion_r1845821115 ## docs/source/user-guide/python.md: ## @@ -103,14 +119,15 @@ The `explain` method can be used to show the logical and physical query plans fo The followin

Re: [PR] added a BallistaContext to ballista to allow for Remote or standalone [datafusion-ballista]

2024-11-17 Thread via GitHub
tbar4 commented on code in PR #1100: URL: https://github.com/apache/datafusion-ballista/pull/1100#discussion_r1845819997 ## docs/source/user-guide/python.md: ## @@ -103,14 +119,15 @@ The `explain` method can be used to show the logical and physical query plans fo The followin

Re: [PR] added a BallistaContext to ballista to allow for Remote or standalone [datafusion-ballista]

2024-11-17 Thread via GitHub
tbar4 commented on code in PR #1100: URL: https://github.com/apache/datafusion-ballista/pull/1100#discussion_r1845819353 ## python/ballista/tests/test_context.py: ## @@ -15,7 +15,7 @@ # specific language governing permissions and limitations # under the License. -from pybal

Re: [PR] added a BallistaContext to ballista to allow for Remote or standalone [datafusion-ballista]

2024-11-17 Thread via GitHub
tbar4 commented on code in PR #1100: URL: https://github.com/apache/datafusion-ballista/pull/1100#discussion_r1845811351 ## python/src/lib.rs: ## @@ -15,18 +15,107 @@ // specific language governing permissions and limitations // under the License. +use ballista::prelude::*;

Re: [PR] added a BallistaContext to ballista to allow for Remote or standalone [datafusion-ballista]

2024-11-17 Thread via GitHub
tbar4 commented on code in PR #1100: URL: https://github.com/apache/datafusion-ballista/pull/1100#discussion_r1845810324 ## python/examples/example.py: ## @@ -0,0 +1,33 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements.

Re: [PR] added a BallistaContext to ballista to allow for Remote or standalone [datafusion-ballista]

2024-11-17 Thread via GitHub
tbar4 commented on code in PR #1100: URL: https://github.com/apache/datafusion-ballista/pull/1100#discussion_r1845810529 ## python/src/lib.rs: ## @@ -15,18 +15,107 @@ // specific language governing permissions and limitations // under the License. +use ballista::prelude::*;

Re: [PR] added a BallistaContext to ballista to allow for Remote or standalone [datafusion-ballista]

2024-11-17 Thread via GitHub
tbar4 commented on code in PR #1100: URL: https://github.com/apache/datafusion-ballista/pull/1100#discussion_r1845813372 ## python/ballista/__init__.py: ## @@ -25,12 +25,18 @@ import pyarrow as pa -from .pyballista_internal import ( -SessionContext, +from .ballista_int

Re: [PR] added a BallistaContext to ballista to allow for Remote or standalone [datafusion-ballista]

2024-11-17 Thread via GitHub
tbar4 commented on code in PR #1100: URL: https://github.com/apache/datafusion-ballista/pull/1100#discussion_r1845809574 ## python/ballista/__init__.py: ## @@ -25,12 +25,18 @@ import pyarrow as pa -from .pyballista_internal import ( -SessionContext, +from .ballista_int

[PR] Support Utf8View in Unparser `expr_to_sql` [datafusion]

2024-11-17 Thread via GitHub
phillipleblanc opened a new pull request, #13462: URL: https://github.com/apache/datafusion/pull/13462 ## Which issue does this PR close? Closes #13461 ## Rationale for this change Now that Utf8View is being returned by DataFusion, we need to ensure that when we encounte

Re: [PR] added a BallistaContext to ballista to allow for Remote or standalone [datafusion-ballista]

2024-11-17 Thread via GitHub
tbar4 commented on code in PR #1100: URL: https://github.com/apache/datafusion-ballista/pull/1100#discussion_r1845811608 ## python/src/lib.rs: ## @@ -15,18 +15,107 @@ // specific language governing permissions and limitations // under the License. +use ballista::prelude::*;

Re: [PR] Coerce Array inner types [datafusion]

2024-11-17 Thread via GitHub
jayzhan211 commented on PR #13452: URL: https://github.com/apache/datafusion/pull/13452#issuecomment-2481839966 ``` query error DataFusion error: type_coercion\ncaused by\nError during planning: Incompatible inputs for Union: Previous inputs were of type List(.*), but got incompatible ty

Re: [PR] Support unparsing plans after applying `optimize_projections` rule [datafusion]

2024-11-17 Thread via GitHub
goldmedal commented on PR #13267: URL: https://github.com/apache/datafusion/pull/13267#issuecomment-2481832178 > That would impose a limit on future optimizations on logical plans, which isn't desirable. > @findepi I'm not sure about that. Why isn't it desirable? I aim to generate a

Re: [PR] With Order Support for Memory Tables [datafusion-sqlparser-rs]

2024-11-17 Thread via GitHub
github-actions[bot] closed pull request #1401: With Order Support for Memory Tables URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1401

Re: [PR] Replace `OnceLock` with `LazyLock`, update MSRV to 1.80 [datafusion]

2024-11-17 Thread via GitHub
github-actions[bot] commented on PR #11690: URL: https://github.com/apache/datafusion/pull/11690#issuecomment-2481777267 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Fix invalid swap for LeftMark nested loops join [datafusion]

2024-11-17 Thread via GitHub
alamb merged PR #13426: URL: https://github.com/apache/datafusion/pull/13426

Re: [PR] feat: Add `GroupColumn` for `Date/Time/Timestamp` [datafusion]

2024-11-17 Thread via GitHub
jonathanc-n commented on PR #13457: URL: https://github.com/apache/datafusion/pull/13457#issuecomment-2481677177 @alamb Was this the sort of implementation you were looking for? (will add tests)

Re: [PR] feat: Add `GroupColumn` for `Date/Time/Timestamp` [datafusion]

2024-11-17 Thread via GitHub
jonathanc-n commented on PR #13457: URL: https://github.com/apache/datafusion/pull/13457#issuecomment-2481676686 ``` Comparing main and column Benchmark clickbench_1.json ┏━━┳┳┳━━━┓ ┃ Q

[PR] feat: Add `GroupColumn` for `Date/Time/Timestamp` [datafusion]

2024-11-17 Thread via GitHub
jonathanc-n opened a new pull request, #13457: URL: https://github.com/apache/datafusion/pull/13457 ## Which issue does this PR close? Closes #13263. ## Rationale for this change ## What changes are included in this PR? Add group column for Date/Time/Ti

Re: [PR] fix: Remove dangling table references in `unparser` [datafusion]

2024-11-17 Thread via GitHub
peasee commented on code in PR #13405: URL: https://github.com/apache/datafusion/pull/13405#discussion_r1845688654 ## datafusion/sql/src/unparser/ast.rs: ## @@ -360,6 +455,23 @@ impl RelationBuilder { pub fn has_relation(&self) -> bool { self.relation.is_some()

Re: [PR] fix: Remove dangling table references in `unparser` [datafusion]

2024-11-17 Thread via GitHub
peasee commented on code in PR #13405: URL: https://github.com/apache/datafusion/pull/13405#discussion_r1845679344 ## datafusion/sql/src/unparser/plan.rs: ## @@ -158,10 +158,12 @@ impl Unparser<'_> { } let mut twj = select_builder.pop_from().unwrap(); -

Re: [PR] fix: Remove dangling table references in `unparser` [datafusion]

2024-11-17 Thread via GitHub
peasee commented on code in PR #13405: URL: https://github.com/apache/datafusion/pull/13405#discussion_r1845682157 ## datafusion/sql/src/unparser/rewrite.rs: ## @@ -363,3 +363,138 @@ impl TreeNodeRewriter for TableAliasRewriter<'_> { } } } + +/// Takes an input li

Re: [PR] Coerce Array inner types [datafusion]

2024-11-17 Thread via GitHub
blaginin commented on code in PR #13452: URL: https://github.com/apache/datafusion/pull/13452#discussion_r1845634892 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -1138,27 +1138,44 @@ fn numeric_string_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option

Re: [PR] Fix invalid swap for LeftMark nested loops join [datafusion]

2024-11-17 Thread via GitHub
alamb commented on PR #13426: URL: https://github.com/apache/datafusion/pull/13426#issuecomment-2481196936 Thanks @findepi and @Dandandan

Re: [PR] Refactor signatures for lpad, rpad, left, and right [datafusion]

2024-11-17 Thread via GitHub
jiashenC commented on code in PR #13420: URL: https://github.com/apache/datafusion/pull/13420#discussion_r1845563851 ## datafusion/sqllogictest/test_files/scalar.slt: ## @@ -1864,10 +1864,10 @@ query TT EXPLAIN SELECT letter, letter = LEFT(letter2, 1) FROM simple_string;

Re: [I] [DISCUSSION] 2024 Q4 / 2025 Q1 Roadmap [datafusion]

2024-11-17 Thread via GitHub
alamb commented on issue #13274: URL: https://github.com/apache/datafusion/issues/13274#issuecomment-2481430116 > More to come I filed - https://github.com/apache/datafusion/issues/13456 to try and organize my thoughts here better

[I] [DISCUSSION] Make it easy and fast to query files on remote files (S3, iceberg, etc) [datafusion]

2024-11-17 Thread via GitHub
alamb opened a new issue, #13456: URL: https://github.com/apache/datafusion/issues/13456 ### Is your feature request related to a problem or challenge? I personally think making it easy to use DataFusion with the "open data lake" stack is very important over the next few months.

Re: [PR] `TypeSignatureClass` for mixed type function signature [datafusion]

2024-11-17 Thread via GitHub
jayzhan211 commented on code in PR #13372: URL: https://github.com/apache/datafusion/pull/13372#discussion_r1845412610 ## datafusion/expr-common/src/signature.rs: ## @@ -138,6 +141,48 @@ pub enum TypeSignature { NullAry, } +impl TypeSignature { +#[inline] +pub fn

[PR] feat: Add support for object storage based shuffle [datafusion-ray]

2024-11-17 Thread via GitHub
andygrove opened a new pull request, #48: URL: https://github.com/apache/datafusion-ray/pull/48 Follows on from https://github.com/apache/datafusion-ray/pull/47 Closes https://github.com/apache/datafusion-ray/issues/46

Re: [PR] added a BallistaContext to ballista to allow for Remote or standalone [datafusion-ballista]

2024-11-17 Thread via GitHub
milenkovicm commented on code in PR #1100: URL: https://github.com/apache/datafusion-ballista/pull/1100#discussion_r1845544416 ## docs/source/user-guide/python.md: ## @@ -28,9 +28,25 @@ popular file formats files, run it in a distributed environment, and obtain the The foll

Re: [I] insert_to_external.slt test has unstable results occassionally [datafusion]

2024-11-17 Thread via GitHub
alamb closed issue #13396: insert_to_external.slt test has unstable results occassionally URL: https://github.com/apache/datafusion/issues/13396

Re: [PR] Update root `README.md` and other documentation with latest changes [datafusion-ballista]

2024-11-17 Thread via GitHub
milenkovicm commented on code in PR #1113: URL: https://github.com/apache/datafusion-ballista/pull/1113#discussion_r1845414035 ## README.md: ## @@ -17,53 +17,72 @@ under the License. --> -# Ballista: Distributed SQL Query Engine, built on Apache Arrow +# Ballista: Making

Re: [I] DataFrame parse_sql_expr does not handle aliases [datafusion]

2024-11-17 Thread via GitHub
Eason0729 commented on issue #12518: URL: https://github.com/apache/datafusion/issues/12518#issuecomment-2481314373 I just see sqlparser release `0.52.0`. (which was released 6 days ago :smile: ) I will start working on that tomorrow.

Re: [PR] [MINOR]: fix min max accumulator nan bug [datafusion]

2024-11-17 Thread via GitHub
alamb merged PR #13432: URL: https://github.com/apache/datafusion/pull/13432

Re: [PR] Fallback to identifier parsing if expression parsing fails [datafusion-sqlparser-rs]

2024-11-17 Thread via GitHub
yoavcloud commented on code in PR #1513: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1513#discussion_r1845475524 ## src/ast/mod.rs: ## @@ -695,6 +695,8 @@ pub enum Expr { // https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#

Re: [PR] Minor: Fix broken links for meetups in content library [datafusion]

2024-11-17 Thread via GitHub
jonahgao commented on code in PR #13445: URL: https://github.com/apache/datafusion/pull/13445#discussion_r1845473682 ## docs/source/user-guide/concepts-readings-events.md: ## @@ -131,10 +131,11 @@ This is a list of DataFusion related blog posts, articles, and other resources.

Re: [PR] Update root `README.md` and other documentation with latest changes [datafusion-ballista]

2024-11-17 Thread via GitHub
milenkovicm commented on code in PR #1113: URL: https://github.com/apache/datafusion-ballista/pull/1113#discussion_r1845413688 ## docs/source/user-guide/configs.md: ## @@ -19,46 +19,74 @@ # Configuration -## BallistaContext Configuration Settings +## Ballista Configuration

Re: [PR] Coerce Array inner types [datafusion]

2024-11-17 Thread via GitHub
findepi commented on code in PR #13452: URL: https://github.com/apache/datafusion/pull/13452#discussion_r1845466179 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -1138,27 +1138,44 @@ fn numeric_string_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option

Re: [PR] Add docs on TableProvider::statistics() [datafusion]

2024-11-17 Thread via GitHub
findepi commented on code in PR #13454: URL: https://github.com/apache/datafusion/pull/13454#discussion_r1845463997 ## datafusion/catalog/src/table.rs: ## @@ -247,6 +247,9 @@ pub trait TableProvider: Debug + Sync + Send { } /// Get statistics for this table, if avail

Re: [PR] Fix join on arrays of unhashable types and allow hash join on all types supported at run-time [datafusion]

2024-11-17 Thread via GitHub
findepi commented on PR #13388: URL: https://github.com/apache/datafusion/pull/13388#issuecomment-2481260859 @alamb please take another look

Re: [PR] Update root `README.md` and other documentation with latest changes [datafusion-ballista]

2024-11-17 Thread via GitHub
milenkovicm commented on code in PR #1113: URL: https://github.com/apache/datafusion-ballista/pull/1113#discussion_r1845415528 ## README.md: ## @@ -17,53 +17,72 @@ under the License. --> -# Ballista: Distributed SQL Query Engine, built on Apache Arrow +# Ballista: Making

Re: [PR] Minor: Fix broken links for meetups in content library [datafusion]

2024-11-17 Thread via GitHub
alamb commented on code in PR #13445: URL: https://github.com/apache/datafusion/pull/13445#discussion_r1845447940 ## docs/source/user-guide/concepts-readings-events.md: ## @@ -137,4 +137,5 @@ This is a list of DataFusion related blog posts, articles, and other resources. - **2

Re: [PR] Chunk based iteration in `accumulate_indices` [datafusion]

2024-11-17 Thread via GitHub
Dandandan commented on PR #13451: URL: https://github.com/apache/datafusion/pull/13451#issuecomment-2481238182 Can we maybe find / create a query with filter? Maybe add one in `clickbench_extended`?

Re: [PR] Add docs on TableProvider::statistics() [datafusion]

2024-11-17 Thread via GitHub
alamb commented on code in PR #13454: URL: https://github.com/apache/datafusion/pull/13454#discussion_r1845415426 ## datafusion/catalog/src/table.rs: ## @@ -247,6 +247,9 @@ pub trait TableProvider: Debug + Sync + Send { } /// Get statistics for this table, if availab

Re: [I] Max of NaN returns f64::MIN when GROUP BY is used [datafusion]

2024-11-17 Thread via GitHub
alamb closed issue #13415: Max of NaN returns f64::MIN when GROUP BY is used URL: https://github.com/apache/datafusion/issues/13415

Re: [PR] Evaluate cheaper condition first in join selection and physical planner [datafusion]

2024-11-17 Thread via GitHub
alamb commented on PR #13435: URL: https://github.com/apache/datafusion/pull/13435#issuecomment-2481196438 Thanks @findepi and @Dandandan

Re: [I] When upgrading to DataFusion 43 some queries fail with panic "LeftMark join type does not support swapping" [datafusion]

2024-11-17 Thread via GitHub
alamb closed issue #13425: When upgrading to DataFusion 43 some queries fail with panic "LeftMark join type does not support swapping" URL: https://github.com/apache/datafusion/issues/13425

Re: [PR] Produce informative error on physical schema mismatch [datafusion]

2024-11-17 Thread via GitHub
alamb commented on PR #13434: URL: https://github.com/apache/datafusion/pull/13434#issuecomment-2481196748 🥇

Re: [PR] chore: remove unnecessary test helpers [datafusion]

2024-11-17 Thread via GitHub
alamb commented on PR #13317: URL: https://github.com/apache/datafusion/pull/13317#issuecomment-2481196618 🚀

Re: [PR] chore: remove unnecessary test helpers [datafusion]

2024-11-17 Thread via GitHub
alamb merged PR #13317: URL: https://github.com/apache/datafusion/pull/13317

Re: [PR] Fix duckdb & sqlite character_length scalar unparsing [datafusion]

2024-11-17 Thread via GitHub
alamb merged PR #13428: URL: https://github.com/apache/datafusion/pull/13428

Re: [PR] chore: remove unnecessary test helpers [datafusion]

2024-11-17 Thread via GitHub
alamb commented on PR #13317: URL: https://github.com/apache/datafusion/pull/13317#issuecomment-2481196268 I restarted the vendored code CI check as it seems it was due to a network error

Re: [PR] Fix duckdb & sqlite character_length scalar unparsing [datafusion]

2024-11-17 Thread via GitHub
alamb commented on PR #13428: URL: https://github.com/apache/datafusion/pull/13428#issuecomment-2481196516 🚀

Re: [PR] Evaluate cheaper condition first in join selection and physical planner [datafusion]

2024-11-17 Thread via GitHub
alamb merged PR #13435: URL: https://github.com/apache/datafusion/pull/13435

Re: [PR] [MINOR]: fix min max accumulator nan bug [datafusion]

2024-11-17 Thread via GitHub
alamb commented on PR #13432: URL: https://github.com/apache/datafusion/pull/13432#issuecomment-2481195222 Thanks @akurmustafa

Re: [PR] Fix test query results even for quick test execution [datafusion]

2024-11-17 Thread via GitHub
alamb merged PR #13453: URL: https://github.com/apache/datafusion/pull/13453

Re: [PR] Chunk based iteration in `accumulate_indices` [datafusion]

2024-11-17 Thread via GitHub
jayzhan211 commented on PR #13451: URL: https://github.com/apache/datafusion/pull/13451#issuecomment-2481192336 Neither Q7, Q10, nor Q24 uses an aggregate filter.

Re: [I] [DISCUSSION] Challenge: Make DataFusion the fastest engine in ClickBench with custom file format [datafusion]

2024-11-17 Thread via GitHub
alamb commented on issue #13448: URL: https://github.com/apache/datafusion/issues/13448#issuecomment-2481192092 BTW here is an example of how to create a custom file format in DataFusion: https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/custom_file_format.rs

Re: [PR] `TypeSignatureClass` for mixed type function signature [datafusion]

2024-11-17 Thread via GitHub
jayzhan211 commented on code in PR #13372: URL: https://github.com/apache/datafusion/pull/13372#discussion_r1845412004 ## datafusion/common/src/types/native.rs: ## @@ -433,4 +433,29 @@ impl NativeType { UInt8 | UInt16 | UInt32 | UInt64 | Int8 | Int16 | Int32 | Int64

Re: [PR] `TypeSignatureClass` for mixed type function signature [datafusion]

2024-11-17 Thread via GitHub
jayzhan211 commented on code in PR #13372: URL: https://github.com/apache/datafusion/pull/13372#discussion_r1845412166 ## datafusion/sqllogictest/test_files/expr.slt: ## @@ -1096,23 +1096,27 @@ SELECT date_part('nanosecond', timestamp '2020-09-08T12:00:12.12345678+00:00')

Re: [PR] Improve documentation (and ASCII art) about streaming execution, and thread pools [datafusion]

2024-11-17 Thread via GitHub
alamb commented on PR #13423: URL: https://github.com/apache/datafusion/pull/13423#issuecomment-2481187142 > Why can't we all use spawn_blocking() for all CPU-bounded task, and instead we have to use two runtimes explicitly 🤔 Thank you @2010YOUY01 for the question and @tustvold for

Re: [PR] feat: Add `stringview` support to `encode` and `decode` and `bit_length` [datafusion]

2024-11-17 Thread via GitHub
alamb commented on PR #13332: URL: https://github.com/apache/datafusion/pull/13332#issuecomment-2481173925 Thanks again @jonathanc-n

Re: [PR] Support unparsing Array plan to SQL string [datafusion]

2024-11-17 Thread via GitHub
alamb merged PR #13418: URL: https://github.com/apache/datafusion/pull/13418

Re: [PR] feat: Add `stringview` support to `encode` and `decode` and `bit_length` [datafusion]

2024-11-17 Thread via GitHub
alamb merged PR #13332: URL: https://github.com/apache/datafusion/pull/13332

Re: [PR] Support unparsing Array plan to SQL string [datafusion]

2024-11-17 Thread via GitHub
alamb commented on PR #13418: URL: https://github.com/apache/datafusion/pull/13418#issuecomment-2481174103 Thanks again @goldmedal

Re: [I] Support unparsing the Value Plan of Array (List) to SQL String [datafusion]

2024-11-17 Thread via GitHub
alamb closed issue #11144: Support unparsing the Value Plan of Array (List) to SQL String URL: https://github.com/apache/datafusion/issues/11144

Re: [I] Add support for string view to a few functions [datafusion]

2024-11-17 Thread via GitHub
alamb closed issue #13330: Add support for string view to a few functions URL: https://github.com/apache/datafusion/issues/13330

Re: [I] Cannot run benchmarks in k8s due to excessive spilling & OOM [datafusion-ray]

2024-11-17 Thread via GitHub
andygrove commented on issue #44: URL: https://github.com/apache/datafusion-ray/issues/44#issuecomment-2480748656 I tried running locally rather than in k8s using `ray.init()` to create the cluster. The issue is that we are using too much object store memory. For TPC-H q2 @ 100GB, it consum

[PR] Chunk based filter in `accumulate_indices` [datafusion]

2024-11-17 Thread via GitHub
jayzhan211 opened a new pull request, #13451: URL: https://github.com/apache/datafusion/pull/13451 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested

Re: [PR] Add `#[recursive]` [datafusion-sqlparser-rs]

2024-11-17 Thread via GitHub
Eason0729 commented on code in PR #1522: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1522#discussion_r1845336106 ## tests/sqlparser_common.rs: ## @@ -11748,3 +11748,16 @@ fn parse_create_table_select() { ); } } + +#[test] +fn overflow() { +let

Re: [PR] recursive select calls are parsed with bad trailing_commas parameter [datafusion-sqlparser-rs]

2024-11-17 Thread via GitHub
tomershaniii commented on code in PR #1521: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1521#discussion_r1845314145 ## tests/sqlparser_snowflake.rs: ## @@ -2783,6 +2783,14 @@ fn test_parentheses_overflow() { } #[test] + +fn test_nested_select_with_lateral_fl

Re: [I] Improve performance of ClickBench Q18, Q35, [datafusion]

2024-11-17 Thread via GitHub
Dandandan commented on issue #13449: URL: https://github.com/apache/datafusion/issues/13449#issuecomment-2481068201 Also, `date_part` from `extract(minute FROM to_timestamp_seconds("EventTime"))` seems to be taking some time. I added the following items to query 18 in the issue descript

Re: [I] Improve performance of ClickBench Q18, Q35, [datafusion]

2024-11-17 Thread via GitHub
Dandandan commented on issue #13449: URL: https://github.com/apache/datafusion/issues/13449#issuecomment-2481088752 For the queries it seems also possible (but tricky) if the cardinality is high enough (i.e. copying into aggregation columns doesn't reduce memory usage very much), to first e

Re: [PR] support column type definitions in table aliases [datafusion-sqlparser-rs]

2024-11-17 Thread via GitHub
lovasoa commented on code in PR #1526: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1526#discussion_r1845353716 ## src/ast/query.rs: ## @@ -1610,6 +1610,40 @@ impl fmt::Display for TableAlias { } } +/// SQL column definition in a table expression alias. +

Re: [PR] Chunk based iteration in `accumulate_indices` [datafusion]

2024-11-17 Thread via GitHub
Dandandan commented on PR #13451: URL: https://github.com/apache/datafusion/pull/13451#issuecomment-2481076175 I wonder if any of the queries in that benchmark uses aggregate filters?

Re: [I] Improve performance of ClickBench Q18, Q35, [datafusion]

2024-11-17 Thread via GitHub
Rachelint commented on issue #13449: URL: https://github.com/apache/datafusion/issues/13449#issuecomment-2481053896 I think > Yes, the regression was tracked here #13188 > > Also #13275 tracks some further improvements in `vectorized_append` which seems to be pretty hot in the

Re: [PR] Add `#[recursive]` [datafusion-sqlparser-rs]

2024-11-17 Thread via GitHub
Eason0729 commented on code in PR #1522: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1522#discussion_r1845335957 ## sqlparser_bench/benches/sqlparser_bench.rs: ## @@ -42,6 +42,46 @@ fn basic_queries(c: &mut Criterion) { group.bench_function("sqlparser::with

Re: [I] Detect stack overflow and reduce stack usage on debug build [datafusion-sqlparser-rs]

2024-11-17 Thread via GitHub
Eason0729 closed issue #1465: Detect stack overflow and reduce stack usage on debug build URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1465

Re: [I] Detect stack overflow and reduce stack usage on debug build [datafusion-sqlparser-rs]

2024-11-17 Thread via GitHub
Eason0729 commented on issue #1465: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1465#issuecomment-2481050971 I will close this issue because 1. I was unable to figure out what makes the binary on Windows consume significantly more stack. 2. There is an ongoing PR(htt

Re: [I] Improve performance of ClickBench Q18, Q35, [datafusion]

2024-11-17 Thread via GitHub
Rachelint commented on issue #13449: URL: https://github.com/apache/datafusion/issues/13449#issuecomment-2481048885 For q18, I found that string view leads to some regression? ``` [db@localhost.localdomain] 16:10:45 ~/arrow-datafusion $ sudo perf record -F 99 -g --call-graph dwarf /ho

Re: [PR] Implement `Spanned` to retrieve source locations on AST nodes [datafusion-sqlparser-rs]

2024-11-17 Thread via GitHub
iffyio commented on code in PR #1435: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1435#discussion_r1845329434 ## src/ast/helpers/ignore_field.rs: ## @@ -0,0 +1,69 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

Re: [PR] Add docs on TableProvider::statistics() [datafusion]

2024-11-17 Thread via GitHub
findepi commented on code in PR #13454: URL: https://github.com/apache/datafusion/pull/13454#discussion_r1845329458 ## datafusion/catalog/src/table.rs: ## @@ -247,6 +247,9 @@ pub trait TableProvider: Debug + Sync + Send { } /// Get statistics for this table, if avail