Re: [PR] fix: graceful NULL and type error handling in array functions [datafusion]

2025-02-25 Thread via GitHub
alan910127 commented on code in PR #14737: URL: https://github.com/apache/datafusion/pull/14737#discussion_r1969203040 ## datafusion/functions-nested/src/sort.rs: ## @@ -143,6 +168,10 @@ pub fn array_sort_inner(args: &[ArrayRef]) -> Result { return exec_err!("array_sor

Re: [PR] replace TypeSignature::String with TypeSignature::Coercible for starts_with [datafusion]

2025-02-25 Thread via GitHub
jayzhan211 commented on code in PR #14812: URL: https://github.com/apache/datafusion/pull/14812#discussion_r1969204488 ## datafusion/functions/src/string/starts_with.rs: ## @@ -23,15 +23,19 @@ use arrow::datatypes::DataType; use datafusion_expr::simplify::{ExprSimplifyResult, S

[PR] chore(deps): bump uuid from 1.13.2 to 1.14.0 [datafusion]

2025-02-25 Thread via GitHub
dependabot[bot] opened a new pull request, #14866: URL: https://github.com/apache/datafusion/pull/14866 Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.13.2 to 1.14.0. Release notes Sourced from [uuid's releases](https://github.com/uuid-rs/uuid/releases). v1.14.0 What'

Re: [PR] fix: graceful NULL and type error handling in array functions [datafusion]

2025-02-25 Thread via GitHub
alan910127 commented on code in PR #14737: URL: https://github.com/apache/datafusion/pull/14737#discussion_r1969235204 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -2546,8 +2568,10 @@ select array_append(column1, arrow_cast(make_array(1, 11, 111), 'FixedSizeList(3 #

Re: [PR] replace TypeSignature::String with TypeSignature::Coercible for starts_with [datafusion]

2025-02-25 Thread via GitHub
jayzhan211 commented on code in PR #14812: URL: https://github.com/apache/datafusion/pull/14812#discussion_r1969207266 ## datafusion/sqllogictest/test_files/parquet.slt: ## @@ -619,13 +619,13 @@ query TT explain select * from foo where starts_with(column1, 'f'); logical_

[PR] Complex reader unification [datafusion-comet]

2025-02-25 Thread via GitHub
parthchandra opened a new pull request, #1443: URL: https://github.com/apache/datafusion-comet/pull/1443 Removes code duplication between the native_datafusion and the native_iceberg_compat implementations. Datafusion parquet_exec is now called through functions in `parquet_exec.rs`. The

Re: [I] TypeSignature::Coercible for math functions [datafusion]

2025-02-25 Thread via GitHub
alan910127 commented on issue #14763: URL: https://github.com/apache/datafusion/issues/14763#issuecomment-2681242756 I noticed that TypeSignatureClass::Numeric is marked as TODO, and I think math functions could benefit from using the Numeric class. Additionally, since there is no Float cla

Re: [PR] feat: pretty explain [datafusion]

2025-02-25 Thread via GitHub
2010YOUY01 commented on code in PR #14677: URL: https://github.com/apache/datafusion/pull/14677#discussion_r1969310940 ## datafusion/datasource/src/source.rs: ## @@ -43,6 +44,7 @@ pub trait DataSource: Send + Sync { ) -> datafusion_common::Result; fn as_any(&self) -> &

Re: [I] Add a hint about expected extension in error message in register_csv, register_parquet, register_json [datafusion]

2025-02-25 Thread via GitHub
devhprl commented on issue #14144: URL: https://github.com/apache/datafusion/issues/14144#issuecomment-2681308324 thanks guys. will dig into that a bit. in the meantime I am slightly adjusting our case to either fit in to the current validations or shape up the need properly. -- This is

[I] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-25 Thread via GitHub
zhuqi-lucas opened a new issue, #14867: URL: https://github.com/apache/datafusion/issues/14867 ### Is your feature request related to a problem or challenge? We have supported group by in https://github.com/apache/datafusion/pull/13996 This ticket we will support join for H2

Re: [I] Add H2O.ai Database-like Ops benchmark to dfbench (join support) [datafusion]

2025-02-25 Thread via GitHub
zhuqi-lucas commented on issue #14867: URL: https://github.com/apache/datafusion/issues/14867#issuecomment-2681277306 take

Re: [PR] Complex reader unification [datafusion-comet]

2025-02-25 Thread via GitHub
codecov-commenter commented on PR #1443: URL: https://github.com/apache/datafusion-comet/pull/1443#issuecomment-2681344638 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1443?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix(substrait): Do not add implicit groupBy expressions when building logical plans from Substrait [datafusion]

2025-02-25 Thread via GitHub
Blizzara commented on code in PR #14860: URL: https://github.com/apache/datafusion/pull/14860#discussion_r1969502677 ## datafusion/substrait/tests/cases/roundtrip_logical_plan.rs: ## @@ -308,6 +308,17 @@ async fn aggregate_grouping_rollup() -> Result<()> { ).await } +#[t

Re: [PR] Extending support for INDEX parsing [datafusion-sqlparser-rs]

2025-02-25 Thread via GitHub
LucaCappelletti94 commented on PR #1707: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1707#issuecomment-2681526937 I have tried to replace the parsing of keywords with the generic String Ident as proposed, but it leads to much more complex parsing. Primarily, I don't u

Re: [PR] fix: graceful NULL and type error handling in array functions [datafusion]

2025-02-25 Thread via GitHub
jayzhan211 commented on code in PR #14737: URL: https://github.com/apache/datafusion/pull/14737#discussion_r1969521946 ## datafusion/expr/src/type_coercion/functions.rs: ## @@ -435,6 +436,10 @@ fn get_valid_types( match array_type { DataType::List(_) | Data

Re: [PR] fix: graceful NULL and type error handling in array functions [datafusion]

2025-02-25 Thread via GitHub
jayzhan211 commented on code in PR #14737: URL: https://github.com/apache/datafusion/pull/14737#discussion_r1969525703 ## datafusion/functions-nested/src/resize.rs: ## @@ -106,6 +127,7 @@ impl ScalarUDFImpl for ArrayResize { match &arg_types[0] { List(field

Re: [PR] DataFusion 45 blog post [datafusion-site]

2025-02-25 Thread via GitHub
alamb commented on PR #57: URL: https://github.com/apache/datafusion-site/pull/57#issuecomment-2681580675 Thanks again everyone! I'll keep an eye on this and make sure it gets published and post the final link, etc here

Re: [PR] fix: graceful NULL and type error handling in array functions [datafusion]

2025-02-25 Thread via GitHub
jayzhan211 commented on code in PR #14737: URL: https://github.com/apache/datafusion/pull/14737#discussion_r1969513878 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -2546,8 +2568,10 @@ select array_append(column1, arrow_cast(make_array(1, 11, 111), 'FixedSizeList(3 #

Re: [PR] fix duplicated schema name error from count wildcard [datafusion]

2025-02-25 Thread via GitHub
findepi commented on PR #14824: URL: https://github.com/apache/datafusion/pull/14824#issuecomment-2681390655 > I think we can fix this with the generated projections (and I think it is what @jonahgao is implemented) @alamb I am not sure what is the "generated projections"? then, w

Re: [I] Extension Types / User Defined Types [datafusion]

2025-02-25 Thread via GitHub
paleolimbot commented on issue #12644: URL: https://github.com/apache/datafusion/issues/12644#issuecomment-2681461522 Just a note to thank you for coordinating all of us on this effort! I am new to the code base and I don't have a great handle on exactly where to start, but I'm particularly

[PR] use arrow IPC Stream format for spill files [datafusion]

2025-02-25 Thread via GitHub
davidhewitt opened a new pull request, #14868: URL: https://github.com/apache/datafusion/pull/14868 ## Which issue does this PR close? - Closes #4658 ## Rationale for this change The IPC Stream format allows for dictionary replacement, unlike the IPC File format. As per

[PR] Minor: Counting elapsed_compute in BoundedWindowAggExec [datafusion]

2025-02-25 Thread via GitHub
2010YOUY01 opened a new pull request, #14869: URL: https://github.com/apache/datafusion/pull/14869 ## Which issue does this PR close? - Closes #. ## Rationale for this change Currently the window operator does not calculate the `elapsed_compute` metric; this change includes

Re: [PR] fix duplicated schema name error from count wildcard [datafusion]

2025-02-25 Thread via GitHub
jayzhan211 commented on PR #14824: URL: https://github.com/apache/datafusion/pull/14824#issuecomment-2681422325 Is anyone familiar with the sqlite tests? I updated the tests with `--complete`, and the output ends up in the format we have in datafusion, `0 Null`. But executing it again comes out with an error

Re: [PR] fix(substrait): Do not add implicit groupBy expressions when building logical plans from Substrait [datafusion]

2025-02-25 Thread via GitHub
Blizzara commented on code in PR #14860: URL: https://github.com/apache/datafusion/pull/14860#discussion_r1969498130 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -63,6 +63,26 @@ use indexmap::IndexSet; /// Default table name for unnamed table pub const UNNAMED_TABLE:

Re: [PR] Extending support for INDEX parsing [datafusion-sqlparser-rs]

2025-02-25 Thread via GitHub
LucaCappelletti94 commented on PR #1707: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1707#issuecomment-2681597078 I have managed to do it, but I really dislike the approach I used, which peeks whether the parser will next encounter one of the expected keywords and if not, a

Re: [PR] Add `statistics_truncate_length` parquet writer config [datafusion]

2025-02-25 Thread via GitHub
alamb merged PR #14782: URL: https://github.com/apache/datafusion/pull/14782

Re: [PR] Add DataFrame fill_null [datafusion]

2025-02-25 Thread via GitHub
comphead commented on PR #14769: URL: https://github.com/apache/datafusion/pull/14769#issuecomment-2683472594 > > > I did a quick check in the spark functions PR, I didn't see anything related to fill. > > > > > > hi @Omega359 are you referring to https://spark.apache.org/docs/3.

[PR] Include struct name on FileScanConfig debug impl [datafusion]

2025-02-25 Thread via GitHub
alamb opened a new pull request, #14883: URL: https://github.com/apache/datafusion/pull/14883 ## Which issue does this PR close? Part of - Related to https://github.com/delta-io/delta-rs/pull/3261 - Related to https://github.com/apache/datafusion/issues/14123 ## Ration

Re: [PR] Fix `regenerate_sqlite_files.sh` due to changes in sqllogictests [datafusion]

2025-02-25 Thread via GitHub
Omega359 commented on code in PR #14881: URL: https://github.com/apache/datafusion/pull/14881#discussion_r1970798347 ## datafusion/sqllogictest/README.md: ## @@ -243,6 +243,14 @@ export RUST_MIN_STACK=30485760; PG_COMPAT=true INCLUDE_SQLITE=true cargo test --features=postgres -

[I] Fix the null handling for `to_char` function [datafusion]

2025-02-25 Thread via GitHub
goldmedal opened a new issue, #14884: URL: https://github.com/apache/datafusion/issues/14884 ### Describe the bug Currently, if we input a null value to `to_char`, we will get an empty string instead of a null value. ``` > select to_char(NULL, '%Y-%m-%d %H:%M:%S') is null; +-

Re: [PR] Fix `regenerate_sqlite_files.sh` due to changes in sqllogictests [datafusion]

2025-02-25 Thread via GitHub
jayzhan211 commented on code in PR #14881: URL: https://github.com/apache/datafusion/pull/14881#discussion_r1970785236 ## datafusion/sqllogictest/README.md: ## @@ -243,6 +243,14 @@ export RUST_MIN_STACK=30485760; PG_COMPAT=true INCLUDE_SQLITE=true cargo test --features=postgres

Re: [PR] chore: faster maven mirror [datafusion-comet]

2025-02-25 Thread via GitHub
codecov-commenter commented on PR #1447: URL: https://github.com/apache/datafusion-comet/pull/1447#issuecomment-2683791579 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1447?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[PR] Implement builder style API for ParserOptions [datafusion]

2025-02-25 Thread via GitHub
diegoreis42 opened a new pull request, #14885: URL: https://github.com/apache/datafusion/pull/14885 ## Which issue does this PR close? - Closes #14879 ## Rationale for this change ## What changes are included in this PR? ## Are these change

Re: [PR] Add DataFrame fill_null [datafusion]

2025-02-25 Thread via GitHub
comphead commented on PR #14769: URL: https://github.com/apache/datafusion/pull/14769#issuecomment-2683379630 > I did a quick check in the spark functions PR, I didn't see anything related to fill. hi @Omega359 are you referring to https://spark.apache.org/docs/3.5.4/api/java/org/apa

Re: [PR] Fix `regenerate_sqlite_files.sh` due to changes in sqllogictests [datafusion]

2025-02-25 Thread via GitHub
alamb commented on PR #14881: URL: https://github.com/apache/datafusion/pull/14881#issuecomment-2683409993 Ok, I think this one now works again.

Re: [PR] fix duplicated schema name error from count wildcard [datafusion]

2025-02-25 Thread via GitHub
alamb commented on PR #14824: URL: https://github.com/apache/datafusion/pull/14824#issuecomment-2683457657 > > I think the issue is that the runner in https://github.com/Omega359/sqllogictest-rs is based on an older version of the sqllogictests than we use in datafusion. > > I have an id

Re: [PR] Require `Debug` for `DataSource` [datafusion]

2025-02-25 Thread via GitHub
alamb commented on code in PR #14882: URL: https://github.com/apache/datafusion/pull/14882#discussion_r1970647757 ## datafusion/datasource/src/source.rs: ## @@ -35,7 +35,7 @@ use datafusion_physical_expr_common::sort_expr::LexOrdering; /// Common behaviors in Data Sources for

Re: [I] Also migrate "invoke" to "invoke_with_args" [datafusion]

2025-02-25 Thread via GitHub
niebayes closed issue #14724: Also migrate "invoke" to "invoke_with_args" URL: https://github.com/apache/datafusion/issues/14724

Re: [PR] Add DataFrame fill_null [datafusion]

2025-02-25 Thread via GitHub
Omega359 commented on PR #14769: URL: https://github.com/apache/datafusion/pull/14769#issuecomment-2683368595 I did a quick check in the spark functions PR, I didn't see anything related to fill.

Re: [PR] Fix `regenerate_sqlite_files.sh` due to changes in sqllogictests [datafusion]

2025-02-25 Thread via GitHub
jayzhan211 commented on PR #14881: URL: https://github.com/apache/datafusion/pull/14881#issuecomment-2683674514 Thanks @alamb, let me try it

Re: [PR] Fix `regenerate_sqlite_files.sh` due to changes in sqllogictests [datafusion]

2025-02-25 Thread via GitHub
jayzhan211 merged PR #14881: URL: https://github.com/apache/datafusion/pull/14881

Re: [PR] Dataframe with_column and with_column_renamed performance improvements [datafusion]

2025-02-25 Thread via GitHub
Omega359 commented on code in PR #14653: URL: https://github.com/apache/datafusion/pull/14653#discussion_r1970633135 ## datafusion/core/src/dataframe/mod.rs: ## @@ -183,6 +183,8 @@ pub struct DataFrame { // Box the (large) SessionState to reduce the size of DataFrame on the

Re: [PR] Add DataFrame fill_null [datafusion]

2025-02-25 Thread via GitHub
Omega359 commented on PR #14769: URL: https://github.com/apache/datafusion/pull/14769#issuecomment-2683448966 > > I did a quick check in the spark functions PR, I didn't see anything related to fill. > > hi @Omega359 are you referring to https://spark.apache.org/docs/3.5.4/api/java/o

Re: [PR] correctly treat backslash in datafusion-cli [datafusion]

2025-02-25 Thread via GitHub
Lordworms commented on code in PR #14844: URL: https://github.com/apache/datafusion/pull/14844#discussion_r1970592527 ## datafusion-cli/src/helper.rs: ## @@ -326,15 +326,6 @@ mod tests { )?; assert!(matches!(result, ValidationResult::Valid(None))); -

Re: [I] [BUG]: SortMergeJoin filtered LeftAnti fails on TPCH Q21 [datafusion-comet]

2025-02-25 Thread via GitHub
comphead commented on issue #861: URL: https://github.com/apache/datafusion-comet/issues/861#issuecomment-2683386368 Fixed with Datafusion 44 dependency update

Re: [I] [BUG]: SortMergeJoin filtered LeftAnti fails on TPCH Q21 [datafusion-comet]

2025-02-25 Thread via GitHub
comphead closed issue #861: [BUG]: SortMergeJoin filtered LeftAnti fails on TPCH Q21 URL: https://github.com/apache/datafusion-comet/issues/861

Re: [PR] feat: Support IntegralDivide function [datafusion-comet]

2025-02-25 Thread via GitHub
wForget commented on code in PR #1428: URL: https://github.com/apache/datafusion-comet/pull/1428#discussion_r1970788193 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2641,4 +2641,20 @@ class CometExpressionSuite extends CometTestBase with AdaptiveS

[I] Code clean for new datafusion-cli streaming logic [datafusion]

2025-02-25 Thread via GitHub
zhuqi-lucas opened a new issue, #14886: URL: https://github.com/apache/datafusion/issues/14886 ### Is your feature request related to a problem or challenge? This is a follow-up for: https://github.com/apache/datafusion/pull/14877#discussion_r1970328077 And we can do some cod

Re: [PR] Datafusion-cli: Redesign the datafusion-cli execution and print, make it totally streaming printing without memory overhead [datafusion]

2025-02-25 Thread via GitHub
zhuqi-lucas commented on PR #14877: URL: https://github.com/apache/datafusion/pull/14877#issuecomment-2683862205 > Thank you @zhuqi-lucas -- this is really cool! > > I have a suggestion that I think would make the code better, but we could do it as a follow on PR or never in my opinio

[I] Native shuffle double allocates memory [datafusion-comet]

2025-02-25 Thread via GitHub
andygrove opened a new issue, #1448: URL: https://github.com/apache/datafusion-comet/issues/1448 ### Describe the bug As demonstrated in the new unit tests added in https://github.com/apache/datafusion-comet/pull/1440, native shuffle is double allocating memory. ```rust

Re: [PR] Datafusion-cli: Redesign the datafusion-cli execution and print, make it totally streaming printing without memory overhead [datafusion]

2025-02-25 Thread via GitHub
zhuqi-lucas commented on code in PR #14877: URL: https://github.com/apache/datafusion/pull/14877#discussion_r1970885830 ## datafusion-cli/src/print_format.rs: ## @@ -209,6 +211,145 @@ impl PrintFormat { } Ok(()) } + +#[allow(clippy::too_many_arguments)

Re: [I] Decorrelate scalar subqueries with more complex filter expressions [datafusion]

2025-02-25 Thread via GitHub
xudong963 commented on issue #14554: URL: https://github.com/apache/datafusion/issues/14554#issuecomment-2683864272 > FWIW I think [@xudong963](https://github.com/xudong963) said he has experience implementing such code so perhaps he will be able to help / assist with the implementation and

[I] Native shuffle inaccurate estimate of builder memory allocation [datafusion-comet]

2025-02-25 Thread via GitHub
andygrove opened a new issue, #1449: URL: https://github.com/apache/datafusion-comet/issues/1449 ### Describe the bug As demonstrated in unit tests added in https://github.com/apache/datafusion-comet/pull/1440, we are allocating ~100kb for a batch when the actual memory used is less

Re: [PR] Datafusion-cli: Redesign the datafusion-cli execution and print, make it totally streaming printing without memory overhead [datafusion]

2025-02-25 Thread via GitHub
zhuqi-lucas commented on code in PR #14877: URL: https://github.com/apache/datafusion/pull/14877#discussion_r1970883672 ## datafusion-cli/src/print_format.rs: ## @@ -209,6 +211,145 @@ impl PrintFormat { } Ok(()) } + +#[allow(clippy::too_many_arguments)

Re: [I] Remove the need for registering an ObjectStore for remote files [datafusion-python]

2025-02-25 Thread via GitHub
kylebarron commented on issue #899: URL: https://github.com/apache/datafusion-python/issues/899#issuecomment-2683964010 I'd suggest to wait until the `object_store` 0.12 release (and, then, for datafusion to use that) (because I'm pinned to latest main of `object_store` from pyo3_object-st

Re: [PR] Extending support for INDEX parsing [datafusion-sqlparser-rs]

2025-02-25 Thread via GitHub
iffyio commented on code in PR #1707: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1707#discussion_r1970924952 ## src/parser/mod.rs: ## @@ -7629,24 +7643,34 @@ impl<'a> Parser<'a> { } pub fn parse_index_type(&mut self) -> Result { -if self.par

Re: [I] MySQL datatype discrepancy between DDL and `CAST` [datafusion-sqlparser-rs]

2025-02-25 Thread via GitHub
iffyio closed issue #1589: MySQL datatype discrepancy between DDL and `CAST` URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1589

Re: [PR] Parse SIGNED INTEGER type in MySQL CAST [datafusion-sqlparser-rs]

2025-02-25 Thread via GitHub
iffyio merged PR #1739: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1739

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-02-25 Thread via GitHub
djanderson commented on code in PR #14286: URL: https://github.com/apache/datafusion/pull/14286#discussion_r1943824052 ## datafusion-examples/examples/thread_pools.rs: ## @@ -0,0 +1,238 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

[PR] Implement builder style API for ParserOptions [datafusion]

2025-02-25 Thread via GitHub
kosiew opened a new pull request, #14887: URL: https://github.com/apache/datafusion/pull/14887 ## Which issue does this PR close? - Closes #14879. ## Rationale for this change Currently, adding new fields to ParserOptions requires a downstream API change,
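
For readers unfamiliar with the pattern this PR title refers to, here is a minimal sketch of a builder-style options struct. It is not the code from the PR: the type name `ParserOptionsSketch`, its two fields, and the `demo_options` helper are hypothetical and chosen only for illustration. The idea is that fields stay private and are set through chained `with_*` methods, so adding a field later does not break callers that construct the struct.

```rust
/// Hypothetical stand-in for a parser options struct (not DataFusion's real type).
/// Fields are private so that new ones can be added without breaking callers.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ParserOptionsSketch {
    parse_float_as_decimal: bool,
    enable_ident_normalization: bool,
}

impl Default for ParserOptionsSketch {
    // Illustrative defaults only; the real defaults may differ.
    fn default() -> Self {
        Self {
            parse_float_as_decimal: false,
            enable_ident_normalization: true,
        }
    }
}

impl ParserOptionsSketch {
    /// Start from the defaults.
    pub fn new() -> Self {
        Self::default()
    }

    /// Builder-style setter: takes `self` by value and returns it for chaining.
    pub fn with_parse_float_as_decimal(mut self, value: bool) -> Self {
        self.parse_float_as_decimal = value;
        self
    }

    /// Builder-style setter for identifier normalization.
    pub fn with_enable_ident_normalization(mut self, value: bool) -> Self {
        self.enable_ident_normalization = value;
        self
    }

    /// Read-only accessors keep the fields private.
    pub fn parse_float_as_decimal(&self) -> bool {
        self.parse_float_as_decimal
    }

    pub fn enable_ident_normalization(&self) -> bool {
        self.enable_ident_normalization
    }
}

/// Usage: override one option and keep the defaults for the rest.
pub fn demo_options() -> ParserOptionsSketch {
    ParserOptionsSketch::new().with_parse_float_as_decimal(true)
}
```

Callers written this way only name the options they change, which is why a new field with a sensible default would not require a downstream change in this sketch.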

Re: [PR] Parse MySQL ALTER TABLE ALGORITHM option [datafusion-sqlparser-rs]

2025-02-25 Thread via GitHub
iffyio merged PR #1745: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1745

Re: [PR] feat: Support IntegralDivide function [datafusion-comet]

2025-02-25 Thread via GitHub
wForget commented on PR #1428: URL: https://github.com/apache/datafusion-comet/pull/1428#issuecomment-2684022663 Integral divide of `decimal` type has inconsistent behavior. test: ``` test("test integral divide") { withTable("t1", "t2") { if (isSpark34Plus

Re: [PR] feat: Support IntegralDivide function [datafusion-comet]

2025-02-25 Thread via GitHub
wForget commented on code in PR #1428: URL: https://github.com/apache/datafusion-comet/pull/1428#discussion_r1971074231 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2641,4 +2641,20 @@ class CometExpressionSuite extends CometTestBase with AdaptiveS

Re: [I] Support User-Defined Sorting [datafusion]

2025-02-25 Thread via GitHub
tobixdev commented on issue #14828: URL: https://github.com/apache/datafusion/issues/14828#issuecomment-2684185492 take

Re: [I] Support User-Defined Sorting [datafusion]

2025-02-25 Thread via GitHub
tobixdev commented on issue #14828: URL: https://github.com/apache/datafusion/issues/14828#issuecomment-2684185236 > In terms of arrow-rs I am not sure we should add anything there yet -- I think we should start the implementation in DataFusion and then port stuff upstream to arrow-rs when i

Re: [PR] correctly treat backslash in datafusion-cli [datafusion]

2025-02-25 Thread via GitHub
findepi commented on PR #14844: URL: https://github.com/apache/datafusion/pull/14844#issuecomment-2684193983 > I think you are right that this is the core question: "should `datafusion-cli` be doing any escaping / unescaping itself?" no, the CLI should not do those > If we want

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-02-25 Thread via GitHub
shehabgamin commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2684037703 Is there a reason why some `arrow` crates are using version `54.2.0` while others are using `54.1.0`? ``` arrow = { version = "54.2.0", features = [ "prettyprint

Re: [PR] Implement builder style API for ParserOptions [datafusion]

2025-02-25 Thread via GitHub
kosiew commented on code in PR #14887: URL: https://github.com/apache/datafusion/pull/14887#discussion_r1970909759 ## datafusion/sql/src/planner.rs: ## @@ -43,15 +43,30 @@ pub use datafusion_expr::planner::ContextProvider; /// SQL parser options #[derive(Debug, Clone, Copy)]

Re: [PR] Set projection before configuring the source [datafusion]

2025-02-25 Thread via GitHub
berkaysynnada commented on code in PR #14685: URL: https://github.com/apache/datafusion/pull/14685#discussion_r1971022680 ## datafusion/core/src/datasource/physical_plan/file_scan_config.rs: ## @@ -345,6 +345,32 @@ impl FileScanConfig { /// Set the projection of the files

Re: [PR] correctly treat backslash in datafusion-cli [datafusion]

2025-02-25 Thread via GitHub
findepi commented on code in PR #14844: URL: https://github.com/apache/datafusion/pull/14844#discussion_r1971099360 ## datafusion-cli/tests/cli_integration.rs: ## @@ -39,6 +39,10 @@ fn init() { ["--command", "select 1; select 2;", "--format", "json", "-q"], "[{\"Int64(

[PR] Preserve the name of grouping sets in SimplifyExpressions [datafusion]

2025-02-25 Thread via GitHub
joroKr21 opened a new pull request, #14888: URL: https://github.com/apache/datafusion/pull/14888 Whenever we use `recompute_schema` or `with_exprs_and_inputs`, this ensures that we obtain the same schema. ## Which issue does this PR close? Followup to #14734 ## R

Re: [PR] replace TypeSignature::String with TypeSignature::Coercible for trim functions [datafusion]

2025-02-25 Thread via GitHub
jayzhan211 commented on code in PR #14865: URL: https://github.com/apache/datafusion/pull/14865#discussion_r1971048076 ## datafusion/functions/src/string/btrim.rs: ## @@ -19,20 +19,28 @@ use crate::string::common::*; use crate::utils::{make_scalar_function, utf8_to_str_type};

Re: [PR] Fix: External sort failing on `StringView` due to shared buffers [datafusion]

2025-02-25 Thread via GitHub
2010YOUY01 commented on code in PR #14823: URL: https://github.com/apache/datafusion/pull/14823#discussion_r1971059228 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -1425,7 +1478,7 @@ mod tests { // Processing 840 KB of data using 400 KB of memory requires at lea

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-02-25 Thread via GitHub
djanderson commented on PR #14286: URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2684139203 Hey all, sorry for the noise on this thread lately, I've been working to try and understand how to apply this example to an actual DataFusion-based server, and not having much luc

Re: [PR] Fix: External sort failing on `StringView` due to shared buffers [datafusion]

2025-02-25 Thread via GitHub
2010YOUY01 commented on PR #14823: URL: https://github.com/apache/datafusion/pull/14823#issuecomment-2684140744 Thank you all for the feedback. I have addressed the review comments (also added a small further simplification for the refactor)

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-02-25 Thread via GitHub
djanderson commented on code in PR #14286: URL: https://github.com/apache/datafusion/pull/14286#discussion_r1970921927 ## datafusion-examples/examples/thread_pools_lib/dedicated_executor.rs: ## @@ -0,0 +1,1778 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [PR] Refactor SortPushdown using the standard top-down visitor. [datafusion]

2025-02-25 Thread via GitHub
berkaysynnada commented on code in PR #14821: URL: https://github.com/apache/datafusion/pull/14821#discussion_r1971059284 ## datafusion/physical-optimizer/src/enforce_sorting/sort_pushdown.rs: ## @@ -98,66 +98,91 @@ fn pushdown_sorts_helper( .ordering_satisfy_requiremen

Re: [PR] feat: `tree` / pretty explain [datafusion]

2025-02-25 Thread via GitHub
irenjj commented on code in PR #14677: URL: https://github.com/apache/datafusion/pull/14677#discussion_r1970734646 ## datafusion/physical-plan/src/memory.rs: ## @@ -192,6 +192,7 @@ impl DisplayAs for LazyMemoryExec { .join(", ") )

Re: [I] Run / test Datafusion with JSON Bench from ClickHouse [datafusion]

2025-02-25 Thread via GitHub
goldmedal commented on issue #14874: URL: https://github.com/apache/datafusion/issues/14874#issuecomment-2683612589 I think you mean @douenergy (Alex) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

Re: [PR] refactor: make SqlToRel::new derive the parser options from the context provider [datafusion]

2025-02-25 Thread via GitHub
niebayes commented on PR #14822: URL: https://github.com/apache/datafusion/pull/14822#issuecomment-2683625364 @alamb Sure, I'll add some tests soon. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [I] Migrate Datetime functions to `invoke_with_args` [datafusion]

2025-02-25 Thread via GitHub
goldmedal closed issue #14705: Migrate Datetime functions to `invoke_with_args` URL: https://github.com/apache/datafusion/issues/14705 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [PR] chore: migrate to `invoke_with_args` for datetime functions [datafusion]

2025-02-25 Thread via GitHub
goldmedal commented on PR #14876: URL: https://github.com/apache/datafusion/pull/14876#issuecomment-2683639200 Thanks @onlyjackfrost and @alamb for reviewing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

Re: [PR] chore: migrate to `invoke_with_args` for datetime functions [datafusion]

2025-02-25 Thread via GitHub
goldmedal merged PR #14876: URL: https://github.com/apache/datafusion/pull/14876 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

Re: [PR] Fixed Migrate Datetime functions to invoke_with_args Issue 14705 [datafusion]

2025-02-25 Thread via GitHub
goldmedal closed pull request #14864: Fixed Migrate Datetime functions to invoke_with_args Issue 14705 URL: https://github.com/apache/datafusion/pull/14864 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] Add xxhash algorithms in SQL and expression api [datafusion]

2025-02-25 Thread via GitHub
Omega359 commented on PR #14367: URL: https://github.com/apache/datafusion/pull/14367#issuecomment-2683366217 Note that this seems to also be in the spark function PR - https://github.com/apache/datafusion/pull/14392/files#diff-2bbff2d3a0ce9cec9ed9d6ec6e38ff910875af704b60855f43b47b46c96c5d44

Re: [PR] Use arrow IPC Stream format for spill files [datafusion]

2025-02-25 Thread via GitHub
comphead commented on PR #14868: URL: https://github.com/apache/datafusion/pull/14868#issuecomment-2683393024 Wondering if this PR also addresses partially https://github.com/apache/datafusion/issues/14078 Another thing to mention we still lack some spilling test cases like https://g

Re: [PR] Require `Debug` for `DataSource` [datafusion]

2025-02-25 Thread via GitHub
alamb commented on code in PR #14882: URL: https://github.com/apache/datafusion/pull/14882#discussion_r1970647757 ## datafusion/datasource/src/source.rs: ## @@ -35,7 +35,7 @@ use datafusion_physical_expr_common::sort_expr::LexOrdering; /// Common behaviors in Data Sources for

Re: [PR] Fix `regenerate_sqlite_files.sh` due to changes in sqllogictests [datafusion]

2025-02-25 Thread via GitHub
alamb commented on PR #14881: URL: https://github.com/apache/datafusion/pull/14881#issuecomment-2683458549 @Omega359 has an alternate plan here potentially: https://github.com/apache/datafusion/pull/14824#issuecomment-2683289068 -- This is an automated message from the Apache Git Service

Re: [PR] chore: Re-organize shuffle writer code [datafusion-comet]

2025-02-25 Thread via GitHub
andygrove merged PR #1439: URL: https://github.com/apache/datafusion-comet/pull/1439 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

Re: [PR] chore: Re-organize shuffle writer code [datafusion-comet]

2025-02-25 Thread via GitHub
andygrove commented on PR #1439: URL: https://github.com/apache/datafusion-comet/pull/1439#issuecomment-2683575116 Thanks for the reviews @mbutrovich @kazuyukitanimura -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[PR] chore: faster maven mirror [datafusion-comet]

2025-02-25 Thread via GitHub
comphead opened a new pull request, #1447: URL: https://github.com/apache/datafusion-comet/pull/1447 ## Which issue does this PR close? Adding faster maven mirror and enabling some caching Closes #. ## Rationale for this change ## What changes are i

[PR] Update test for datafusion #14824 [datafusion-testing]

2025-02-25 Thread via GitHub
jayzhan211 opened a new pull request, #7: URL: https://github.com/apache/datafusion-testing/pull/7 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscri

[I] Re-implement memory management in native shuffle writer [datafusion-comet]

2025-02-25 Thread via GitHub
andygrove opened a new issue, #1446: URL: https://github.com/apache/datafusion-comet/issues/1446 ### What is the problem the feature request solves? After reviewing the memory management code in the native shuffle writer, there appear to be some bugs and/or inconsistencies, and there a

[I] Consider disabling adding implicit group bys in LogicalPlanBuilder by default [datafusion]

2025-02-25 Thread via GitHub
alamb opened a new issue, #14878: URL: https://github.com/apache/datafusion/issues/14878 I lean towards having the default behaviour be `false` for this, even if it's a breaking change, because it makes the builder less surprising IMO. Specifically, when invoking the b

Re: [I] Consider disabling adding implicit group bys in LogicalPlanBuilder by default [datafusion]

2025-02-25 Thread via GitHub
alamb commented on issue #14878: URL: https://github.com/apache/datafusion/issues/14878#issuecomment-2682898226 Actually I misread the diff; it seems like https://github.com/apache/datafusion/pull/14860 actually changes the default to false, which I think is a good change. -- This is an
