Re: [I] Split properties.rs into smaller modules [datafusion]

2025-02-27 Thread via GitHub
jonathanc-n commented on issue #14913: URL: https://github.com/apache/datafusion/issues/14913#issuecomment-2689505622 @Standing-Man You can comment 'take' and it will automatically assign it to you -- This is an automated message from the Apache Git Service. To respond to the message, ple

Re: [PR] Prepare for 46.0.0 release: Version and Changelog [datafusion]

2025-02-27 Thread via GitHub
alamb commented on PR #14903: URL: https://github.com/apache/datafusion/pull/14903#issuecomment-2689508120 - release status update: https://github.com/apache/datafusion/issues/14123#issuecomment-2689507558

Re: [I] Release DataFusion `46.0.0` [datafusion]

2025-02-27 Thread via GitHub
alamb commented on issue #14123: URL: https://github.com/apache/datafusion/issues/14123#issuecomment-2689507558 I suggest we move the discussion about the expected Expr / Logical Plans to a separate ticket and leave this one to cover release coordination. Update for the release

Re: [I] 【TPCH】Comet do not show performance advantages over native Spark? [datafusion-comet]

2025-02-27 Thread via GitHub
xingnailu commented on issue #1450: URL: https://github.com/apache/datafusion-comet/issues/1450#issuecomment-2689580180 > Thanks for the info [@xingnailu](https://github.com/xingnailu). It does look like Comet is working as expected. I wonder if the bottleneck is reading data from OSS. Do

Re: [I] Split properties.rs into smaller modules [datafusion]

2025-02-27 Thread via GitHub
Standing-Man commented on issue #14913: URL: https://github.com/apache/datafusion/issues/14913#issuecomment-2689528404 take

Re: [I] Weekly Plan (Andrew Lamb) Feb 24, 2025 [datafusion]

2025-02-27 Thread via GitHub
alamb commented on issue #14850: URL: https://github.com/apache/datafusion/issues/14850#issuecomment-2689530477 DataFusion: Bugs/UX/Performance - [x] https://github.com/apache/datafusion/pull/14873 - [x] https://github.com/apache/datafusion/pull/14821 - [x] https://github.com/apa

[PR] Parse SET NAMES syntax in Postgres [datafusion-sqlparser-rs]

2025-02-27 Thread via GitHub
mvzink opened a new pull request, #1752: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1752 Postgres also supports a limited version of the `SET NAMES` syntax already supported for MySQL (but doesn't support the `COLLATE` part). This also switches the charset name to an

[I] Datafusion-cli: regression after commit `3d64de4`, all results are not displayed [datafusion]

2025-02-27 Thread via GitHub
qazxcdswe123 opened a new issue, #14926: URL: https://github.com/apache/datafusion/issues/14926 ### Describe the bug Good commit: 3d64de4 Bad commit: 53fc94f22fc1 #14877 Preview: https://github.com/user-attachments/assets/08081abe-d770-45f4-bf5a-9972bada9927

Re: [PR] Count alias [datafusion]

2025-02-27 Thread via GitHub
jayzhan211 commented on PR #14927: URL: https://github.com/apache/datafusion/pull/14927#issuecomment-2689800403 ```rust /// Return `self AS name` alias expression pub fn alias(self, name: impl Into<String>) -> Expr { Expr::Alias(Alias::new(self, None::<&str>, name.into()))

Re: [PR] Count alias [datafusion]

2025-02-27 Thread via GitHub
jayzhan211 commented on code in PR #14927: URL: https://github.com/apache/datafusion/pull/14927#discussion_r1974841086 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -2469,9 +2470,53 @@ async fn test_count_wildcard_on_sort() -> Result<()> { .explain(false, false)?

Re: [PR] Count alias [datafusion]

2025-02-27 Thread via GitHub
jayzhan211 commented on code in PR #14927: URL: https://github.com/apache/datafusion/pull/14927#discussion_r1974843114 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -2545,9 +2652,24 @@ async fn test_count_wildcard_on_where_exist() -> Result<()> { .collect()

Re: [PR] Parse SET NAMES syntax in Postgres [datafusion-sqlparser-rs]

2025-02-27 Thread via GitHub
iffyio commented on code in PR #1752: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1752#discussion_r1974838541 ## tests/sqlparser_postgres.rs: ## @@ -5533,3 +5533,11 @@ fn parse_varbit_datatype() { _ => unreachable!(), } } + +#[test] +fn parse_set_

[PR] refactor(properties): Split properties.rs into smaller modules [datafusion]

2025-02-27 Thread via GitHub
Standing-Man opened a new pull request, #14925: URL: https://github.com/apache/datafusion/pull/14925 ## Which issue does this PR close? - Closes #14913. ## Rationale for this change `properties.rs` is so long that it is hard to navigate and find what is going on.

Re: [PR] refactor(properties): Split properties.rs into smaller modules [datafusion]

2025-02-27 Thread via GitHub
Standing-Man commented on PR #14925: URL: https://github.com/apache/datafusion/pull/14925#issuecomment-2689697038 Hi @alamb, please help me trigger the CI. Thank you.

Re: [I] Datafusion-cli: regression after commit `3d64de4`, all results are not displayed [datafusion]

2025-02-27 Thread via GitHub
zhuqi-lucas commented on issue #14926: URL: https://github.com/apache/datafusion/issues/14926#issuecomment-2689710107 Hi @qazxcdswe123, thanks for reporting this; we will fix it in https://github.com/apache/datafusion/issues/14920#issuecomment-2689516703

Re: [PR] Fix: New Datafusion-cli streaming printing way should handle corner case for only one small batch which lines are less than max_rows [datafusion]

2025-02-27 Thread via GitHub
zhuqi-lucas commented on PR #14921: URL: https://github.com/apache/datafusion/pull/14921#issuecomment-2689721572 Thank you @alamb for the review, I have also added a test case now!

Re: [I] Datafusion-cli: regression after commit `3d64de4`, all results are not displayed [datafusion]

2025-02-27 Thread via GitHub
zhuqi-lucas commented on issue #14926: URL: https://github.com/apache/datafusion/issues/14926#issuecomment-268974 https://github.com/apache/datafusion/pull/14921 @qazxcdswe123 You can try the latest PR; it also covers your test example!

Re: [PR] Refactor SortPushdown using the standard top-down visitor. [datafusion]

2025-02-27 Thread via GitHub
wiedld commented on PR #14821: URL: https://github.com/apache/datafusion/pull/14821#issuecomment-2689593652 Thanks for giving me time to catch up. After digging more into the ordering calculation, and making [a small PR](https://github.com/apache/datafusion/pull/14923) with additional

Re: [PR] Fix: New Datafusion-cli streaming printing way should handle corner case for only one small batch which lines are less than max_rows [datafusion]

2025-02-27 Thread via GitHub
2010YOUY01 merged PR #14921: URL: https://github.com/apache/datafusion/pull/14921

Re: [PR] Add docs to `update_coalesce_ctx_children`. [datafusion]

2025-02-27 Thread via GitHub
berkaysynnada merged PR #14907: URL: https://github.com/apache/datafusion/pull/14907

Re: [PR] Set projection before configuring the source [datafusion]

2025-02-27 Thread via GitHub
blaginin commented on code in PR #14685: URL: https://github.com/apache/datafusion/pull/14685#discussion_r1974936267 ## datafusion/core/src/datasource/physical_plan/file_scan_config.rs: ## @@ -345,6 +345,32 @@ impl FileScanConfig { /// Set the projection of the files p

Re: [PR] Add additional protobuf tests for plans that read parquet with projections [datafusion]

2025-02-27 Thread via GitHub
mkmik commented on PR #14924: URL: https://github.com/apache/datafusion/pull/14924#issuecomment-2689934192 @alamb wrong Marko :-)

Re: [PR] chore: commit `Cargo.lock` file to make builds more predictable [datafusion-ballista]

2025-02-27 Thread via GitHub
milenkovicm commented on PR #1190: URL: https://github.com/apache/datafusion-ballista/pull/1190#issuecomment-2688860099 It looks like an intermittent failure on Docker container creation; I can't really re-run it.

Re: [PR] feat: Add Aggregate UDF to FFI crate [datafusion]

2025-02-27 Thread via GitHub
Omega359 commented on code in PR #14775: URL: https://github.com/apache/datafusion/pull/14775#discussion_r1974186012 ## datafusion/ffi/src/tests/udf_udaf_udwf.rs: ## @@ -25,3 +29,15 @@ pub(crate) extern "C" fn create_ffi_abs_func() -> FFI_ScalarUDF { udf.into() } + +pub

Re: [I] Add support for Python UDFs in distributed queries [datafusion-ballista]

2025-02-27 Thread via GitHub
matthewmturner commented on issue #173: URL: https://github.com/apache/datafusion-ballista/issues/173#issuecomment-2688869264 > we should run that idea in datafusion python discord Sure, will raise it there. I'm also okay to incubate `PythonFunctionFactory` in dft to start, im d

Re: [I] Consider using gRPC streams + chunking to avoid message size limits [datafusion-ballista]

2025-02-27 Thread via GitHub
milenkovicm commented on issue #932: URL: https://github.com/apache/datafusion-ballista/issues/932#issuecomment-2688867749 I wonder if this is still needed now that we propagate the gRPC max message size and IpcWriter is set to a 2MB chunk size.

Re: [PR] Fix sequential metadata fetching in ListingTable causing high latency [datafusion]

2025-02-27 Thread via GitHub
geoffreyclaude commented on code in PR #14918: URL: https://github.com/apache/datafusion/pull/14918#discussion_r1974271100 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -1115,7 +1117,7 @@ impl ListingTable { } }) .boxed() -

Re: [PR] Fix sequential metadata fetching in ListingTable causing high latency [datafusion]

2025-02-27 Thread via GitHub
geoffreyclaude commented on code in PR #14918: URL: https://github.com/apache/datafusion/pull/14918#discussion_r1974061938 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -1098,7 +1098,7 @@ impl ListingTable { ) })) .await?; -let

Re: [PR] Fix sequential metadata fetching in ListingTable causing high latency [datafusion]

2025-02-27 Thread via GitHub
geoffreyclaude commented on code in PR #14918: URL: https://github.com/apache/datafusion/pull/14918#discussion_r1974270286 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -1098,7 +1098,9 @@ impl ListingTable { ) })) .await?; -let

Re: [PR] fix: Reduce number of shuffle spill files, fix spilled_bytes metric, add some unit tests [datafusion-comet]

2025-02-27 Thread via GitHub
mbutrovich commented on PR #1440: URL: https://github.com/apache/datafusion-comet/pull/1440#issuecomment-2689058499 > Thanks @andygrove does that mean we got a single shuffle file per partition or single file per executor? Just to clarify: when you say shuffle file do you mean the fi

[PR] chore: Change Spark 3.5 version from 3.5.1 to 3.5.2 [datafusion-comet]

2025-02-27 Thread via GitHub
andygrove opened a new pull request, #1457: URL: https://github.com/apache/datafusion-comet/pull/1457 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [I] [Epic] Split datasources out from `datafusion` crate (`datafusion/core`) [datafusion]

2025-02-27 Thread via GitHub
logan-keede commented on issue #1: URL: https://github.com/apache/datafusion/issues/1#issuecomment-2688791206 > Seems like that creates a dependency problem I'm not sure how to untangle. catalog::Session will need datasource::FileFormatFactory, which needs FileFormat, which needs Fil

Re: [PR] Add docs to `update_coalesce_ctx_children`. [datafusion]

2025-02-27 Thread via GitHub
wiedld commented on code in PR #14907: URL: https://github.com/apache/datafusion/pull/14907#discussion_r1974080657 ## datafusion/physical-optimizer/src/enforce_sorting/mod.rs: ## @@ -316,25 +369,43 @@ fn replace_with_partial_sort( /// are transformed into /// ```text ///

Re: [I] [Epic] Split datasources out from `datafusion` crate (`datafusion/core`) [datafusion]

2025-02-27 Thread via GitHub
AdamGS commented on issue #1: URL: https://github.com/apache/datafusion/issues/1#issuecomment-2688588184 Seems like that creates a dependency problem I'm not sure how to untangle. `catalog::Session` will need `datasource::FileFormatFactory`, which needs `FileFormat`, which needs `Fi

Re: [PR] Substrait support for propagating TableScan.filters to Substrait ReadRel.filter and ReadRel.best_effort_filter [datafusion]

2025-02-27 Thread via GitHub
nseekhao commented on code in PR #14194: URL: https://github.com/apache/datafusion/pull/14194#discussion_r1973968357 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -559,12 +559,31 @@ pub fn from_table_scan( let table_schema = scan.source.schema().to_dfschema_re

Re: [I] Bad performance on wide tables (1000+ columns) [datafusion]

2025-02-27 Thread via GitHub
Omega359 commented on issue #7698: URL: https://github.com/apache/datafusion/issues/7698#issuecomment-2688575470 This is the next one on my hit list as I am seeing what I believe is planning time taking up to 30 seconds per batch ![Image](https://github.com/user-attachments/assets/85

Re: [I] Bad performance on wide tables (1000+ columns) [datafusion]

2025-02-27 Thread via GitHub
Omega359 commented on issue #7698: URL: https://github.com/apache/datafusion/issues/7698#issuecomment-2688592062 Note that I may do any work under ticket #13748, as I expect that is more likely the root cause.

Re: [PR] Prepare for 55.0.0 release: Version and CHANGELOG [datafusion-sqlparser-rs]

2025-02-27 Thread via GitHub
alamb commented on PR #1750: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1750#issuecomment-2688607352 > Can we add a small migration guide to minimize the impact of the `Expr::Value` change? I suggest: Thank you -- added in db7911b

Re: [PR] Prepare for 55.0.0 release: Version and CHANGELOG [datafusion-sqlparser-rs]

2025-02-27 Thread via GitHub
alamb commented on PR #1750: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1750#issuecomment-2688631516 I also pushed a commit with a note about `ObjectName` - 15f99a6

Re: [PR] Prepare for 55.0.0 release: Version and CHANGELOG [datafusion-sqlparser-rs]

2025-02-27 Thread via GitHub
alamb merged PR #1750: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1750

Re: [PR] Prepare for 55.0.0 release: Version and CHANGELOG [datafusion-sqlparser-rs]

2025-02-27 Thread via GitHub
alamb commented on PR #1750: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1750#issuecomment-2688638554 Pushing this and will now make the release

Re: [PR] chore: Change Spark 3.5 version from 3.5.1 to 3.5.3 [datafusion-comet]

2025-02-27 Thread via GitHub
andygrove commented on code in PR #1455: URL: https://github.com/apache/datafusion-comet/pull/1455#discussion_r1973947440 ## spark/src/test/resources/tpcds-plan-stability/approved-plans-v1_4-spark3_5/q1/explain.txt: ## @@ -1,44 +1,46 @@ == Physical Plan == -* CometColumnarToRow

Re: [PR] Add docs to `update_coalesce_ctx_children`. [datafusion]

2025-02-27 Thread via GitHub
wiedld commented on PR #14907: URL: https://github.com/apache/datafusion/pull/14907#issuecomment-2688656333 @berkaysynnada and @alamb -- I just switched it to be only the docs in this PR. And I'm about to push up a PR which is only the regression test for [this issue](https://github.com

Re: [I] Release sqlparser-rs version `0.55.0` [datafusion-sqlparser-rs]

2025-02-27 Thread via GitHub
alamb commented on issue #1671: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1671#issuecomment-2688657521 I made a release candidate. Here is the draft email. However, the dist.apache.org site appears to be down so I will try again later ``` Hi, I would lik

Re: [PR] feat: add read array support [datafusion-comet]

2025-02-27 Thread via GitHub
andygrove commented on code in PR #1456: URL: https://github.com/apache/datafusion-comet/pull/1456#discussion_r1974062103 ## spark/src/test/scala/org/apache/comet/exec/CometNativeReaderSuite.scala: ## @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

Re: [PR] fix: EnforceSorting should not remove a needed coalesces [datafusion]

2025-02-27 Thread via GitHub
wiedld commented on PR #14637: URL: https://github.com/apache/datafusion/pull/14637#issuecomment-2688686169 Closing this in favor of smaller PRs, done piecemeal.

Re: [PR] Fix the null handling for to_char function [datafusion]

2025-02-27 Thread via GitHub
alamb merged PR #14908: URL: https://github.com/apache/datafusion/pull/14908

Re: [PR] Fix the null handling for to_char function [datafusion]

2025-02-27 Thread via GitHub
alamb commented on PR #14908: URL: https://github.com/apache/datafusion/pull/14908#issuecomment-2689523394 Thanks @kosiew and @jayzhan211

Re: [PR] Add tests for Demonstrate EnforceSorting can remove a needed coalesce [datafusion]

2025-02-27 Thread via GitHub
alamb merged PR #14919: URL: https://github.com/apache/datafusion/pull/14919

Re: [PR] Add tests for Demonstrate EnforceSorting can remove a needed coalesce [datafusion]

2025-02-27 Thread via GitHub
alamb commented on PR #14919: URL: https://github.com/apache/datafusion/pull/14919#issuecomment-2689525342 FYI @berkaysynnada

Re: [PR] Fix sequential metadata fetching in ListingTable causing high latency [datafusion]

2025-02-27 Thread via GitHub
alamb commented on PR #14918: URL: https://github.com/apache/datafusion/pull/14918#issuecomment-2689526294 Thanks @geoffreyclaude -- I kicked off the CI for this

Re: [PR] Use arrow IPC Stream format for spill files [datafusion]

2025-02-27 Thread via GitHub
davidhewitt commented on PR #14868: URL: https://github.com/apache/datafusion/pull/14868#issuecomment-2688935984 Thanks! `clippy` should be happy now... 🤞

Re: [I] Fix the null handling for `to_char` function [datafusion]

2025-02-27 Thread via GitHub
alamb closed issue #14884: Fix the null handling for `to_char` function URL: https://github.com/apache/datafusion/issues/14884

[PR] Add additional protobuf tests for [datafusion]

2025-02-27 Thread via GitHub
alamb opened a new pull request, #14924: URL: https://github.com/apache/datafusion/pull/14924 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issues/14679 - Related to https://github.com/apache/datafusion/pull/14685 ## Rationale for this

Re: [PR] Set projection before configuring the source [datafusion]

2025-02-27 Thread via GitHub
alamb commented on PR #14685: URL: https://github.com/apache/datafusion/pull/14685#issuecomment-2689497827 Hi @blaginin -- I also wrote some additional tests in a different PR: - https://github.com/apache/datafusion/issues/14679 Feel free to bring them into this PR (or we can keep

Re: [PR] Examples: boundary analysis example for `AND/OR` conjunctions [datafusion]

2025-02-27 Thread via GitHub
alamb merged PR #14735: URL: https://github.com/apache/datafusion/pull/14735

Re: [PR] Examples: boundary analysis example for `AND/OR` conjunctions [datafusion]

2025-02-27 Thread via GitHub
alamb commented on code in PR #14735: URL: https://github.com/apache/datafusion/pull/14735#discussion_r1974619232 ## docs/source/library-user-guide/query-optimizer.md: ## @@ -388,3 +388,119 @@ In the following example, the `type_coercion` and `simplify_expressions` passes ```

Re: [PR] replace TypeSignature::String with TypeSignature::Coercible [datafusion]

2025-02-27 Thread via GitHub
zjregee commented on code in PR #14917: URL: https://github.com/apache/datafusion/pull/14917#discussion_r1974708220 ## datafusion/functions/src/string/ends_with.rs: ## @@ -103,9 +113,28 @@ impl ScalarUDFImpl for EndsWithFunc { /// Returns true if string ends with suffix. /// e

Re: [PR] Fix: New Datafusion-cli streaming printing way should handle corner case for only one small batch which lines are less than max_rows [datafusion]

2025-02-27 Thread via GitHub
zhuqi-lucas commented on PR #14921: URL: https://github.com/apache/datafusion/pull/14921#issuecomment-2689727850 The tests include the following case, which returned empty output before this PR. ```rust > SELECT * FROM generate_series(1, 5) t1(v1) ORDER BY v1 DESC; ++ | v1 | +---

Re: [I] Enhance `__repr__` and `_repr_html_` with a note for additional rows [datafusion-python]

2025-02-27 Thread via GitHub
Spaarsh commented on issue #1026: URL: https://github.com/apache/datafusion-python/issues/1026#issuecomment-2689730430 I'd like to work on this issue. Adding a few lines of code along the lines of: ``` fn __repr__(&self, py: Python) -> PyDataFusionResult { let df = self.df.as_r

Re: [I] Datafusion-cli: regression after commit `3d64de4`, all results are not displayed [datafusion]

2025-02-27 Thread via GitHub
qazxcdswe123 closed issue #14926: Datafusion-cli: regression after commit `3d64de4`, all results are not displayed URL: https://github.com/apache/datafusion/issues/14926

Re: [I] New Datafusion-cli streaming printing way should handle corner case for only one small batch which lines are less than max_rows [datafusion]

2025-02-27 Thread via GitHub
alamb commented on issue #14920: URL: https://github.com/apache/datafusion/issues/14920#issuecomment-2689516703 - I also added this to https://github.com/apache/datafusion/issues/14123 as I think this is a regression

Re: [I] Optimized spill file format [datafusion]

2025-02-27 Thread via GitHub
alamb commented on issue #14078: URL: https://github.com/apache/datafusion/issues/14078#issuecomment-2689518371 - BTW https://github.com/apache/arrow-rs/pull/7120 is complete, so we will be able to disable validation for spill files with the next arrow release

Re: [PR] feat: Support IntegralDivide function [datafusion-comet]

2025-02-27 Thread via GitHub
wForget commented on code in PR #1428: URL: https://github.com/apache/datafusion-comet/pull/1428#discussion_r1974688533 ## native/core/src/execution/planner.rs: ## @@ -922,13 +956,18 @@ impl PhysicalPlanner { Ok(DataType::Decimal128(_p2, _s2)), ) =>

Re: [PR] feat: Support IntegralDivide function [datafusion-comet]

2025-02-27 Thread via GitHub
wForget commented on code in PR #1428: URL: https://github.com/apache/datafusion-comet/pull/1428#discussion_r1974690157 ## native/core/src/execution/planner.rs: ## @@ -922,13 +956,18 @@ impl PhysicalPlanner { Ok(DataType::Decimal128(_p2, _s2)), ) =>

Re: [PR] Ignore escaped LIKE wildcards in MySQL [datafusion-sqlparser-rs]

2025-02-27 Thread via GitHub
iffyio commented on code in PR #1735: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1735#discussion_r1974849855 ## src/dialect/mod.rs: ## @@ -201,6 +201,33 @@ pub trait Dialect: Debug + Any { false } +/// Determine whether the dialect strips th

Re: [PR] Feat: support array_compact function [datafusion-comet]

2025-02-27 Thread via GitHub
kazuyukitanimura commented on code in PR #1321: URL: https://github.com/apache/datafusion-comet/pull/1321#discussion_r1974489483 ## native/core/src/execution/planner.rs: ## @@ -830,6 +830,25 @@ impl PhysicalPlanner { )); Ok(array_has_any_expr)

Re: [PR] refactor(properties): Split properties.rs into smaller modules [datafusion]

2025-02-27 Thread via GitHub
xudong963 commented on PR #14925: URL: https://github.com/apache/datafusion/pull/14925#issuecomment-2689837726 I triggered the CI, and thanks for the split

Re: [PR] Standardize CREATE TABLE options equals signs [datafusion-sqlparser-rs]

2025-02-27 Thread via GitHub
iffyio commented on code in PR #1751: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1751#discussion_r1974845227 ## tests/sqlparser_mysql.rs: ## @@ -1047,6 +1047,174 @@ fn parse_create_table_gencol() { mysql_and_generic().verified_stmt("CREATE TABLE t1 (a INT,

Re: [PR] Count alias [datafusion]

2025-02-27 Thread via GitHub
jayzhan211 commented on code in PR #14927: URL: https://github.com/apache/datafusion/pull/14927#discussion_r1974873043 ## datafusion/sqllogictest/test_files/subquery.slt: ## @@ -1393,3 +1393,37 @@ item1 1970-01-01T00:00:03 75 statement ok drop table source_table; + +# test c

Re: [PR] refactor(properties): Split properties.rs into smaller modules [datafusion]

2025-02-27 Thread via GitHub
xudong963 commented on code in PR #14925: URL: https://github.com/apache/datafusion/pull/14925#discussion_r1974876690 ## datafusion/physical-expr/src/equivalence/properties/mod.rs: ## @@ -0,0 +1,1771 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] refactor(properties): Split properties.rs into smaller modules [datafusion]

2025-02-27 Thread via GitHub
xudong963 commented on code in PR #14925: URL: https://github.com/apache/datafusion/pull/14925#discussion_r1974875825 ## datafusion/physical-expr/src/equivalence/properties/mod.rs: ## @@ -0,0 +1,1771 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [I] Code clean for new datafusion-cli streaming printing logic [datafusion]

2025-02-27 Thread via GitHub
shruti2522 commented on issue #14886: URL: https://github.com/apache/datafusion/issues/14886#issuecomment-2689850245 Hi @zhuqi-lucas, I would like to work on this issue. Can I self-assign it?

Re: [PR] refactor(properties): Split properties.rs into smaller modules [datafusion]

2025-02-27 Thread via GitHub
Standing-Man commented on code in PR #14925: URL: https://github.com/apache/datafusion/pull/14925#discussion_r1974892345 ## datafusion/physical-expr/src/equivalence/properties/mod.rs: ## @@ -0,0 +1,1771 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or mor

Re: [PR] Add tests for Demonstrate EnforceSorting can remove a needed coalesce [datafusion]

2025-02-27 Thread via GitHub
berkaysynnada commented on code in PR #14919: URL: https://github.com/apache/datafusion/pull/14919#discussion_r1974945098 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -3346,3 +3351,62 @@ async fn test_window_partial_constant_and_set_monotonicity() -> Re

[I] datafusion-cli not displaying results [datafusion]

2025-02-27 Thread via GitHub
kosiew opened a new issue, #14929: URL: https://github.com/apache/datafusion/issues/14929 ### Describe the bug `cargo run -p datafusion-cli -- -c "select * from 'datafusion/substrait/tests/testdata/data.parquet'"` Before #14877 ``` +---+-++---+---

Re: [I] datafusion-cli not displaying results [datafusion]

2025-02-27 Thread via GitHub
kosiew closed issue #14929: datafusion-cli not displaying results URL: https://github.com/apache/datafusion/issues/14929

Re: [PR] Add tests for Demonstrate EnforceSorting can remove a needed coalesce [datafusion]

2025-02-27 Thread via GitHub
berkaysynnada commented on code in PR #14919: URL: https://github.com/apache/datafusion/pull/14919#discussion_r1974940700 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -3346,3 +3351,62 @@ async fn test_window_partial_constant_and_set_monotonicity() -> Re

Re: [I] Advanced Interval Analysis [datafusion]

2025-02-27 Thread via GitHub
ozankabak commented on issue #14515: URL: https://github.com/apache/datafusion/issues/14515#issuecomment-2689661864 Looking forward to collaborating! 🚀

Re: [I] New Datafusion-cli streaming printing way should handle corner case for only one small batch which lines are less than max_rows [datafusion]

2025-02-27 Thread via GitHub
2010YOUY01 closed issue #14920: New Datafusion-cli streaming printing way should handle corner case for only one small batch which lines are less than max_rows URL: https://github.com/apache/datafusion/issues/14920

Re: [PR] Count alias [datafusion]

2025-02-27 Thread via GitHub
jayzhan211 commented on code in PR #14927: URL: https://github.com/apache/datafusion/pull/14927#discussion_r1974873043 ## datafusion/sqllogictest/test_files/subquery.slt: ## @@ -1393,3 +1393,37 @@ item1 1970-01-01T00:00:03 75 statement ok drop table source_table; + +# test c

Re: [PR] feat: Support IntegralDivide function [datafusion-comet]

2025-02-27 Thread via GitHub
kazuyukitanimura commented on code in PR #1428: URL: https://github.com/apache/datafusion-comet/pull/1428#discussion_r1974953243 ## native/core/src/execution/planner.rs: ## @@ -922,13 +956,18 @@ impl PhysicalPlanner { Ok(DataType::Decimal128(_p2, _s2)),

[I] Regression in supported plans for Spark 3.5.2+ [datafusion-comet]

2025-02-27 Thread via GitHub
andygrove opened a new issue, #1458: URL: https://github.com/apache/datafusion-comet/issues/1458 ### Describe the bug When moving from Spark 3.5.1 to 3.5.2 we see that TPC-DS plans are falling back to Spark much more. I think this needs to be looked into. See https://github.com

Re: [PR] perf: Reduce native shuffle memory overhead by 50% [datafusion-comet]

2025-02-27 Thread via GitHub
andygrove commented on code in PR #1452: URL: https://github.com/apache/datafusion-comet/pull/1452#discussion_r1974350259 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -294,7 +291,6 @@ struct ShuffleRepartitioner { num_output_partitions: usize, runtime:

[PR] Standardize CREATE TABLE options equals signs [datafusion-sqlparser-rs]

2025-02-27 Thread via GitHub
mvzink opened a new pull request, #1751: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1751 * Make spaces around equals signs canonical: `ENGINE = InnoDB` instead of `ENGINE=InnoDB` * Make equals signs canonical: `AUTO_INCREMENT = 100` instead of `AUTO_INCREMENT 100` * M

Re: [PR] Standardize CREATE TABLE options equals signs [datafusion-sqlparser-rs]

2025-02-27 Thread via GitHub
mvzink commented on PR #1751: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1751#issuecomment-2689180088 Note that a version of #1747 which standardizes all options a la `SqlOption` could supersede this.

Re: [I] Add support for Python UDFs in distributed queries [datafusion-ballista]

2025-02-27 Thread via GitHub
matthewmturner commented on issue #173: URL: https://github.com/apache/datafusion-ballista/issues/173#issuecomment-2688843381 Maybe `PythonFunctionFactory` could be added to `datafusion-python`? Before my refactor last summer, dft used to support ballista, and I've had in mind adding it

Re: [PR] chore: Change Spark 3.5 version from 3.5.1 to 3.5.2 [datafusion-comet]

2025-02-27 Thread via GitHub
kazuyukitanimura commented on PR #1457: URL: https://github.com/apache/datafusion-comet/pull/1457#issuecomment-2689373023 I remember some method signature was changed. We may need a reflection change similar to API change: https://github.com/apache/spark/commit/c3ba8fa69cbb88d5880a203f4

Re: [PR] fix: AQE creating a non-supported Final HashAggregate post-shuffle [datafusion-comet]

2025-02-27 Thread via GitHub
kazuyukitanimura commented on PR #1390: URL: https://github.com/apache/datafusion-comet/pull/1390#issuecomment-2689386875 Thanks @EmilyMatt > I don't really know why this restriction was placed here, so I feel I can't really provide an opinion on any direction, I only saw the comment an

[I] New Datafusion-cli streaming printing way should handle corner case for only one small batch which lines are less than max_rows [datafusion]

2025-02-27 Thread via GitHub
zhuqi-lucas opened a new issue, #14920: URL: https://github.com/apache/datafusion/issues/14920 ### Describe the bug While continuing to test the new datafusion-cli execution and printing path, I found a corner case: a single small batch with fewer lines than max_rows will not pri

Re: [I] 【TPCH】Comet do not show performance advantages over native Spark? [datafusion-comet]

2025-02-27 Thread via GitHub
andygrove commented on issue #1450: URL: https://github.com/apache/datafusion-comet/issues/1450#issuecomment-2688717700 Thanks for the info @xingnailu. It does look like Comet is working as expected. I wonder if the bottleneck is reading data from OSS. Do you have the ability to run the be

Re: [PR] Refactor SortPushdown using the standard top-down visitor. [datafusion]

2025-02-27 Thread via GitHub
wiedld commented on PR #14821: URL: https://github.com/apache/datafusion/pull/14821#issuecomment-2688837380 > but I don't agree the part where we filter the expressions being heterogeneously constant across partitions, while retaining the output ordering. > I cannot follow from which

[PR] chore: commit `Cargo.lock` file to make builds more predictable [datafusion-ballista]

2025-02-27 Thread via GitHub
milenkovicm opened a new pull request, #1190: URL: https://github.com/apache/datafusion-ballista/pull/1190 # Which issue does this PR close? Closes #. # Rationale for this change as discussed in - https://github.com/apache/datafusion/issues/14135 - https://github.

Re: [PR] feat: Support IntegralDivide function [datafusion-comet]

2025-02-27 Thread via GitHub
kazuyukitanimura commented on code in PR #1428: URL: https://github.com/apache/datafusion-comet/pull/1428#discussion_r1974485179 ## native/core/src/execution/planner.rs: ## @@ -922,13 +956,18 @@ impl PhysicalPlanner { Ok(DataType::Decimal128(_p2, _s2)),

Re: [PR] Feat: support array_compact function [datafusion-comet]

2025-02-27 Thread via GitHub
kazuyukitanimura commented on code in PR #1321: URL: https://github.com/apache/datafusion-comet/pull/1321#discussion_r1974489483 ## native/core/src/execution/planner.rs: ## @@ -830,6 +830,25 @@ impl PhysicalPlanner { )); Ok(array_has_any_expr)

Re: [I] Add support for Python UDFs in distributed queries [datafusion-ballista]

2025-02-27 Thread via GitHub
matthewmturner commented on issue #173: URL: https://github.com/apache/datafusion-ballista/issues/173#issuecomment-2688849305 And to add: I had in mind starting with just making dft a ballista client. I'd have to give more thought to what it would look like if it were to be used as part o

Re: [PR] fix: Reduce number of shuffle spill files, fix spilled_bytes metric, add some unit tests [datafusion-comet]

2025-02-27 Thread via GitHub
andygrove commented on PR #1440: URL: https://github.com/apache/datafusion-comet/pull/1440#issuecomment-2689075066 > Thanks @andygrove does that mean we got a single shuffle file per partition or single file per executor? For each `ShuffleMapTask` there will now be a maximum of one s

Re: [PR] fix: Reduce number of shuffle spill files, fix spilled_bytes metric, add some unit tests [datafusion-comet]

2025-02-27 Thread via GitHub
andygrove commented on PR #1440: URL: https://github.com/apache/datafusion-comet/pull/1440#issuecomment-2689076285 > > Thanks @andygrove does that mean we got a single shuffle file per partition or single file per executor? > > Just to clarify: when you say shuffle file do you mean t
