Re: [I] Filters on `RANDOM()` are applied incorrectly when pushdown_filters is enabled. [datafusion]

2024-11-05 Thread via GitHub
findepi commented on issue #13268: URL: https://github.com/apache/datafusion/issues/13268#issuecomment-2458915097 random() (and other volatile functions) shouldn't be handed over to TableProvider as a filter, because it's unnecessarily complicated to do the correct thing with them. We mig
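
To illustrate the concern in the comment above: a volatile predicate such as `RANDOM() < 0.1` returns a different answer each time it is evaluated, so if a filter is evaluated once inside the scan and then again above it, far fewer rows survive than intended. The following is a minimal, dependency-free sketch of that effect. It is not DataFusion code; `Lcg`, `filter_once` and `filter_twice` are hypothetical names, and double evaluation is assumed here as one plausible way the mismatch could arise.

```rust
// Hypothetical sketch, not DataFusion code: all names are illustrative.

/// Small deterministic generator standing in for RANDOM().
struct Lcg(u64);

impl Lcg {
    /// Returns a pseudo-random value in [0, 1).
    fn next_f64(&mut self) -> f64 {
        // 64-bit linear congruential step (Knuth's MMIX constants).
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Keep the top 53 bits so the result fits an f64 mantissa.
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// The volatile predicate is evaluated once per row (the intended semantics).
fn filter_once(rows: usize, threshold: f64, rng: &mut Lcg) -> usize {
    let mut kept = 0;
    for _ in 0..rows {
        if rng.next_f64() < threshold {
            kept += 1;
        }
    }
    kept
}

/// The predicate is evaluated in the "scan" and again "above the scan".
/// Each evaluation draws a fresh value, so a row must pass twice.
fn filter_twice(rows: usize, threshold: f64, rng: &mut Lcg) -> usize {
    let mut kept = 0;
    for _ in 0..rows {
        let passed_scan = rng.next_f64() < threshold;
        if passed_scan && rng.next_f64() < threshold {
            kept += 1;
        }
    }
    kept
}

fn main() {
    let rows = 100_000;
    let once = filter_once(rows, 0.1, &mut Lcg(42));
    let twice = filter_twice(rows, 0.1, &mut Lcg(42));
    println!("evaluated once:  {once} of {rows} rows kept");
    println!("evaluated twice: {twice} of {rows} rows kept");
}
```

With a threshold of 0.1, single evaluation keeps roughly a tenth of the rows, while double evaluation keeps roughly a hundredth. Keeping volatile predicates out of the filters handed to a table provider sidesteps the question entirely.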

Re: [PR] Support unparsing plans after applying `optimize_projections` rule [datafusion]

2024-11-05 Thread via GitHub
findepi commented on PR #13267: URL: https://github.com/apache/datafusion/pull/13267#issuecomment-2458911437 Would this be equivalent to disabling `optimize_projections`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [I] Implement Specialized `GroupColumn` for `Date`/`Time`/`Timestamp` types for multi-column `GROUP BY` [datafusion]

2024-11-05 Thread via GitHub
buraksenn commented on issue #13263: URL: https://github.com/apache/datafusion/issues/13263#issuecomment-2458897538 take

Re: [I] Implement Specialized `GroupColumn` for `Date`/`Time`/`Timestamp` types for multi-column `GROUP BY` [datafusion]

2024-11-05 Thread via GitHub
buraksenn commented on issue #13263: URL: https://github.com/apache/datafusion/issues/13263#issuecomment-2458897065 take I plan to wrap up nth_value PR today thanks to @jcsherin's reviews, then start on this one if it is okay. Otherwise, I can drop this

Re: [PR] Expand LIKE simplification [datafusion]

2024-11-05 Thread via GitHub
findepi commented on PR #13260: URL: https://github.com/apache/datafusion/pull/13260#issuecomment-2458895362 @goldmedal rebased, thanks!

Re: [PR] Change `schema_infer_max_rec ` config to use `Option` rather than `usize` [datafusion]

2024-11-05 Thread via GitHub
berkaysynnada merged PR #13250: URL: https://github.com/apache/datafusion/pull/13250

Re: [PR] TOP before ALL/DISTINCT [datafusion-sqlparser-rs]

2024-11-05 Thread via GitHub
yoavcloud commented on code in PR #1495: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1495#discussion_r1830495918 ## src/parser/mod.rs: ## @@ -3534,7 +3534,9 @@ impl<'a> Parser<'a> { pub fn parse_all_or_distinct(&mut self) -> Result<Option<Distinct>, ParserError> {

Re: [PR] Support vectorized append and compare for multi group by [datafusion]

2024-11-05 Thread via GitHub
Rachelint commented on code in PR #12996: URL: https://github.com/apache/datafusion/pull/12996#discussion_r1830485718 ## datafusion/physical-plan/src/aggregates/group_values/group_column.rs: ## @@ -128,6 +157,89 @@ impl GroupColumn } } +fn vectorized_equal_t

Re: [I] Logical Plan Tree Structure [datafusion]

2024-11-05 Thread via GitHub
niebayes commented on issue #13266: URL: https://github.com/apache/datafusion/issues/13266#issuecomment-2458833308 For part of your question, there is a `DisplayableExecutionPlan` for printing an execution plan in a pretty way.

Re: [PR] Fix the parsing error in MSSQL for multiple statements that include `DECLARE` statements [datafusion-sqlparser-rs]

2024-11-05 Thread via GitHub
iffyio commented on code in PR #1497: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1497#discussion_r1830403524 ## src/parser/mod.rs: ## @@ -5335,7 +5335,7 @@ impl<'a> Parser<'a> { for_query: None, }); -if self.next_toke

Re: [PR] TOP before ALL/DISTINCT [datafusion-sqlparser-rs]

2024-11-05 Thread via GitHub
iffyio commented on code in PR #1495: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1495#discussion_r1830396360 ## src/parser/mod.rs: ## @@ -3534,7 +3534,9 @@ impl<'a> Parser<'a> { pub fn parse_all_or_distinct(&mut self) -> Result<Option<Distinct>, ParserError> { le

[I] Unparse SubqueryAlias with the pushdown TableScan fail [datafusion]

2024-11-05 Thread via GitHub
goldmedal opened a new issue, #13272: URL: https://github.com/apache/datafusion/issues/13272 ### Describe the bug I tried to unparse an optimized plan but encountered a 'field not found' issue. The unoptimized plan unparses correctly, suggesting a bug in the unparsing for `TableScan`

Re: [I] Detect stack overflow and reduce stack usage on debug build [datafusion-sqlparser-rs]

2024-11-05 Thread via GitHub
Eason0729 commented on issue #1465: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1465#issuecomment-2458657712 I have been busy with school work recently (and will probably stay busy until the end of this semester), so if there is anyone interested in this issue, feel free to

Re: [PR] Enable needless_pass_by_value clippy lint [datafusion]

2024-11-05 Thread via GitHub
github-actions[bot] commented on PR #12243: URL: https://github.com/apache/datafusion/pull/12243#issuecomment-2458561808 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [I] Aggregation fuzz testing [datafusion]

2024-11-05 Thread via GitHub
LeslieKid commented on issue #12114: URL: https://github.com/apache/datafusion/issues/12114#issuecomment-2458630099 > Additional types that would be good to cover are: > > 1. `Float32/Float64` > 2. `Date` and `Timestamp` 🤔 The `Date` type is already supported in #13041 . And I

Re: [PR] Expand LIKE simplification [datafusion]

2024-11-05 Thread via GitHub
goldmedal commented on PR #13260: URL: https://github.com/apache/datafusion/pull/13260#issuecomment-2458624344 > draft - to be rebased after #13259 lands > > still ready to review cc @crepererum @goldmedal #13259 has been merged. 👍

Re: [PR] feat: basic support for executing prepared statements [datafusion]

2024-11-05 Thread via GitHub
jonahgao commented on code in PR #13242: URL: https://github.com/apache/datafusion/pull/13242#discussion_r1830322644 ## datafusion/core/src/execution/session_state.rs: ## @@ -906,6 +910,29 @@ impl SessionState { let udtf = self.table_functions.remove(name); Ok(

Re: [PR] Fix incorrect `... LIKE '%'` simplification with NULLs [datafusion]

2024-11-05 Thread via GitHub
goldmedal merged PR #13259: URL: https://github.com/apache/datafusion/pull/13259

Re: [PR] Fix incorrect `... LIKE '%'` simplification with NULLs [datafusion]

2024-11-05 Thread via GitHub
goldmedal commented on PR #13259: URL: https://github.com/apache/datafusion/pull/13259#issuecomment-2458622201 Thanks, @findepi and @crepererum for reviewing. Because https://github.com/apache/datafusion/pull/13260 depends on this PR, I'll merge this PR and make the follow-up PR can keep

Re: [PR] feat: basic support for executing prepared statements [datafusion]

2024-11-05 Thread via GitHub
jonahgao commented on code in PR #13242: URL: https://github.com/apache/datafusion/pull/13242#discussion_r1830321629 ## datafusion/core/src/execution/context/mod.rs: ## @@ -687,7 +688,31 @@ impl SessionContext { LogicalPlan::Statement(Statement::SetVariable(stmt)) =

Re: [PR] Introduce `INFORMATION_SCHEMA.ROUTINES` table [datafusion]

2024-11-05 Thread via GitHub
goldmedal commented on code in PR #13255: URL: https://github.com/apache/datafusion/pull/13255#discussion_r1830299668 ## datafusion/expr-common/src/signature.rs: ## @@ -243,6 +243,27 @@ impl TypeSignature { _ => false, } } + +/// get all possible t

[I] Inferring the possible types from the TypeSignature [datafusion]

2024-11-05 Thread via GitHub
goldmedal opened a new issue, #13271: URL: https://github.com/apache/datafusion/issues/13271 ### Is your feature request related to a problem or challenge? https://github.com/apache/datafusion/pull/13255 introduced the `information_schema.routines` table. I'm working on another table,

Re: [PR] fix: Guard against stack overflow in parse_table_and_joins [datafusion-sqlparser-rs]

2024-11-05 Thread via GitHub
github-actions[bot] commented on PR #1411: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1411#issuecomment-2458566650 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or

Re: [PR] Partially support adding Date32 and integer [datafusion]

2024-11-05 Thread via GitHub
github-actions[bot] commented on PR #12352: URL: https://github.com/apache/datafusion/pull/12352#issuecomment-2458561649 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Prototype implementing DataFusion functions / operators using arrow-udf liibrary [datafusion]

2024-11-05 Thread via GitHub
github-actions[bot] commented on PR #11488: URL: https://github.com/apache/datafusion/pull/11488#issuecomment-2458561897 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [I] ListingTable cannot handle partition evolution [datafusion]

2024-11-05 Thread via GitHub
adriangb commented on issue #13270: URL: https://github.com/apache/datafusion/issues/13270#issuecomment-2458552653 cc @alamb I had promised you this a long time ago but only got around to it now

[I] ListingTable cannot handle partition evolution [datafusion]

2024-11-05 Thread via GitHub
adriangb opened a new issue, #13270: URL: https://github.com/apache/datafusion/issues/13270 ### Describe the bug With CSV: ```shell echo "a,b\n1,2" > data1.csv mkdir a=2 echo "b\n3" > a=2/data2.csv datafusion-cli > SELECT * FROM '**/*.csv'; Arrow error: Csv err

Re: [PR] minor: typo in command example for flamegraph docs [datafusion]

2024-11-05 Thread via GitHub
jonahgao merged PR #13269: URL: https://github.com/apache/datafusion/pull/13269

[PR] minor: typo in command example for flamegraph docs [datafusion]

2024-11-05 Thread via GitHub
jonathanc-n opened a new pull request, #13269: URL: https://github.com/apache/datafusion/pull/13269 (no comment)

Re: [PR] Use LogicalType for TypeSignature `Numeric` and `String`, `Coercible` and introduce `NumericAndNumericString` [datafusion]

2024-11-05 Thread via GitHub
jayzhan211 commented on PR #13240: URL: https://github.com/apache/datafusion/pull/13240#issuecomment-2458469470 I removed the incorrect numeric string case in this PR

Re: [PR] Use LogicalType for TypeSignature `Numeric` and `String`, `Coercible` and introduce `NumericAndNumericString` [datafusion]

2024-11-05 Thread via GitHub
jayzhan211 commented on code in PR #13240: URL: https://github.com/apache/datafusion/pull/13240#discussion_r1830235403 ## datafusion/expr-common/src/signature.rs: ## @@ -123,8 +124,19 @@ pub enum TypeSignature { /// Specifies Signatures for array functions ArraySignatu

[I] Filters on `RANDOM()` are applied incorrectly when pushdown_filters is enabled. [datafusion]

2024-11-05 Thread via GitHub
adamfaulkner-at opened a new issue, #13268: URL: https://github.com/apache/datafusion/issues/13268 ### Describe the bug When running a query like ``` SELECT * FROM table WHERE RANDOM() < 0.1; ``` I get different results depending on the value of `"datafusion.execution.

Re: [PR] Support unparsing plans after applying `optimize_projections` rule [datafusion]

2024-11-05 Thread via GitHub
sgrebnov commented on code in PR #13267: URL: https://github.com/apache/datafusion/pull/13267#discussion_r1830122206 ## datafusion/sql/tests/cases/plan_to_sql.rs: ## @@ -882,6 +882,7 @@ fn test_table_scan_pushdown() -> Result<()> { let query_from_table_scan_with_projection

Re: [PR] Support unparsing plans after applying `optimize_projections` rule [datafusion]

2024-11-05 Thread via GitHub
sgrebnov commented on code in PR #13267: URL: https://github.com/apache/datafusion/pull/13267#discussion_r1830100022 ## datafusion/sql/src/unparser/plan.rs: ## @@ -725,24 +727,29 @@ impl Unparser<'_> { } } -if let Some(proj

Re: [PR] Support unparsing plans after applying `optimize_projections` rule [datafusion]

2024-11-05 Thread via GitHub
sgrebnov commented on code in PR #13267: URL: https://github.com/apache/datafusion/pull/13267#discussion_r1830095891 ## datafusion/common/src/config.rs: ## @@ -636,6 +636,10 @@ config_namespace! { /// then the output will be coerced to a non-view. /// Coerces `

[PR] Support unparsing plans after applying `optimize_projections` rule [datafusion]

2024-11-05 Thread via GitHub
sgrebnov opened a new pull request, #13267: URL: https://github.com/apache/datafusion/pull/13267 ## Which issue does this PR close? The `optimize_projections` optimization is very useful when used alongside unparsing logic, as it pushes down projections to the `TableScan` and ensures

Re: [I] Spark ColumnarToRowExec cannot pass CometBuffer safety check [datafusion-comet]

2024-11-05 Thread via GitHub
viirya commented on issue #1059: URL: https://github.com/apache/datafusion-comet/issues/1059#issuecomment-2458258890 The Spark fix: https://github.com/apache/spark/pull/48767

[I] Logical Plan Tree Structure [datafusion]

2024-11-05 Thread via GitHub
whatever1345 opened a new issue, #13266: URL: https://github.com/apache/datafusion/issues/13266 ### Is your feature request related to a problem or challenge? Hello guys, I'm interested in the analysis feature of DataFusion, specifically with the Logical Plan. But it is very frustra

Re: [PR] feat: add RightMark Join [datafusion]

2024-11-05 Thread via GitHub
jonathanc-n commented on PR #13252: URL: https://github.com/apache/datafusion/pull/13252#issuecomment-2458236706 @eejbyfeldt I implemented the swapping, would be nice to see if I did that correctly. I made a change to `adjust_indices_by_join_type` and combined the logic for the righ

Re: [I] Oct 28, 2024: This week in DataFusion [datafusion]

2024-11-05 Thread via GitHub
alamb closed issue #13167: Oct 28, 2024: This week in DataFusion URL: https://github.com/apache/datafusion/issues/13167

Re: [I] Oct 28, 2024: This week in DataFusion [datafusion]

2024-11-05 Thread via GitHub
alamb commented on issue #13167: URL: https://github.com/apache/datafusion/issues/13167#issuecomment-2458190905 Next week: https://github.com/apache/datafusion/issues/13265

[I] Nov 5. 2024: This week in DataFusion [datafusion]

2024-11-05 Thread via GitHub
alamb opened a new issue, #13265: URL: https://github.com/apache/datafusion/issues/13265 ## Introduction This ticket is a weekly summary of interesting things happening in DataFusion. Note this is not a complete list (it is what I remember / can find). Please feel free to leave comments

[PR] Ensure schema and data have the same size [datafusion]

2024-11-05 Thread via GitHub
blaginin opened a new pull request, #13264: URL: https://github.com/apache/datafusion/pull/13264 ## Which issue does this PR close? Closes #. ## Rationale for this change Got a panic when I forgot to put braces: https://github.com/user-attachments/assets/e66c

Re: [I] Epic: Statistics improvements [datafusion]

2024-11-05 Thread via GitHub
suremarc commented on issue #8227: URL: https://github.com/apache/datafusion/issues/8227#issuecomment-2458134570 > If we have per-partition statistics, merging them will be problematic for NDV. Extrapolation techniques are not likely to work. Ok, well I suppose we can keep the existin

Re: [PR] added a BallistaContext to ballista to allow for Remote or standalone [datafusion-ballista]

2024-11-05 Thread via GitHub
tbar4 commented on PR #1100: URL: https://github.com/apache/datafusion-ballista/pull/1100#issuecomment-2458128122 > @tbar4 let's take one step at a time, we can take the optional feature at a later time. > > I had a quick look at the API and I wonder if rather than having objects per type

Re: [PR] added a BallistaContext to ballista to allow for Remote or standalone [datafusion-ballista]

2024-11-05 Thread via GitHub
tbar4 commented on code in PR #1100: URL: https://github.com/apache/datafusion-ballista/pull/1100#discussion_r1829984068 ## docs/source/user-guide/python.md: ## @@ -28,9 +28,15 @@ popular file formats files, run it in a distributed environment, and obtain the The following

Re: [I] Return the "position" of rows in parquet files after performing a query. [datafusion]

2024-11-05 Thread via GitHub
findepi commented on issue #13261: URL: https://github.com/apache/datafusion/issues/13261#issuecomment-2458092019 > use a function like `ROW_NUMBER` to figure out the positions of rows. It would be great if the parquet reader machinery could expose this information directly instead.

Re: [I] Epic: Statistics improvements [datafusion]

2024-11-05 Thread via GitHub
findepi commented on issue #8227: URL: https://github.com/apache/datafusion/issues/8227#issuecomment-2458099453 If we have per-partition statistics, merging them will be problematic for NDV. Extrapolation techniques are not likely to work.

Re: [PR] chore: Upgrade to DataFusion 43.0.0-rc1 [datafusion-comet]

2024-11-05 Thread via GitHub
andygrove merged PR #1057: URL: https://github.com/apache/datafusion-comet/pull/1057

Re: [PR] Simplify `EXPR LIKE 'constant'` to `expr = 'constant'` [datafusion]

2024-11-05 Thread via GitHub
adriangb closed pull request #13061: Simplify `EXPR LIKE 'constant'` to `expr = 'constant'` URL: https://github.com/apache/datafusion/pull/13061

Re: [PR] Simplify `EXPR LIKE 'constant'` to `expr = 'constant'` [datafusion]

2024-11-05 Thread via GitHub
adriangb commented on PR #13061: URL: https://github.com/apache/datafusion/pull/13061#issuecomment-2458072742 Closing in favor of #13260

Re: [PR] chore: Add safety check to CometBuffer [datafusion-comet]

2024-11-05 Thread via GitHub
viirya commented on PR #1050: URL: https://github.com/apache/datafusion-comet/pull/1050#issuecomment-2458055990 I'm considering adding a workaround in Comet, as waiting for the Spark change might take longer (and for 3.4/3.5 such a change might not be easy to backport).

Re: [PR] allow passing in metadata_size_hint on a per-file basis [datafusion]

2024-11-05 Thread via GitHub
adriangb commented on PR #13213: URL: https://github.com/apache/datafusion/pull/13213#issuecomment-2458049552 Amazing! Let me know what else is needed to merge

Re: [PR] Switch to iterative `DynNode` and `ConcreteTreeNode` processing [datafusion]

2024-11-05 Thread via GitHub
blaginin commented on PR #13177: URL: https://github.com/apache/datafusion/pull/13177#issuecomment-2458043552 Hey @berkaysynnada 👋 Yes, Peter is right - I think we should decide on the approach before starting the actual review. I tagged Andrew above, because he was the one to propose recur

Re: [I] Aggregation fuzz testing [datafusion]

2024-11-05 Thread via GitHub
alamb commented on issue #12114: URL: https://github.com/apache/datafusion/issues/12114#issuecomment-2458043304 @LeslieKid added time/interval/ decimal/utf8view in https://github.com/apache/datafusion/pull/13226 Additional types that would be good to cover are: 1. `Float32/Float64

Re: [PR] Implement `Eq`, `PartialEq`, `Hash` for `dyn PhysicalExpr` [datafusion]

2024-11-05 Thread via GitHub
peter-toth commented on PR #13005: URL: https://github.com/apache/datafusion/pull/13005#issuecomment-2457895952 Thanks for review @alamb and @berkaysynnada.

[I] Reorganize the GroupColumn implementations into more coherent modules [datafusion]

2024-11-05 Thread via GitHub
alamb opened a new issue, #13262: URL: https://github.com/apache/datafusion/issues/13262 ### Is your feature request related to a problem or challenge? While reviewing https://github.com/apache/datafusion/pull/12996 I noticed that all the implementations of `GroupColumn` for different

Re: [PR] Support vectorized append and compare for multi group by [datafusion]

2024-11-05 Thread via GitHub
alamb commented on PR #12996: URL: https://github.com/apache/datafusion/pull/12996#issuecomment-2458035232 As my admittedly sparse help for this PR I have filed some additional tickets for follow on work after this PR is merged: - https://github.com/apache/datafusion/issues/13262 - htt

[I] Implement Specialized `GroupColumn` for `Date`/`Time`/`Timestamp` types [datafusion]

2024-11-05 Thread via GitHub
alamb opened a new issue, #13263: URL: https://github.com/apache/datafusion/issues/13263 ### Is your feature request related to a problem or challenge? In https://github.com/apache/datafusion/pull/12269 @jayzhan211 made significant improvements to how group values are stored in multi-

Re: [I] Reorganize the GroupColumn implementations into more coherent modules [datafusion]

2024-11-05 Thread via GitHub
alamb commented on issue #13262: URL: https://github.com/apache/datafusion/issues/13262#issuecomment-2458020190 Once https://github.com/apache/datafusion/pull/12996 is merged, this would be a good first issue I think as it is just code movement and somewhat mechanical It would be a good way

Re: [PR] chore: Add safety check to CometBuffer [datafusion-comet]

2024-11-05 Thread via GitHub
viirya commented on PR #1050: URL: https://github.com/apache/datafusion-comet/pull/1050#issuecomment-2458013858 This cannot pass all tests due to an issue on the Spark side.

Re: [PR] Support vectorized append and compare for multi group by [datafusion]

2024-11-05 Thread via GitHub
alamb commented on code in PR #12996: URL: https://github.com/apache/datafusion/pull/12996#discussion_r1829848355 ## datafusion/physical-plan/src/aggregates/group_values/column.rs: ## @@ -75,55 +148,653 @@ pub struct GroupValuesColumn { random_state: RandomState, } -impl

Re: [I] Potential performance regression for TPCH q18 [datafusion]

2024-11-05 Thread via GitHub
alamb commented on issue #13188: URL: https://github.com/apache/datafusion/issues/13188#issuecomment-2458002776 I finally filed a ticket upstream in arrow trying to explain what I have been thinking about: - https://github.com/apache/arrow-rs/issues/6692

Re: [PR] Support vectorized append and compare for multi group by [datafusion]

2024-11-05 Thread via GitHub
alamb commented on PR #12996: URL: https://github.com/apache/datafusion/pull/12996#issuecomment-2457985415 Performance results: ``` Benchmark clickbench_partitioned.json ┏━━┳┳┳

[I] Spark ColumnarToRowExec cannot pass CometBuffer safety check [datafusion-comet]

2024-11-05 Thread via GitHub
viirya opened a new issue, #1059: URL: https://github.com/apache/datafusion-comet/issues/1059 ### Describe the bug This was found during debugging CI failures of #1050. One example of failed test is `date_add with int scalars` in `CometExpressionSuite`. The query is `"SELECT _2

Re: [PR] feat: add RightMark Join [datafusion]

2024-11-05 Thread via GitHub
jonathanc-n commented on code in PR #13252: URL: https://github.com/apache/datafusion/pull/13252#discussion_r1829870288 ## datafusion/proto-common/src/generated/pbjson.rs: ## @@ -3911,6 +3913,7 @@ impl<'de> serde::Deserialize<'de> for JoinType { "RIGHTSEMI"

[I] Return the "position" of rows in parquet files after performing a query. [datafusion]

2024-11-05 Thread via GitHub
adamfaulkner-at opened a new issue, #13261: URL: https://github.com/apache/datafusion/issues/13261 ### Is your feature request related to a problem or challenge? Hello! I'm working on a database, using the delta lake format with datafusion as the query engine. I'd like to implement su

Re: [PR] Example: FFI Table Provider as dynamic module loading [datafusion]

2024-11-05 Thread via GitHub
timsaucer merged PR #13183: URL: https://github.com/apache/datafusion/pull/13183

Re: [I] Expand FFI documentation [datafusion]

2024-11-05 Thread via GitHub
timsaucer closed issue #13175: Expand FFI documentation URL: https://github.com/apache/datafusion/issues/13175

Re: [PR] chore: Upgrade to DataFusion 43.0.0-rc1 [datafusion-comet]

2024-11-05 Thread via GitHub
andygrove commented on PR #1057: URL: https://github.com/apache/datafusion-comet/pull/1057#issuecomment-2457914817 I ran some TPC-H benchmarks and do not see any change to performance with the DF 43 upgrade.

Re: [PR] chore: Upgrade to DataFusion 43.0.0-rc1 [datafusion-comet]

2024-11-05 Thread via GitHub
andygrove commented on PR #1057: URL: https://github.com/apache/datafusion-comet/pull/1057#issuecomment-2457877668 @comphead FYI; it looks like there may be a regression in DF 43 related to sort-merge join with join filter. I am tempted to ignore this test for now and file a follow o

Re: [I] Flaky fuzz tests for filtered outer SortMergeJoin [datafusion]

2024-11-05 Thread via GitHub
comphead commented on issue #12359: URL: https://github.com/apache/datafusion/issues/12359#issuecomment-2457904257 Related https://github.com/apache/datafusion-comet/issues/398

Re: [I] [Epic] High cardinality aggregation performance wishlist [datafusion]

2024-11-05 Thread via GitHub
alamb commented on issue #11679: URL: https://github.com/apache/datafusion/issues/11679#issuecomment-2457902964 Here is another great improvement: https://github.com/apache/datafusion/pull/12996

Re: [I] Proposal: Create `dfdb`, a new CLI different than `datafusion-cli` with pre-built integrations [datafusion]

2024-11-05 Thread via GitHub
alamb commented on issue #11979: URL: https://github.com/apache/datafusion/issues/11979#issuecomment-2457899894 I think dft is pretty close, so I am claiming this is done

Re: [I] Intermittent failures in `fuzz_cases::join_fuzz::test_anti_join_1k_filtered` [datafusion]

2024-11-05 Thread via GitHub
comphead commented on issue #11555: URL: https://github.com/apache/datafusion/issues/11555#issuecomment-2457898675 This can be closed

Re: [I] Intermittent failures in `fuzz_cases::join_fuzz::test_anti_join_1k_filtered` [datafusion]

2024-11-05 Thread via GitHub
comphead closed issue #11555: Intermittent failures in `fuzz_cases::join_fuzz::test_anti_join_1k_filtered` URL: https://github.com/apache/datafusion/issues/11555

Re: [I] Proposal: Create `dfdb`, a new CLI different than `datafusion-cli` with pre-built integrations [datafusion]

2024-11-05 Thread via GitHub
alamb closed issue #11979: Proposal: Create `dfdb`, a new CLI different than `datafusion-cli` with pre-built integrations URL: https://github.com/apache/datafusion/issues/11979

Re: [PR] Remove `Expr` clones from `SortExpr`s [datafusion]

2024-11-05 Thread via GitHub
peter-toth commented on PR #13258: URL: https://github.com/apache/datafusion/pull/13258#issuecomment-2457893876 Thanks @alamb and @crepererum for the review!

Re: [I] Support sort merge join with a join condition [datafusion-comet]

2024-11-05 Thread via GitHub
comphead commented on issue #398: URL: https://github.com/apache/datafusion-comet/issues/398#issuecomment-2457889346 Thanks @andygrove I'll take it from here

Re: [PR] chore: Upgrade to DataFusion 43.0.0-rc1 [datafusion-comet]

2024-11-05 Thread via GitHub
comphead commented on PR #1057: URL: https://github.com/apache/datafusion-comet/pull/1057#issuecomment-2457888283 > @comphead FYI; it looks like there may be a regression in DF 43 related to sort-merge join with join filter. > > I am tempted to ignore this test for now and file a fol

Re: [I] Support sort merge join with a join condition [datafusion-comet]

2024-11-05 Thread via GitHub
andygrove commented on issue #398: URL: https://github.com/apache/datafusion-comet/issues/398#issuecomment-2457883822 There is an existing test `SortMergeJoin with join filter` that we need to enable as part of closing this issue

Re: [PR] Support vectorized append and compare for multi group by [datafusion]

2024-11-05 Thread via GitHub
alamb commented on PR #12996: URL: https://github.com/apache/datafusion/pull/12996#issuecomment-2457883692 I am giving this a final review now

Re: [PR] chore: Prepare 43.0.0 release [datafusion]

2024-11-05 Thread via GitHub
alamb merged PR #13254: URL: https://github.com/apache/datafusion/pull/13254

Re: [PR] feat: Add `Time`/`Interval`/`Decimal`/`Utf8View` in aggregate fuzz testing [datafusion]

2024-11-05 Thread via GitHub
alamb commented on PR #13226: URL: https://github.com/apache/datafusion/pull/13226#issuecomment-2457876228 Thanks again @LeslieKid

Re: [PR] feat: Add `Time`/`Interval`/`Decimal`/`Utf8View` in aggregate fuzz testing [datafusion]

2024-11-05 Thread via GitHub
alamb merged PR #13226: URL: https://github.com/apache/datafusion/pull/13226

Re: [PR] chore: Upgrade to DataFusion 43.0.0-rc1 [datafusion-comet]

2024-11-05 Thread via GitHub
andygrove commented on PR #1057: URL: https://github.com/apache/datafusion-comet/pull/1057#issuecomment-2457872115 Test failure in CI: ``` - SortMergeJoin with join filter *** FAILED *** (1 second, 828 milliseconds) Results do not match for query: Timezone: sun.util.cale

Re: [PR] Implement `Eq`, `PartialEq`, `Hash` for `dyn PhysicalExpr` [datafusion]

2024-11-05 Thread via GitHub
alamb commented on PR #13005: URL: https://github.com/apache/datafusion/pull/13005#issuecomment-2457870289 The 43.0.0 release candidate has been made and we have started voting on it. I merged this branch up from main locally to ensure everything still compiles. It looks good so this is rea

Re: [PR] Implement `Eq`, `PartialEq`, `Hash` for `dyn PhysicalExpr` [datafusion]

2024-11-05 Thread via GitHub
alamb merged PR #13005: URL: https://github.com/apache/datafusion/pull/13005

Re: [PR] Deprecate `PhysicalSortRequirement::from_sort_exprs` and `PhysicalSortRequirement::to_sort_exprs` [datafusion]

2024-11-05 Thread via GitHub
alamb commented on code in PR #13222: URL: https://github.com/apache/datafusion/pull/13222#discussion_r1829795153 ## datafusion/core/src/physical_optimizer/enforce_sorting.rs: ## @@ -221,7 +221,7 @@ fn replace_with_partial_sort( // here we're trying to find the common p

Re: [PR] Introduce `INFORMATION_SCHEMA.ROUTINES` table [datafusion]

2024-11-05 Thread via GitHub
alamb commented on code in PR #13255: URL: https://github.com/apache/datafusion/pull/13255#discussion_r1829810165 ## datafusion/expr-common/src/signature.rs: ## @@ -243,6 +243,27 @@ impl TypeSignature { _ => false, } } + +/// get all possible types

Re: [PR] Introduce `INFORMATION_SCHEMA.ROUTINES` table [datafusion]

2024-11-05 Thread via GitHub
alamb commented on PR #13255: URL: https://github.com/apache/datafusion/pull/13255#issuecomment-2457867275 fyi @findepi

Re: [PR] feat: basic support for executing prepared statements [datafusion]

2024-11-05 Thread via GitHub
alamb commented on code in PR #13242: URL: https://github.com/apache/datafusion/pull/13242#discussion_r1829802769 ## datafusion/core/src/execution/context/mod.rs: ## @@ -1088,6 +1113,49 @@ impl SessionContext { } } +fn execute_prepared(&self, execute: Execute

Re: [PR] Deprecate `PhysicalSortRequirement::from_sort_exprs` and `PhysicalSortRequirement::to_sort_exprs` [datafusion]

2024-11-05 Thread via GitHub
alamb commented on code in PR #13222: URL: https://github.com/apache/datafusion/pull/13222#discussion_r1826229077 ## datafusion/core/src/physical_optimizer/enforce_distribution.rs: ## @@ -1492,9 +1491,7 @@ pub(crate) mod tests { if self.expr.is_empty() {

Re: [I] Potential performance regression for TPCH q18 [datafusion]

2024-11-05 Thread via GitHub
alamb commented on issue #13188: URL: https://github.com/apache/datafusion/issues/13188#issuecomment-2457755592 https://github.com/apache/datafusion/issues/11628 might have some ideas. I think @XiangpengHao has also been thinking of some way to do this too Basically one thing you mig

Re: [PR] doc: fix K8s links and doc [datafusion-comet]

2024-11-05 Thread via GitHub
comphead merged PR #1058: URL: https://github.com/apache/datafusion-comet/pull/1058

[PR] doc: fix K8s links and doc [datafusion-comet]

2024-11-05 Thread via GitHub
comphead opened a new pull request, #1058: URL: https://github.com/apache/datafusion-comet/pull/1058 ## Which issue does this PR close? Closes #. ## Rationale for this change Fix the Spark configuration typo and some links ## What changes are included in th

Re: [PR] Expand LIKE simplification [datafusion]

2024-11-05 Thread via GitHub
findepi commented on code in PR #13260: URL: https://github.com/apache/datafusion/pull/13260#discussion_r1829718515 ## datafusion/optimizer/Cargo.toml: ## @@ -47,6 +47,7 @@ indexmap = { workspace = true } itertools = { workspace = true } log = { workspace = true } paste = "1.

Re: [PR] feat(logical-types): add NativeType and LogicalType [datafusion]

2024-11-05 Thread via GitHub
alamb commented on PR #12853: URL: https://github.com/apache/datafusion/pull/12853#issuecomment-2457721548 Epic work!

Re: [PR] Example: FFI Table Provider as dynamic module loading [datafusion]

2024-11-05 Thread via GitHub
timsaucer commented on code in PR #13183: URL: https://github.com/apache/datafusion/pull/13183#discussion_r1829714251 ## datafusion-examples/examples/ffi/ffi_module_interface/src/lib.rs: ## @@ -0,0 +1,49 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or mo
