Re: [PR] Allow empty options for BigQuery [datafusion-sqlparser-rs]

2025-01-15 Thread via GitHub
MartinSahlen commented on code in PR #1657: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1657#discussion_r1917906505 ## tests/sqlparser_bigquery.rs: ## @@ -2244,3 +2244,15 @@ fn test_any_type() { fn test_any_type_dont_break_custom_type() { bigquery_and_gene

Re: [PR] Allow empty options for BigQuery [datafusion-sqlparser-rs]

2025-01-15 Thread via GitHub
MartinSahlen commented on code in PR #1657: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1657#discussion_r1917904556 ## src/parser/mod.rs: ## @@ -7341,6 +7341,10 @@ impl<'a> Parser<'a> { pub fn parse_options(&mut self, keyword: Keyword) -> Result, ParserErr

Re: [PR] Allow empty options for BigQuery [datafusion-sqlparser-rs]

2025-01-15 Thread via GitHub
MartinSahlen commented on code in PR #1657: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1657#discussion_r1917893541 ## src/parser/mod.rs: ## @@ -7341,6 +7341,10 @@ impl<'a> Parser<'a> { pub fn parse_options(&mut self, keyword: Keyword) -> Result, ParserErr

Re: [PR] Allow empty options for BigQuery [datafusion-sqlparser-rs]

2025-01-15 Thread via GitHub
MartinSahlen commented on code in PR #1657: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1657#discussion_r1917882700 ## src/parser/mod.rs: ## @@ -7341,6 +7341,10 @@ impl<'a> Parser<'a> { pub fn parse_options(&mut self, keyword: Keyword) -> Result, ParserErr

Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-15 Thread via GitHub
berkaysynnada commented on PR #14074: URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2594728046 > Looks like I got hit by some new ColumnStatistics tests on main. Should be fixed now 🤞 > > @berkaysynnada can you expand on the rationale for the V2 stats? I understan

Re: [I] Error when joining dataframes with duplicate column names if dataframes generated from file [datafusion]

2025-01-15 Thread via GitHub
chenkovsky commented on issue #14147: URL: https://github.com/apache/datafusion/issues/14147#issuecomment-2594725070 ``` x1.write_csv("df1.csv", with_header=True) x2.write_csv("df2.csv", with_header=True) ``` -- This is an automated message from the Apache Git Service. To respond

Re: [PR] Allow empty options for BigQuery [datafusion-sqlparser-rs]

2025-01-15 Thread via GitHub
MartinSahlen commented on code in PR #1657: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1657#discussion_r1917876805 ## src/parser/mod.rs: ## @@ -7341,6 +7341,10 @@ impl<'a> Parser<'a> { pub fn parse_options(&mut self, keyword: Keyword) -> Result, ParserErr

Re: [PR] Deduplicate function `get_final_indices_from_shared_bitmap` [datafusion]

2025-01-15 Thread via GitHub
jonahgao merged PR #14145: URL: https://github.com/apache/datafusion/pull/14145

Re: [I] Use `NullBufferBuilder` instead of `BooleanBufferBuilder` for creating Null masks [datafusion]

2025-01-15 Thread via GitHub
Chen-Yuan-Lai commented on issue #14115: URL: https://github.com/apache/datafusion/issues/14115#issuecomment-2594710273 Thanks @alamb for pointing out the detail. I will take it into consideration.

Re: [PR] Deduplicate function `get_final_indices_from_shared_bitmap` [datafusion]

2025-01-15 Thread via GitHub
lewiszlw commented on code in PR #14145: URL: https://github.com/apache/datafusion/pull/14145#discussion_r1917842401 ## datafusion/physical-plan/src/joins/utils.rs: ## @@ -1112,6 +1113,14 @@ pub(crate) fn need_produce_result_in_final(join_type: JoinType) -> bool { ) } +

Re: [PR] Deduplicate function `get_final_indices_from_shared_bitmap` [datafusion]

2025-01-15 Thread via GitHub
jonahgao commented on code in PR #14145: URL: https://github.com/apache/datafusion/pull/14145#discussion_r1917839619 ## datafusion/physical-plan/src/joins/utils.rs: ## @@ -1112,6 +1113,14 @@ pub(crate) fn need_produce_result_in_final(join_type: JoinType) -> bool { ) } +

Re: [PR] Simplify `collect_left_input` function in hash join [datafusion]

2025-01-15 Thread via GitHub
lewiszlw commented on PR #14148: URL: https://github.com/apache/datafusion/pull/14148#issuecomment-2594664241 Emm... we have tests that create and run HashJoinExec directly, not optimized by PhysicalOptimizer. Closing.

Re: [PR] Simplify `collect_left_input` function in hash join [datafusion]

2025-01-15 Thread via GitHub
lewiszlw closed pull request #14148: Simplify `collect_left_input` function in hash join URL: https://github.com/apache/datafusion/pull/14148

Re: [PR] adding RowrsReader and writer [datafusion]

2025-01-15 Thread via GitHub
Lordworms commented on PR #14149: URL: https://github.com/apache/datafusion/pull/14149#issuecomment-2594663505 I have two follow-up PRs to implement SortPreservingMergeStream in Row format and change the logic in SortExec

[PR] adding RowrsReader and writer [datafusion]

2025-01-15 Thread via GitHub
Lordworms opened a new pull request, #14149: URL: https://github.com/apache/datafusion/pull/14149 ## Which issue does this PR close? part of #7053 Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these c

[PR] Simplify `collect_left_input` function in hash join [datafusion]

2025-01-15 Thread via GitHub
lewiszlw opened a new pull request, #14148: URL: https://github.com/apache/datafusion/pull/14148 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

[I] Error when joining dataframes with duplicate column names if dataframes generated from file [datafusion]

2025-01-15 Thread via GitHub
fullstart opened a new issue, #14147: URL: https://github.com/apache/datafusion/issues/14147 ### Describe the bug Encountered an issue joining dataframes with duplicate column names if they are generated from a file read (I tried CSV and Parquet). Dataframes produced from a Python dict do

[I] Can `ctx.register_json` directly register and use a standard `JSON` array? `NdJson` is insufficient for handling general data. [datafusion]

2025-01-15 Thread via GitHub
shencangsheng opened a new issue, #14146: URL: https://github.com/apache/datafusion/issues/14146 ### Is your feature request related to a problem or challenge? I hope to directly use standard JSON, as most developers prefer this data format. ```json [ { "use

[PR] Deduplicate function `get_final_indices_from_shared_bitmap` [datafusion]

2025-01-15 Thread via GitHub
lewiszlw opened a new pull request, #14145: URL: https://github.com/apache/datafusion/pull/14145 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? We have two same functions `get_

Re: [I] Add a hint about expected extension in error message in register_csv, register_parquet, register_json [datafusion]

2025-01-15 Thread via GitHub
cj-zhukov commented on issue #14144: URL: https://github.com/apache/datafusion/issues/14144#issuecomment-2594596672 take

[I] Add a hint about expected extension in error message in register_csv, register_parquet, register_json [datafusion]

2025-01-15 Thread via GitHub
cj-zhukov opened a new issue, #14144: URL: https://github.com/apache/datafusion/issues/14144 ### Describe the bug When attempting to register an existing file with a different format using register_csv, register_json, or register_parquet, instead of receiving an error, the operation s

Re: [I] parquet RowGroup pruning for `Dictionary(Decimal)` type incorrect [datafusion]

2025-01-15 Thread via GitHub
korowa commented on issue #13821: URL: https://github.com/apache/datafusion/issues/13821#issuecomment-2594590175 @kosiew the root cause of the issue is how arrow writer handles data for `Dictionary(Decimal)`, and I suppose it'll mostly be fixed by https://github.com/apache/arrow-rs/pull/698

Re: [I] SQL/PGQ or even GQL support [datafusion]

2025-01-15 Thread via GitHub
gsvgit commented on issue #13545: URL: https://github.com/apache/datafusion/issues/13545#issuecomment-2594583174 My presentation on the topic for DataFusion community meeting: [SemyonGrigorev_DataFusion_PGQ.pdf](https://github.com/user-attachments/files/18433964/SemyonGrigorev_DataFusion_PGQ

[I] A memory-limited sort query fails [datafusion]

2025-01-15 Thread via GitHub
2010YOUY01 opened a new issue, #14143: URL: https://github.com/apache/datafusion/issues/14143 ### Describe the bug Compile and run datafusion-cli ``` cargo run -- --mem-pool-type fair -m 80M -c 'select c1, c1 as c2 from generate_series(1,1000) as t1(c1) order by c2 DESC, c1

Re: [PR] Test: Validate memory limit for sort queries [datafusion]

2025-01-15 Thread via GitHub
2010YOUY01 commented on code in PR #14142: URL: https://github.com/apache/datafusion/pull/14142#discussion_r1917754162 ## datafusion/core/tests/memory_limit/memory_limit_validation/utils.rs: ## @@ -0,0 +1,192 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [I] A memory-limited sort query fails [datafusion]

2025-01-15 Thread via GitHub
2010YOUY01 commented on issue #14143: URL: https://github.com/apache/datafusion/issues/14143#issuecomment-2594537073 take

[PR] Test: Validate memory limit for sort queries [datafusion]

2025-01-15 Thread via GitHub
2010YOUY01 opened a new pull request, #14142: URL: https://github.com/apache/datafusion/pull/14142 ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/13431 ## Rationale for this change Datafusion supports memory-limited queries: it's im

[I] Error when use `user` field in where caluse [datafusion]

2025-01-15 Thread via GitHub
haohuaijin opened a new issue, #14141: URL: https://github.com/apache/datafusion/issues/14141 ### Describe the bug ``` DataFusion CLI v44.0.0 > create table t(a int, b int, user text) as values (1,2,'test'), (2,3,null); 0 row(s) fetched. Elapsed 0.051 seconds. > sele

Re: [I] create view with multi union use the first union schema as the final view schema [datafusion]

2025-01-15 Thread via GitHub
jonahgao closed issue #14132: create view with multi union use the first union schema as the final view schema URL: https://github.com/apache/datafusion/issues/14132

Re: [PR] bugfix: create view with multi union may get wrong schema [datafusion]

2025-01-15 Thread via GitHub
jonahgao commented on PR #14133: URL: https://github.com/apache/datafusion/pull/14133#issuecomment-2594345585 @Curricane Filed a todo issue #14140 for adding tests.

[I] Add tests for PR #14133 [datafusion]

2025-01-15 Thread via GitHub
jonahgao opened a new issue, #14140: URL: https://github.com/apache/datafusion/issues/14140 We add some tests for PR #14133 > Hi @Curricane -- thank you for the fix 🙏 > > Can you please add some tests, ideally in sqllogicteset format to ensure this behavior is not broken in th

Re: [PR] bugfix: create view with multi union may get wrong schema [datafusion]

2025-01-15 Thread via GitHub
jonahgao merged PR #14133: URL: https://github.com/apache/datafusion/pull/14133

Re: [PR] feat: add `alias()` method for DataFrame [datafusion]

2025-01-15 Thread via GitHub
jonahgao commented on PR #14127: URL: https://github.com/apache/datafusion/pull/14127#issuecomment-2594318289 Thanks @comphead @alamb for the review

Re: [PR] flamegraph [datafusion]

2025-01-15 Thread via GitHub
github-actions[bot] commented on PR #13455: URL: https://github.com/apache/datafusion/pull/13455#issuecomment-2594306893 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] bugfix: create view with multi union may get wrong schema [datafusion]

2025-01-15 Thread via GitHub
Curricane commented on PR #14133: URL: https://github.com/apache/datafusion/pull/14133#issuecomment-2594286581 > Hi @Curricane -- thank you for the fix 🙏 > > Can you please add some tests, ideally in sqllogicteset format to ensure this behavior is not broken in the future. > >

Re: [PR] doc-gen: migrate scalar functions (encoding & regex) documentation [datafusion]

2025-01-15 Thread via GitHub
comphead merged PR #13919: URL: https://github.com/apache/datafusion/pull/13919

Re: [PR] doc-gen: migrate scalar functions (encoding & regex) documentation [datafusion]

2025-01-15 Thread via GitHub
Chen-Yuan-Lai commented on PR #13919: URL: https://github.com/apache/datafusion/pull/13919#issuecomment-2594262550 @alamb @comphead I think this PR is ready

Re: [PR] update release scripts to release to Apache DataFusion not Apache Arrow [datafusion-ballista]

2025-01-15 Thread via GitHub
andygrove merged PR #1163: URL: https://github.com/apache/datafusion-ballista/pull/1163

Re: [PR] Minor: move resolve_overlap a method on `OrderingEquivalenceClass` [datafusion]

2025-01-15 Thread via GitHub
jayzhan211 merged PR #14138: URL: https://github.com/apache/datafusion/pull/14138

Re: [PR] Fix combine with session config [datafusion]

2025-01-15 Thread via GitHub
jayzhan211 merged PR #14139: URL: https://github.com/apache/datafusion/pull/14139

Re: [I] Extension Types [datafusion]

2025-01-15 Thread via GitHub
jayzhan211 commented on issue #12644: URL: https://github.com/apache/datafusion/issues/12644#issuecomment-2594225106 The current status is that we have several changes in branch `logical-types` for #12622. Where the `Scalar` is introduced and the next step is to complete the tasks left in #

Re: [I] Extension Types [datafusion]

2025-01-15 Thread via GitHub
jayzhan-synnada commented on issue #12644: URL: https://github.com/apache/datafusion/issues/12644#issuecomment-2594223771 The current status is that we have several changes in branch `logical-types` for #12622. Where the `Scalar` is introduced and the next step is to complete the tasks left

Re: [PR] chore: replace print statements with logs [datafusion-ballista]

2025-01-15 Thread via GitHub
andygrove merged PR #1162: URL: https://github.com/apache/datafusion-ballista/pull/1162

[PR] update release scripts to release to Apache DataFusion not Apache Arrow [datafusion-ballista]

2025-01-15 Thread via GitHub
andygrove opened a new pull request, #1163: URL: https://github.com/apache/datafusion-ballista/pull/1163 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing cha

Re: [PR] Return err if wildcard is not expanded before type coercion [datafusion]

2025-01-15 Thread via GitHub
xudong963 merged PR #14130: URL: https://github.com/apache/datafusion/pull/14130

Re: [PR] Return err if wildcard is not expanded before type coercion [datafusion]

2025-01-15 Thread via GitHub
xudong963 commented on PR #14130: URL: https://github.com/apache/datafusion/pull/14130#issuecomment-2594170021 Thanks @alamb @jonahgao, I'll merge it

Re: [PR] Comet parquet exec merge from main(20250114) [datafusion-comet]

2025-01-15 Thread via GitHub
andygrove commented on PR #1293: URL: https://github.com/apache/datafusion-comet/pull/1293#issuecomment-2594155855 TPC-H times: comet_native 327.14 comet_datafusion 341.61 comet_iceberg_compat 297.71 :fire: (first sub-300 timing I have seen) Our published time for 0.5.0 i

Re: [I] Improve Aggregate with Limit [datafusion]

2025-01-15 Thread via GitHub
ctsk commented on issue #13729: URL: https://github.com/apache/datafusion/issues/13729#issuecomment-2594124886 I checked out your second point and was able to replicate the difference in speed between the two queries. I don't yet see how a TopK operator would help the non-ordered query.

Re: [PR] [DO NOT MERGE] Merge latest from main into comet-parquet-exec branch [datafusion-comet]

2025-01-15 Thread via GitHub
andygrove closed pull request #1291: [DO NOT MERGE] Merge latest from main into comet-parquet-exec branch URL: https://github.com/apache/datafusion-comet/pull/1291

[PR] Comet parquet exec merge from main(20250114) [datafusion-comet]

2025-01-15 Thread via GitHub
parthchandra opened a new pull request, #1293: URL: https://github.com/apache/datafusion-comet/pull/1293 Brings comet-parquet-exec almost up to date with main There are three new test failures which will be addressed in subsequent PRs - ``` - Broadcast HashJoin without join filter

Re: [PR] doc: add new logo to readme [datafusion-ballista]

2025-01-15 Thread via GitHub
andygrove merged PR #1161: URL: https://github.com/apache/datafusion-ballista/pull/1161

Re: [PR] fix: API build problem due to missing dependency [datafusion-ballista]

2025-01-15 Thread via GitHub
andygrove merged PR #1159: URL: https://github.com/apache/datafusion-ballista/pull/1159

[PR] chore: Add array types to fuzz testing utility [datafusion-comet]

2025-01-15 Thread via GitHub
andygrove opened a new pull request, #1292: URL: https://github.com/apache/datafusion-comet/pull/1292 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

[PR] chore: replace print statements from logs [datafusion-ballista]

2025-01-15 Thread via GitHub
milenkovicm opened a new pull request, #1162: URL: https://github.com/apache/datafusion-ballista/pull/1162 # Which issue does this PR close? Closes none. # Rationale for this change in all cases we use logger instead of print, removing print with log # What change

[PR] doc: add new logo to readme [datafusion-ballista]

2025-01-15 Thread via GitHub
milenkovicm opened a new pull request, #1161: URL: https://github.com/apache/datafusion-ballista/pull/1161 # Which issue does this PR close? Closes none. # Rationale for this change adding new logo to readme, to be consistent with comet and datafusion # What chan

[PR] chore: planner cleanup and refactor [datafusion-ballista]

2025-01-15 Thread via GitHub
milenkovicm opened a new pull request, #1160: URL: https://github.com/apache/datafusion-ballista/pull/1160 # Which issue does this PR close? Closes none. # Rationale for this change move `BallistaPlanner` to its own module from utils module. # What changes are inc

[PR] fix: API build problem due to missing dependency [datafusion-ballista]

2025-01-15 Thread via GitHub
milenkovicm opened a new pull request, #1159: URL: https://github.com/apache/datafusion-ballista/pull/1159 # Which issue does this PR close? Closes none. # Rationale for this change graph-viz deps have been removed so it broke rest-api, putting dependency back #

[PR] ignore: Comet parquet exec merge 20250114 [datafusion-comet]

2025-01-15 Thread via GitHub
andygrove opened a new pull request, #1291: URL: https://github.com/apache/datafusion-comet/pull/1291 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] Add support for parsing RAISERROR [datafusion-sqlparser-rs]

2025-01-15 Thread via GitHub
iffyio commented on code in PR #1656: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1656#discussion_r1917322346 ## src/parser/mod.rs: ## @@ -13150,6 +13151,40 @@ impl<'a> Parser<'a> { } } +pub fn parse_raiserror(&mut self) -> Result { +

[PR] Fix combine with session config [datafusion]

2025-01-15 Thread via GitHub
XiangpengHao opened a new pull request, #14139: URL: https://github.com/apache/datafusion/pull/14139 ## Which issue does this PR close? Closes #. ## Rationale for this change Currently `default_from_session_config` takes no effect; this PR fixes it. It also added a `#[mus

Re: [PR] Minor: move resolve_overlap a method on `OrderingEquivalenceClass` [datafusion]

2025-01-15 Thread via GitHub
alamb commented on code in PR #14138: URL: https://github.com/apache/datafusion/pull/14138#discussion_r1917264140 ## datafusion/physical-expr/src/equivalence/ordering.rs: ## @@ -156,6 +156,26 @@ impl OrderingEquivalenceClass { } } +/// Trims `orderings[idx]`

[PR] Minor: move resolve_overlap a method on `OrderingEquivalenceClass` [datafusion]

2025-01-15 Thread via GitHub
alamb opened a new pull request, #14138: URL: https://github.com/apache/datafusion/pull/14138 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issues/13748 ## Rationale for this change I am testing out a change to the internal representati

[PR] Update datafusion-testing git hash [datafusion]

2025-01-15 Thread via GitHub
Omega359 opened a new pull request, #14137: URL: https://github.com/apache/datafusion/pull/14137 ## Which issue does this PR close? Closes #. ## Rationale for this change Update datafusion-testing to latest hash to fix test issues with extended github action work

[PR] POC: Use IndexSet rather than `Vec` for OrderingEquivalenceClass [datafusion]

2025-01-15 Thread via GitHub
alamb opened a new pull request, #14136: URL: https://github.com/apache/datafusion/pull/14136 WIP as I am still in the process of - [ ] I need to fix `resolve_overlap` - [ ] Test timing ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issu

Re: [PR] chore: move `SanityChecker` into `physical-optimizer` crate [datafusion]

2025-01-15 Thread via GitHub
alamb commented on PR #14083: URL: https://github.com/apache/datafusion/pull/14083#issuecomment-2593638218 > Thanks @alamb for proactively reviewing and adding the fix commits. Much appreciated 🙏 Thank you for helping push this along ❤️

Re: [PR] Propagate table constraints through physical plans to optimize sort operations [datafusion]

2025-01-15 Thread via GitHub
gokselk commented on code in PR #14111: URL: https://github.com/apache/datafusion/pull/14111#discussion_r1917134631 ## datafusion/core/src/datasource/physical_plan/file_scan_config.rs: ## @@ -210,30 +221,25 @@ impl FileScanConfig { self } -/// Project the sch

Re: [I] Spaceship operator (<=>) not supported [datafusion]

2025-01-15 Thread via GitHub
ion-elgreco commented on issue #14098: URL: https://github.com/apache/datafusion/issues/14098#issuecomment-2593609621 @Spaarsh go ahead! :)

Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-15 Thread via GitHub
alamb commented on issue #14008: URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2593564670 I plan to start assembling the release candidate and test on the week of Jan 27 (in about 2 weeks' time)

Re: [I] Spaceship operator (<=>) not supported [datafusion]

2025-01-15 Thread via GitHub
Spaarsh commented on issue #14098: URL: https://github.com/apache/datafusion/issues/14098#issuecomment-2593547003 @ion-elgreco @alamb if it is fine by you, can I work on this? I am just getting familiar with the codebase with a few commits here and there :D

Re: [I] Use `NullBufferBuilder` instead of `BooleanBufferBuilder` for creating Null masks [datafusion]

2025-01-15 Thread via GitHub
alamb commented on issue #14115: URL: https://github.com/apache/datafusion/issues/14115#issuecomment-2593524823 I looked a little at this usage: https://github.com/apache/datafusion/blob/63b94c8f9e128b938e81b7e867ce6256a94d67e6/datafusion/physical-plan/src/aggregates/group_values/null_builde

Re: [PR] chore: move `SanityChecker` into `physical-optimizer` crate [datafusion]

2025-01-15 Thread via GitHub
mnpw commented on PR #14083: URL: https://github.com/apache/datafusion/pull/14083#issuecomment-2593522924 Thanks @alamb for proactively reviewing and adding the fix commits. Much appreciated 🙏

Re: [PR] Allow empty options for BigQuery [datafusion-sqlparser-rs]

2025-01-15 Thread via GitHub
iffyio commented on code in PR #1657: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1657#discussion_r1917060943 ## tests/sqlparser_bigquery.rs: ## @@ -2244,3 +2244,15 @@ fn test_any_type() { fn test_any_type_dont_break_custom_type() { bigquery_and_generic().

Re: [PR] Feat: Support array_join [datafusion-comet]

2025-01-15 Thread via GitHub
erenavsarogullari commented on code in PR #1290: URL: https://github.com/apache/datafusion-comet/pull/1290#discussion_r1917053881 ## native/proto/src/proto/expr.proto: ## @@ -86,6 +86,7 @@ message Expr { ArrayInsert array_insert = 59; BinaryExpr array_contains = 60;

Re: [PR] Return err if wildcard is not expanded before type coercion [datafusion]

2025-01-15 Thread via GitHub
alamb commented on PR #14130: URL: https://github.com/apache/datafusion/pull/14130#issuecomment-2593495598 I merged this PR up from main and merged a clippy fix

Re: [PR] Minor: Document the rationale for the lack of Cargo.lock [datafusion]

2025-01-15 Thread via GitHub
alamb commented on code in PR #14071: URL: https://github.com/apache/datafusion/pull/14071#discussion_r1917019302 ## README.md: ## @@ -146,3 +146,27 @@ stable API, we also improve the API over time. As a result, we typically deprecate methods before removing them, according to

[I] Discuss: Check in Cargo.lock file? [datafusion]

2025-01-15 Thread via GitHub
alamb opened a new issue, #14135: URL: https://github.com/apache/datafusion/issues/14135 ### Is your feature request related to a problem or challenge? Broken out of a discussion on a PR here: - https://github.com/apache/datafusion/pull/14071#discussion_r1910286465 As describ

Re: [PR] feat: Supporting `SAMPLE` parsing [datafusion-sqlparser-rs]

2025-01-15 Thread via GitHub
seve-martinez closed pull request #1566: feat: Supporting `SAMPLE` parsing URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1566

Re: [PR] feat: Supporting `SAMPLE` parsing [datafusion-sqlparser-rs]

2025-01-15 Thread via GitHub
seve-martinez commented on PR #1566: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1566#issuecomment-2593395062 > I did not notice this PR when I started working on parsing the SAMPLE option, but since then it was merged: #1580 > > I can share that we used to parse the

Re: [PR] chore: move `SanityChecker` into `physical-optimizer` crate [datafusion]

2025-01-15 Thread via GitHub
alamb commented on PR #14083: URL: https://github.com/apache/datafusion/pull/14083#issuecomment-2593391114 I will update the datafusion-cli cargo file as well

Re: [PR] Minor: Document the rationale for the lack of Cargo.lock [datafusion]

2025-01-15 Thread via GitHub
comphead merged PR #14071: URL: https://github.com/apache/datafusion/pull/14071

Re: [I] Doc attribution: make `user_doc` to work with predefined consts. [datafusion]

2025-01-15 Thread via GitHub
comphead closed issue #14001: Doc attribution: make `user_doc` to work with predefined consts. URL: https://github.com/apache/datafusion/issues/14001

Re: [PR] doc-gen: make user_doc to work with predefined consts [datafusion]

2025-01-15 Thread via GitHub
comphead merged PR #14086: URL: https://github.com/apache/datafusion/pull/14086

Re: [I] Can no longer easily join duplicate schemas as of version 43 [datafusion]

2025-01-15 Thread via GitHub
comphead closed issue #14112: Can no longer easily join duplicate schemas as of version 43 URL: https://github.com/apache/datafusion/issues/14112

Re: [PR] feat: add `alias()` method for DataFrame [datafusion]

2025-01-15 Thread via GitHub
comphead merged PR #14127: URL: https://github.com/apache/datafusion/pull/14127

Re: [PR] chore: move `SanityChecker` into `physical-optimizer` crate [datafusion]

2025-01-15 Thread via GitHub
alamb commented on code in PR #14083: URL: https://github.com/apache/datafusion/pull/14083#discussion_r1916965668 ## datafusion/physical-optimizer/Cargo.toml: ## @@ -36,10 +36,14 @@ recursive_protection = ["dep:recursive"] [dependencies] arrow = { workspace = true } +arrow-s

[PR] Remove dependency on physical-optimizer on functions-aggregates [datafusion]

2025-01-15 Thread via GitHub
alamb opened a new pull request, #14134: URL: https://github.com/apache/datafusion/pull/14134 Draft, as it is based on https://github.com/apache/datafusion/pull/14083 from @mnpw ## Which issue does this PR close? This is a follow-up to - https://github.com/apache/datafusion/pul

Re: [I] Async User Defined Functions (UDF) [datafusion]

2025-01-15 Thread via GitHub
adriangb commented on issue #6518: URL: https://github.com/apache/datafusion/issues/6518#issuecomment-2593297771 I just came across this use case today and am very interested, it would be amazing if DataFusion just had https://github.com/apache/datafusion/issues/6518#issuecomment-2585270509

Re: [PR] fix: add support for Decimal128 and Decimal256 types in interval arithmetic [datafusion]

2025-01-15 Thread via GitHub
alamb commented on code in PR #14126: URL: https://github.com/apache/datafusion/pull/14126#discussion_r1916864288 ## datafusion/expr-common/src/interval_arithmetic.rs: ## @@ -76,6 +76,14 @@ macro_rules! get_extreme_value { DataType::Interval(IntervalUnit::MonthDayNa

Re: [I] create view with multi union use the first union schema as the final view schema [datafusion]

2025-01-15 Thread via GitHub
matthewmturner commented on issue #14132: URL: https://github.com/apache/datafusion/issues/14132#issuecomment-2593195210 @xudong963 this sounds similar to the issue we saw?

Re: [PR] Implemented `simplify` for the `starts_with` function to convert it into a LIKE expression. [datafusion]

2025-01-15 Thread via GitHub
alamb commented on code in PR #14119: URL: https://github.com/apache/datafusion/pull/14119#discussion_r1916850331 ## datafusion/functions/src/string/starts_with.rs: ## @@ -98,6 +99,27 @@ impl ScalarUDFImpl for StartsWithFunc { } } +fn simplify( +&self

Re: [PR] Minor: fix duplicated SharedBitmapBuilder definitions [datafusion]

2025-01-15 Thread via GitHub
alamb commented on PR #14122: URL: https://github.com/apache/datafusion/pull/14122#issuecomment-2593146891 Thanks @jonahgao and @lewiszlw ❤️

Re: [I] Automatically run sqlitetests regularly (but not with all PRs) to DataFusion [datafusion]

2025-01-15 Thread via GitHub
alamb closed issue #13967: Automatically run sqlitetests regularly (but not with all PRs) to DataFusion URL: https://github.com/apache/datafusion/issues/13967

Re: [PR] Minor: fix duplicated SharedBitmapBuilder definitions [datafusion]

2025-01-15 Thread via GitHub
alamb merged PR #14122: URL: https://github.com/apache/datafusion/pull/14122

Re: [PR] Add sqlite sqllogictest run to extended.yml [datafusion]

2025-01-15 Thread via GitHub
alamb merged PR #14101: URL: https://github.com/apache/datafusion/pull/14101

Re: [PR] Add sqlite sqllogictest run to extended.yml [datafusion]

2025-01-15 Thread via GitHub
alamb commented on PR #14101: URL: https://github.com/apache/datafusion/pull/14101#issuecomment-2593141596 Let's merge this in and get some experience with running the extended suite on main. Thank you @Omega359 and @comphead

Re: [PR] chore: move `SanityChecker` into `physical-optimizer` crate [datafusion]

2025-01-15 Thread via GitHub
mnpw commented on PR #14083: URL: https://github.com/apache/datafusion/pull/14083#issuecomment-2593062436 @alamb Here's my understanding of the issue: - `datafusion` crate already depends on `datafusion-physical-optimizer`. - SanityChecker's tests depend on `datafusion::dat

Re: [PR] feat: metadata columns [datafusion]

2025-01-15 Thread via GitHub
alamb commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2593033656 > Can these metadata columns utilize normal column properties, like ordering equivalences, constantness, distinctness etc.? For example, AFAIU rowid is an ordered column, and if I sort

Re: [PR] chore: extract math_funcs expressions to folders based on spark grouping [datafusion-comet]

2025-01-15 Thread via GitHub
rluvaton commented on PR #1219: URL: https://github.com/apache/datafusion-comet/pull/1219#issuecomment-2593045630 @andygrove Yes, would you mind first merging https://github.com/apache/datafusion-comet/pull/1223 so if I have more conflicts I can resolve them all at once

Re: [PR] Propagate table constraints through physical plans to optimize sort operations [datafusion]

2025-01-15 Thread via GitHub
gokselk commented on code in PR #14111: URL: https://github.com/apache/datafusion/pull/14111#discussion_r1916731975 ## datafusion/core/src/datasource/physical_plan/file_scan_config.rs: ## @@ -210,30 +221,25 @@ impl FileScanConfig { self } -/// Project the sch
