Re: [PR] feat: RewriteCycle API for short-circuiting optimizer loops [datafusion]

2024-10-24 Thread via GitHub
github-actions[bot] closed pull request #10386: feat: RewriteCycle API for short-circuiting optimizer loops URL: https://github.com/apache/datafusion/pull/10386 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov

Re: [PR] JoinOptimization: Add build side pushdown to probe side [datafusion]

2024-10-24 Thread via GitHub
Lordworms commented on PR #13054: URL: https://github.com/apache/datafusion/pull/13054#issuecomment-2436724056 > > How would you display them in sources? The dynamic filter will only be added during execution, so it will only be available through e.g. ParquetExec after loading the build sid

Re: [PR] Optimization for CASE WHEN for protecting against divide by zero [datafusion]

2024-10-24 Thread via GitHub
github-actions[bot] commented on PR #12049: URL: https://github.com/apache/datafusion/pull/12049#issuecomment-2436655276 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] feat: improve type inference for WindowFrame [datafusion]

2024-10-24 Thread via GitHub
goldmedal commented on PR #13059: URL: https://github.com/apache/datafusion/pull/13059#issuecomment-2436983493 Thanks again @notfilippo @jcsherin 👍

Re: [I] Unparser: numeric values in window frame definition are converted into string literals [datafusion]

2024-10-24 Thread via GitHub
goldmedal closed issue #12982: Unparser: numeric values in window frame definition are converted into string literals URL: https://github.com/apache/datafusion/issues/12982

Re: [PR] feat: improve type inference for WindowFrame [datafusion]

2024-10-24 Thread via GitHub
goldmedal merged PR #13059: URL: https://github.com/apache/datafusion/pull/13059

Re: [I] Preceding and Following (WindowFrameBound) are incorrectly handled when an unoptimized plan created via SQL is converted to a substrait Plan [datafusion]

2024-10-24 Thread via GitHub
goldmedal closed issue #11432: Preceding and Following (WindowFrameBound) are incorrectly handled when an unoptimized plan created via SQL is converted to a substrait Plan URL: https://github.com/apache/datafusion/issues/11432

Re: [I] Verify TPC-DS answers [datafusion]

2024-10-24 Thread via GitHub
Lordworms commented on issue #13073: URL: https://github.com/apache/datafusion/issues/13073#issuecomment-2436891996 take

Re: [I] Add planning benchmarks with parquet and sortedness [datafusion]

2024-10-24 Thread via GitHub
Omega359 commented on issue #13098: URL: https://github.com/apache/datafusion/issues/13098#issuecomment-2436311075 take

Re: [PR] Draft: Logical signature [datafusion]

2024-10-24 Thread via GitHub
jayzhan211 closed pull request #13104: Draft: Logical signature URL: https://github.com/apache/datafusion/pull/13104

Re: [PR] Draft: Logical signature [datafusion]

2024-10-24 Thread via GitHub
jayzhan211 commented on code in PR #13104: URL: https://github.com/apache/datafusion/pull/13104#discussion_r1816025254 ## datafusion/expr/src/type_coercion/functions.rs: ## @@ -506,14 +494,35 @@ fn get_valid_types( ); } +let mut new_ty

[PR] Enable reading `StringViewArray` by default from Parquet [datafusion]

2024-10-24 Thread via GitHub
alamb opened a new pull request, #13101: URL: https://github.com/apache/datafusion/pull/13101 Replacement for https://github.com/apache/datafusion/pull/12092 which had too much history on it Draft as it builds on: - [ ] https://github.com/apache/datafusion/pull/12816 @goldmedal

Re: [PR] feat(logical-types): add NativeType and LogicalType [datafusion]

2024-10-24 Thread via GitHub
jayzhan211 commented on PR #12853: URL: https://github.com/apache/datafusion/pull/12853#issuecomment-2436897770 @notfilippo I think we need a way to know the physical types that we are able to casted to given the Logical type. For example, if we have signature which expect Logical::St

Re: [PR] [docs]: migrate lead/lag window function docs to new docs [datafusion]

2024-10-24 Thread via GitHub
Omega359 commented on PR #13095: URL: https://github.com/apache/datafusion/pull/13095#issuecomment-2435457861 Thank you for migrating this documentation! Would it be possible to add sql examples for each? If not I can file a followup PR to have those added to the docs for these functions an

Re: [PR] Enable reading `StringViewArray` by default from Parquet [datafusion]

2024-10-24 Thread via GitHub
alamb commented on PR #12092: URL: https://github.com/apache/datafusion/pull/12092#issuecomment-2436268625 I made a new PR as this one has lots of now irrelevant historical context New PR here: https://github.com/apache/datafusion/pull/13101

[I] Add planning benchmarks with parquet and sortedness [datafusion]

2024-10-24 Thread via GitHub
alamb opened a new issue, #13098: URL: https://github.com/apache/datafusion/issues/13098 ### Is your feature request related to a problem or challenge? @mnorfolk03 added planning benchmark for more sophisticated queries here https://github.com/apache/datafusion/pull/13085 ❤️ Th

Re: [I] Improve Planning Time [datafusion]

2024-10-24 Thread via GitHub
alamb commented on issue #13015: URL: https://github.com/apache/datafusion/issues/13015#issuecomment-2435056891 Thank you @goldmedal -- this is a nice analysis

Re: [PR] Convert `ntile` builtIn function to UDWF [datafusion]

2024-10-24 Thread via GitHub
jcsherin commented on PR #13040: URL: https://github.com/apache/datafusion/pull/13040#issuecomment-2435634752 @jatin510 Thanks 🙌. The udwf epic is almost complete!

Re: [PR] [Optimization] Infer predicate under all JoinTypes [datafusion]

2024-10-24 Thread via GitHub
JasonLi-cn commented on code in PR #13081: URL: https://github.com/apache/datafusion/pull/13081#discussion_r1816068121 ## datafusion/optimizer/src/utils.rs: ## @@ -117,3 +124,165 @@ pub fn log_plan(description: &str, plan: &LogicalPlan) { debug!("{description}:\n{}\n", plan

Re: [PR] Enable reading `StringViewArray` by default from Parquet [datafusion]

2024-10-24 Thread via GitHub
alamb closed pull request #12092: Enable reading `StringViewArray` by default from Parquet URL: https://github.com/apache/datafusion/pull/12092

Re: [I] Improve Planning Time [datafusion]

2024-10-24 Thread via GitHub
alamb commented on issue #13015: URL: https://github.com/apache/datafusion/issues/13015#issuecomment-2435070554 Here is another idea: https://github.com/apache/datafusion/pull/13018#issuecomment-2435068514

[PR] docs: Added Special Functions Page [datafusion]

2024-10-24 Thread via GitHub
jonathanc-n opened a new pull request, #13102: URL: https://github.com/apache/datafusion/pull/13102 ## Which issue does this PR close? Closes #13036. ## Rationale for this change ## What changes are included in this PR? Added static special functions pa

Re: [PR] feat: improve type inference for WindowFrame [datafusion]

2024-10-24 Thread via GitHub
notfilippo commented on code in PR #13059: URL: https://github.com/apache/datafusion/pull/13059#discussion_r1814844142 ## datafusion/expr/src/window_frame.rs: ## @@ -334,51 +335,69 @@ impl WindowFrameBound { } } -impl TryFrom for WindowFrameBound { -type Error = Data

Re: [PR] JoinOptimization: Add build side pushdown to probe side [datafusion]

2024-10-24 Thread via GitHub
Lordworms commented on PR #13054: URL: https://github.com/apache/datafusion/pull/13054#issuecomment-2436827458 I think it is ready now

Re: [I] Access children `DataType` or return-type in `ScalarUDFImpl::invoke` [datafusion]

2024-10-24 Thread via GitHub
alamb commented on issue #12819: URL: https://github.com/apache/datafusion/issues/12819#issuecomment-2435382904 > I think your proposed solution is? Yes that is what I was thinking > I was hoping to use the result of return_type_from_exprs(self, &[Expr], &dyn ExprSchema, &[Data

Re: [I] Ballista Python Update [datafusion-ballista]

2024-10-24 Thread via GitHub
timsaucer commented on issue #1091: URL: https://github.com/apache/datafusion-ballista/issues/1091#issuecomment-2436178599 Sorry I haven’t been able lately to give this more attention, but I hope next week my time clears up some.

Re: [I] Ballista Python Update [datafusion-ballista]

2024-10-24 Thread via GitHub
milenkovicm commented on issue #1091: URL: https://github.com/apache/datafusion-ballista/issues/1091#issuecomment-2436182114 No worries @timsaucer, I just want to note important point you brought. Thanks a lot

Re: [PR] Implement `Eq`, `PartialEq`, `Hash` for `dyn PhysicalExpr` [datafusion]

2024-10-24 Thread via GitHub
peter-toth commented on code in PR #13005: URL: https://github.com/apache/datafusion/pull/13005#discussion_r1815636846 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -48,11 +47,11 @@ use kernels::{ }; /// Binary expression -#[derive(Debug, Hash, Clone)] -pub st

Re: [PR] Introduce `binary_as_string` parquet option, upgrade to arrow/parquet `53.2.0` [datafusion]

2024-10-24 Thread via GitHub
alamb commented on code in PR #12816: URL: https://github.com/apache/datafusion/pull/12816#discussion_r1815631208 ## Cargo.toml: ## @@ -70,22 +70,22 @@ version = "42.1.0" ahash = { version = "0.8", default-features = false, features = [ "runtime-rng", ] } -arrow = { versi

Re: [I] [EPIC] Automatically generate all function documentation from code [datafusion]

2024-10-24 Thread via GitHub
Omega359 commented on issue #12740: URL: https://github.com/apache/datafusion/issues/12740#issuecomment-2436258791 lead/lag PR @ https://github.com/apache/datafusion/pull/13095 Other functions that still exist on scalar functions page - https://github.com/apache/datafusion/issues/1303

[I] Update ClickBench benchmarks with DataFusion `43` [datafusion]

2024-10-24 Thread via GitHub
alamb opened a new issue, #13099: URL: https://github.com/apache/datafusion/issues/13099 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

Re: [PR] minor: Add deprecated policy to the contributor guide contents [datafusion]

2024-10-24 Thread via GitHub
alamb commented on code in PR #13100: URL: https://github.com/apache/datafusion/pull/13100#discussion_r1815654309 ## docs/source/index.rst: ## @@ -130,6 +130,7 @@ To get started, see library-user-guide/extending-operators library-user-guide/profiling library-user-gui

Re: [PR] [docs]: migrate lead/lag window function docs to new docs [datafusion]

2024-10-24 Thread via GitHub
alamb merged PR #13095: URL: https://github.com/apache/datafusion/pull/13095

Re: [PR] minor: Add deprecated policy to the contributor guide contents [datafusion]

2024-10-24 Thread via GitHub
comphead merged PR #13100: URL: https://github.com/apache/datafusion/pull/13100

Re: [PR] Improve CSE stats [datafusion]

2024-10-24 Thread via GitHub
peter-toth commented on PR #13080: URL: https://github.com/apache/datafusion/pull/13080#issuecomment-2434547000 Thank you @alamb for the review.

Re: [PR] chore: Added a number of physical planning join benchmarks [datafusion]

2024-10-24 Thread via GitHub
alamb commented on PR #13085: URL: https://github.com/apache/datafusion/pull/13085#issuecomment-2435023400 Thank you @mnorfolk03! Welcome to DataFusion (and rust!) I think the CI test here https://github.com/apache/datafusion/actions/runs/11490928733/job/31986697253?pr=13085

Re: [PR] feat: Migrate Map Functions [datafusion]

2024-10-24 Thread via GitHub
alamb merged PR #13047: URL: https://github.com/apache/datafusion/pull/13047

Re: [PR] feat: Migrate Map Functions [datafusion]

2024-10-24 Thread via GitHub
alamb commented on PR #13047: URL: https://github.com/apache/datafusion/pull/13047#issuecomment-2435014964 🚀 lets keep things moving

Re: [PR] chore: Added a number of physical planning join benchmarks [datafusion]

2024-10-24 Thread via GitHub
alamb commented on PR #13085: URL: https://github.com/apache/datafusion/pull/13085#issuecomment-2435013353 > Ah I see @alamb suggested to start with physical planning in the ticket first, sorry for the noise :) > > I do think some focused join _execution_ benchmarks can be very useful

Re: [I] `CometBuffer` can potentially lead to concurrent modification of a held buffer (aka is "Unsound" in Rust terms) [datafusion-comet]

2024-10-24 Thread via GitHub
alamb commented on issue #1035: URL: https://github.com/apache/datafusion-comet/issues/1035#issuecomment-2434935790 > The scan code is not developed by me. I guess that may not work as the CometBuffer internally doesn't use pointer like Arc. It is very low-level raw pointer manipulation. T

Re: [PR] Convert `ntile` builtIn function to UDWF [datafusion]

2024-10-24 Thread via GitHub
jcsherin commented on PR #13040: URL: https://github.com/apache/datafusion/pull/13040#issuecomment-2435516093 @jatin510 Can you please mark this PR as draft so that it is not accidentally merged into main while you are working on changes. You can change it back when you are ready, so the co

Re: [PR] [docs]: migrate lead/lag window function docs to new docs [datafusion]

2024-10-24 Thread via GitHub
buraksenn commented on PR #13095: URL: https://github.com/apache/datafusion/pull/13095#issuecomment-2435524578 > Thank you for migrating this documentation! Would it be possible to add sql examples for each? If not I can file a followup PR to have those added to the docs for these functions

Re: [I] Migrate documentation for remaining window functions to window_functions.md [datafusion]

2024-10-24 Thread via GitHub
alamb commented on issue #12936: URL: https://github.com/apache/datafusion/issues/12936#issuecomment-2436286662 I think this is now done (thank you @buraksenn ) I am hoping that we can merge https://github.com/apache/datafusion/pull/12938 and then that will prevent any new functions

Re: [I] [EPIC] Automatically generate all function documentation from code [datafusion]

2024-10-24 Thread via GitHub
alamb commented on issue #12740: URL: https://github.com/apache/datafusion/issues/12740#issuecomment-2436284480 Here is a PR to make sure no new functions are added without documentation: https://github.com/apache/datafusion/pull/12938

Re: [I] Migrate documentation for remaining window functions to window_functions.md [datafusion]

2024-10-24 Thread via GitHub
alamb closed issue #12936: Migrate documentation for remaining window functions to window_functions.md URL: https://github.com/apache/datafusion/issues/12936

Re: [PR] Introduce `binary_as_string` parquet option, upgrade to arrow/parquet `53.2.0` [datafusion]

2024-10-24 Thread via GitHub
alamb commented on code in PR #12816: URL: https://github.com/apache/datafusion/pull/12816#discussion_r1815642566 ## benchmarks/src/clickbench.rs: ## @@ -115,12 +115,15 @@ impl RunOpt { None => queries.min_query_id()..=queries.max_query_id(), }; +

Re: [PR] Improve documentation and examples for `SchemaAdapterFactory`, make `record_batch` "hygenic" [datafusion]

2024-10-24 Thread via GitHub
alamb commented on code in PR #13063: URL: https://github.com/apache/datafusion/pull/13063#discussion_r1815673753 ## datafusion/core/src/datasource/schema_adapter.rs: ## @@ -79,11 +87,17 @@ pub trait SchemaAdapter: Send + Sync { ) -> datafusion_common::Result<(Arc, Vec)>;

Re: [PR] Implement `Eq`, `PartialEq`, `Hash` for `dyn PhysicalExpr` [datafusion]

2024-10-24 Thread via GitHub
alamb commented on code in PR #13005: URL: https://github.com/apache/datafusion/pull/13005#discussion_r1815683713 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -48,11 +47,11 @@ use kernels::{ }; /// Binary expression -#[derive(Debug, Hash, Clone)] -pub struct

Re: [PR] chore: Added a number of physical planning join benchmarks [datafusion]

2024-10-24 Thread via GitHub
mnorfolk03 commented on PR #13085: URL: https://github.com/apache/datafusion/pull/13085#issuecomment-2435902070 @alamb I've run cargo fmt. > Thanks @mnorfolk03 -- this is a nice improvement > > I ran these benchmarks and they look good. > > ```sql > cargo bench --benc

Re: [PR] perf: Cache jstrings during metrics collection [datafusion-comet]

2024-10-24 Thread via GitHub
andygrove commented on PR #1029: URL: https://github.com/apache/datafusion-comet/pull/1029#issuecomment-2435653650 I wonder why there is such a large regression with q72 though :thinking:

Re: [PR] [Optimization] Infer predicate under all JoinTypes [datafusion]

2024-10-24 Thread via GitHub
comphead commented on code in PR #13081: URL: https://github.com/apache/datafusion/pull/13081#discussion_r1815341813 ## datafusion/optimizer/src/utils.rs: ## @@ -117,3 +124,165 @@ pub fn log_plan(description: &str, plan: &LogicalPlan) { debug!("{description}:\n{}\n", plan.d

Re: [PR] [Optimization] Infer predicate under all JoinTypes [datafusion]

2024-10-24 Thread via GitHub
comphead commented on code in PR #13081: URL: https://github.com/apache/datafusion/pull/13081#discussion_r1815344135 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -2907,6 +3064,46 @@ Projection: a, b assert_optimized_plan_eq(plan, expected) } +#[test

Re: [PR] Documentation: Add API deprecation policy [datafusion]

2024-10-24 Thread via GitHub
comphead merged PR #13083: URL: https://github.com/apache/datafusion/pull/13083

Re: [PR] Documentation: Add API deprecation policy [datafusion]

2024-10-24 Thread via GitHub
comphead commented on code in PR #13083: URL: https://github.com/apache/datafusion/pull/13083#discussion_r1815326882 ## docs/source/library-user-guide/api-health.md: ## @@ -0,0 +1,37 @@ + + +# API health policy Review Comment: Added small policy description and a link from R

[PR] Added Doc Instance [datafusion]

2024-10-24 Thread via GitHub
jonathanc-n opened a new pull request, #13097: URL: https://github.com/apache/datafusion/pull/13097 ## Which issue does this PR close? Closes #13093 . ## Rationale for this change ## What changes are included in this PR? Added new instance of the docume

Re: [PR] Documentation: Add API deprecation policy [datafusion]

2024-10-24 Thread via GitHub
comphead commented on code in PR #13083: URL: https://github.com/apache/datafusion/pull/13083#discussion_r1815297773 ## docs/source/library-user-guide/api-health.md: ## @@ -0,0 +1,37 @@ + + +# API health policy Review Comment: That totally makes sense

Re: [PR] feat: Implement native version of ColumnarToRow [datafusion-comet]

2024-10-24 Thread via GitHub
parthchandra commented on PR #1034: URL: https://github.com/apache/datafusion-comet/pull/1034#issuecomment-2435919148 Initial performance numbers for this implementation are not looking good. There are two areas where things are getting slower compared to Spark 1 . No WholestageCodegen

Re: [PR] Minor: Add documentation for `cot` [datafusion]

2024-10-24 Thread via GitHub
alamb merged PR #13069: URL: https://github.com/apache/datafusion/pull/13069

Re: [I] Add some DataFrame method(s) to combine two inputs where the schema can be different [datafusion]

2024-10-24 Thread via GitHub
Omega359 commented on issue #12650: URL: https://github.com/apache/datafusion/issues/12650#issuecomment-2435151770 My thought was to behave exactly like union does today in that respect. The docs on union have links to helpers if type coercion is required though: https://docs.rs/dataf

Re: [PR] feat(logical-types): add NativeType and LogicalType [datafusion]

2024-10-24 Thread via GitHub
notfilippo commented on PR #12853: URL: https://github.com/apache/datafusion/pull/12853#issuecomment-2435161877 cc @goldmedal @ozankabak maybe you can take a look at this PR as well if you have bandwidth

Re: [I] Migrate documentation for remaining window functions to window_functions.md [datafusion]

2024-10-24 Thread via GitHub
Omega359 commented on issue #12936: URL: https://github.com/apache/datafusion/issues/12936#issuecomment-2435274221 Would it be possible to add documentation for lead/lag prior to waiting for all the remaining window functions to be migrated over?

Re: [PR] Convert `ntile` builtIn function to UDWF [datafusion]

2024-10-24 Thread via GitHub
jcsherin commented on code in PR #13040: URL: https://github.com/apache/datafusion/pull/13040#discussion_r1815168730 ## datafusion/functions-window/src/ntile.rs: ## @@ -0,0 +1,200 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

Re: [PR] Convert `ntile` builtIn function to UDWF [datafusion]

2024-10-24 Thread via GitHub
jcsherin commented on PR #13040: URL: https://github.com/apache/datafusion/pull/13040#issuecomment-2435680521 > @jcsherin Oops ! I accidentally worked on `ntile` udwf, I was supposed to work on `nth` udwf. Also the branch name is wrong : [jatin510:feature/12649-udwf-nth-value](https://githu

Re: [PR] perf: Cache jstrings during metrics collection [datafusion-comet]

2024-10-24 Thread via GitHub
andygrove commented on PR #1029: URL: https://github.com/apache/datafusion-comet/pull/1029#issuecomment-2435651121 Even bigger wins after that last commit :rocket: ![tpch_allqueries](https://github.com/user-attachments/assets/98826515-b31c-4716-8b1b-4ce6f2626f12) ![tpcds_quer

Re: [I] docs: fix generate_series documentation [datafusion]

2024-10-24 Thread via GitHub
alamb closed issue #13093: docs: fix generate_series documentation URL: https://github.com/apache/datafusion/issues/13093

Re: [PR] Add absolute_paths clippy lint with 4 maximum segments. [datafusion]

2024-10-24 Thread via GitHub
alamb commented on PR #13086: URL: https://github.com/apache/datafusion/pull/13086#issuecomment-2436115297 > The github action clippy test detects issues in datafusion-examples, but I'm unable to reproduce that locally. > > I'm fixing those issues based on the action result, I'm not s

Re: [PR] Add absolute_paths clippy lint with 4 maximum segments. [datafusion]

2024-10-24 Thread via GitHub
alamb commented on PR #13086: URL: https://github.com/apache/datafusion/pull/13086#issuecomment-2436117357 FYI @findepi

Re: [PR] perf: Cache jstrings during metrics collection [datafusion-comet]

2024-10-24 Thread via GitHub
andygrove commented on PR #1029: URL: https://github.com/apache/datafusion-comet/pull/1029#issuecomment-2436131793 > 2. What is the thread safety of this approach? It's unclear to me if multiple threads could be sharing this call stack and trying to write new values into the cache at the s

Re: [I] Datafusion 42 does not raise plan error on some queries [datafusion]

2024-10-24 Thread via GitHub
alamb commented on issue #13092: URL: https://github.com/apache/datafusion/issues/13092#issuecomment-2436126313 I think SQL requires the inputs to have the same number of columns Here is an example in postgres ```sql postgres=# create table foo (x int, y int) ; CREATE TAB

Re: [I] Infer types in values clause [datafusion]

2024-10-24 Thread via GitHub
alamb commented on issue #5046: URL: https://github.com/apache/datafusion/issues/5046#issuecomment-2435049163 🎉 Thank you very much @jayzhan211

[PR] fix: planning of prepare statement with limit clause [datafusion]

2024-10-24 Thread via GitHub
jonahgao opened a new pull request, #13088: URL: https://github.com/apache/datafusion/pull/13088 ## Which issue does this PR close? Closes #12294. ## Rationale for this change This PR enables creating logical plan for a prepare statement with a limit clause. Creat

Re: [PR] feat: improve type inference for WindowFrame [datafusion]

2024-10-24 Thread via GitHub
notfilippo commented on code in PR #13059: URL: https://github.com/apache/datafusion/pull/13059#discussion_r1814842636 ## datafusion/expr/src/window_frame.rs: ## @@ -334,51 +335,69 @@ impl WindowFrameBound { } } -impl TryFrom for WindowFrameBound { -type Error = Data

Re: [I] Improve performance of `regexp_count` [datafusion]

2024-10-24 Thread via GitHub
Omega359 commented on issue #13011: URL: https://github.com/apache/datafusion/issues/13011#issuecomment-2435421515 It's still 50% slower than replace so I think it may be worth investigating at some point. Is it a huge blocker? Unlikely unless this is a hotspot in someone's processing pipel

Re: [I] [Rust] Implement micro benchmarks for each operator [datafusion]

2024-10-24 Thread via GitHub
alamb commented on issue #94: URL: https://github.com/apache/datafusion/issues/94#issuecomment-2436166641 Given the lack of specificity on this ticket (it tracks a basic idea rather than any particular project I think) I'll claim it is done for the moment I think a better approach is

Re: [PR] Executor configuration accepts SessionState .. [datafusion-ballista]

2024-10-24 Thread via GitHub
milenkovicm commented on PR #1099: URL: https://github.com/apache/datafusion-ballista/pull/1099#issuecomment-2436161989 will be finalised once #1096 gets merged

Re: [PR] feat: Implement native version of ColumnarToRow [datafusion-comet]

2024-10-24 Thread via GitHub
andygrove commented on code in PR #1034: URL: https://github.com/apache/datafusion-comet/pull/1034#discussion_r1815594934 ## native/core/src/execution/shuffle/row.rs: ## @@ -235,6 +250,143 @@ impl SparkUnsafeRow { } } +#[allow(clippy::needless_range_loop)] +

Re: [PR] chore: Added a number of physical planning join benchmarks [datafusion]

2024-10-24 Thread via GitHub
alamb commented on PR #13085: URL: https://github.com/apache/datafusion/pull/13085#issuecomment-2436135534 > I'll leave this for a future PR since I've run out of free time this week. Should I open an issue for this in that case? Thanks for your help @mnorfolk03 -- that is great. Ind

[PR] feat: Move subquery check from analyzer to PullUpCorrelatedExpr (Fix TPC-DS q41) [datafusion]

2024-10-24 Thread via GitHub
eejbyfeldt opened a new pull request, #13091: URL: https://github.com/apache/datafusion/pull/13091 ## Which issue does this PR close? Closes #13074. ## Rationale for this change The goal here is to support TPC-DS q41 which has an expression that can not be pull up

Re: [PR] feat: improve type inference for WindowFrame [datafusion]

2024-10-24 Thread via GitHub
notfilippo commented on code in PR #13059: URL: https://github.com/apache/datafusion/pull/13059#discussion_r1814827972 ## datafusion/expr/src/window_frame.rs: ## @@ -334,51 +335,69 @@ impl WindowFrameBound { } } -impl TryFrom for WindowFrameBound { -type Error = Data

Re: [PR] Add More `Arc` to AggregateFunctionExpr [datafusion]

2024-10-24 Thread via GitHub
alamb commented on PR #13012: URL: https://github.com/apache/datafusion/pull/13012#issuecomment-2435021232 > Wouldn't it be better to maintain consistency throughout the entire codebase? I agree it would be good to maintain consistency when possible. Are you thinking that this PR is

Re: [PR] feat: improve type inference for WindowFrame [datafusion]

2024-10-24 Thread via GitHub
goldmedal commented on code in PR #13059: URL: https://github.com/apache/datafusion/pull/13059#discussion_r1814765591 ## datafusion/expr/src/window_frame.rs: ## @@ -334,51 +335,69 @@ impl WindowFrameBound { } } -impl TryFrom for WindowFrameBound { -type Error = DataF

Re: [PR] perf: Cache jstrings during metrics collection [datafusion-comet]

2024-10-24 Thread via GitHub
mbutrovich commented on code in PR #1029: URL: https://github.com/apache/datafusion-comet/pull/1029#discussion_r1814900680 ## native/core/src/execution/jni_api.rs: ## @@ -346,6 +347,10 @@ pub unsafe extern "system" fn Java_org_apache_comet_Native_executePlan( let exe

Re: [PR] feat(logical-types): add NativeType and LogicalType [datafusion]

2024-10-24 Thread via GitHub
berkaysynnada commented on PR #12853: URL: https://github.com/apache/datafusion/pull/12853#issuecomment-2435194014 Has the [logical type branch](https://github.com/apache/datafusion/tree/logical-types) reached its final state, and are we starting the migration? -- This is an automated me

Re: [PR] feat(logical-types): add NativeType and LogicalType [datafusion]

2024-10-24 Thread via GitHub
notfilippo commented on PR #12853: URL: https://github.com/apache/datafusion/pull/12853#issuecomment-2435211194 > Has the [logical type branch](https://github.com/apache/datafusion/tree/logical-types) reached its final state, and are we starting the mi

Re: [PR] feat: improve type inference for WindowFrame [datafusion]

2024-10-24 Thread via GitHub
notfilippo commented on PR #13059: URL: https://github.com/apache/datafusion/pull/13059#issuecomment-2435147352 @goldmedal I've removed SingleQuotedString according to https://github.com/apache/datafusion/pull/13059#discussion_r1814842636.

Re: [PR] feat: improve type inference for WindowFrame [datafusion]

2024-10-24 Thread via GitHub
goldmedal commented on PR #13059: URL: https://github.com/apache/datafusion/pull/13059#issuecomment-2435303973 Thanks @notfilippo. If there are no more comments from others, I plan to merge this tomorrow.

Re: [PR] feat(logical-types): add NativeType and LogicalType [datafusion]

2024-10-24 Thread via GitHub
ozankabak commented on PR #12853: URL: https://github.com/apache/datafusion/pull/12853#issuecomment-2435305044 I am traveling currently but will have some bandwidth to take a look early next week if no one beats me to it.

[I] docs: fix generate_series documentation [datafusion]

2024-10-24 Thread via GitHub
Omega359 opened a new issue, #13093: URL: https://github.com/apache/datafusion/issues/13093 ### Describe the bug Currently, the generate_series documentation is a copy of the range documentation; however, the two are slightly different. `generate_series` includes the upper bound whereas `ra
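A minimal Rust sketch of the difference the issue describes, assuming integer arguments and the default step of 1. The helpers `generate_series` and `range` below are hypothetical stand-ins built on Rust's own inclusive (`..=`) and exclusive (`..`) range syntax; they do not call DataFusion, and the exclusive upper bound for `range` is taken from DataFusion's existing `range` documentation rather than from this (truncated) issue text.

```rust
/// Hypothetical stand-in for `generate_series(start, stop)`:
/// the upper bound `stop` is INCLUDED in the output.
fn generate_series(start: i64, stop: i64) -> Vec<i64> {
    (start..=stop).collect()
}

/// Hypothetical stand-in for `range(start, stop)`:
/// the upper bound `stop` is EXCLUDED from the output.
fn range(start: i64, stop: i64) -> Vec<i64> {
    (start..stop).collect()
}

fn main() {
    // Same arguments, different lengths: this is why copying the
    // `range` docs verbatim for `generate_series` is misleading.
    assert_eq!(generate_series(1, 4), vec![1, 2, 3, 4]);
    assert_eq!(range(1, 4), vec![1, 2, 3]);
    println!("{:?} vs {:?}", generate_series(1, 4), range(1, 4));
}
```

With these arguments the program prints `[1, 2, 3, 4] vs [1, 2, 3]`; the only difference between the two helpers is whether the final value is emitted.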

Re: [I] Oct 21, 2024: This week in DataFusion [datafusion]

2024-10-24 Thread via GitHub
alamb commented on issue #13035: URL: https://github.com/apache/datafusion/issues/13035#issuecomment-2435322881 @SamSynnada started a great discussion about better spreading the word about DataFusion. Thank you 🙏 -- it is going to be a great year - https://github.com/apache/datafusion/di

Re: [PR] feat: Implement native version of ColumnarToRow [datafusion-comet]

2024-10-24 Thread via GitHub
parthchandra commented on PR #1034: URL: https://github.com/apache/datafusion-comet/pull/1034#issuecomment-2435933800 @mbutrovich @andygrove any thoughts?

Re: [PR] JoinOptimization: Add build side pushdown to probe side [datafusion]

2024-10-24 Thread via GitHub
berkaysynnada commented on PR #13054: URL: https://github.com/apache/datafusion/pull/13054#issuecomment-2434890005 > How would you display them in sources? The dynamic filter will only be added during execution, so it will only be available through e.g. ParquetExec after loading the build s

Re: [PR] chore: Added a number of physical planning join benchmarks [datafusion]

2024-10-24 Thread via GitHub
Dandandan commented on PR #13085: URL: https://github.com/apache/datafusion/pull/13085#issuecomment-2434894992 Ah I see @alamb suggested to start with physical planning in the ticket first, sorry for the noise :) I do think some focused join *execution* benchmarks can be very useful t

[I] External aggregation reserves more memory than actual usage [datafusion]

2024-10-24 Thread via GitHub
2010YOUY01 opened a new issue, #13089: URL: https://github.com/apache/datafusion/issues/13089 ### Describe the bug The query below requires 65M of memory to run; if we set the memory limit to 50M, it cannot run successfully. Run in datafusion-cli: ``` cargo run -- --mem-pool-type fa

Re: [PR] Convert `ntile` builtIn function to UDWF [datafusion]

2024-10-24 Thread via GitHub
jcsherin commented on code in PR #13040: URL: https://github.com/apache/datafusion/pull/13040#discussion_r1814731837 ## datafusion/proto/tests/cases/roundtrip_logical_plan.rs: ## @@ -940,7 +940,6 @@ async fn roundtrip_expr_api() -> Result<()> { vec![lit(1), lit(2),

Re: [I] `CometBuffer` can potentially lead to concurrent modification of a held buffer (aka is "Unsound" in Rust terms) [datafusion-comet]

2024-10-24 Thread via GitHub
alamb commented on issue #1035: URL: https://github.com/apache/datafusion-comet/issues/1035#issuecomment-2434924146 Well, I don't presume to know what is / isn't the right design for other systems. I think this ticket will serve as a good discussion for how to potentially improve t

Re: [PR] Introduce `binary_as_string` parquet option [datafusion]

2024-10-24 Thread via GitHub
alamb commented on PR #12816: URL: https://github.com/apache/datafusion/pull/12816#issuecomment-2435009622 Update here is we are on track to release a version of arrow with the required fixes today and then I will merge this PR up and get it ready for review ⏲️

Re: [I] Migrate documentation for remaining window functions to window_functions.md [datafusion]

2024-10-24 Thread via GitHub
buraksenn commented on issue #12936: URL: https://github.com/apache/datafusion/issues/12936#issuecomment-2435380047 @Omega359 added lead/lag window function docs

Re: [I] [EPIC] Support TPC-DS benchmarks [datafusion]

2024-10-24 Thread via GitHub
onursatici commented on issue #4763: URL: https://github.com/apache/datafusion/issues/4763#issuecomment-2435286033 this problem with q35, taken from the description above: ``` - `Projections require unique expression names but the expression "MAX(customer_demographics.cd_dep_count)" at
