Re: [PR] Make `CREATE EXTERNAL TABLE` format options consistent, remove special syntax for `HEADER ROW`, `DELIMITER` and `COMPRESSION` [datafusion]

2024-05-13 Thread via GitHub
berkaysynnada commented on code in PR #10404: URL: https://github.com/apache/datafusion/pull/10404#discussion_r1597969687 ## datafusion/core/src/datasource/stream.rs: ## @@ -58,12 +58,22 @@ impl TableProviderFactory for StreamTableFactory { let schema: SchemaRef = Arc::

[I] Connection reset by peer on AWS S3 object store. [datafusion]

2024-05-13 Thread via GitHub
Smotrov opened a new issue, #10478: URL: https://github.com/apache/datafusion/issues/10478 ### Describe the bug I'm trying to read 6GB table of compressed NDJSON data from S3. The data is compressed with ZStd with about x100 compression ratio. Files are stored HIVE partitioned and ha

Re: [PR] improve monotonicity api [datafusion]

2024-05-13 Thread via GitHub
alamb commented on code in PR #10117: URL: https://github.com/apache/datafusion/pull/10117#discussion_r1598189334 ## datafusion/physical-expr/src/scalar_function.rs: ## @@ -251,10 +251,23 @@ pub fn out_ordering( func: &FuncMonotonicity, arg_orderings: &[SortProperties]

Re: [PR] refactor: Reduce string allocations in Expr::display_name (use write instead of format!) [datafusion]

2024-05-13 Thread via GitHub
alamb commented on PR #10454: URL: https://github.com/apache/datafusion/pull/10454#issuecomment-2107127069 > Very hard to get consistent benchmark results on a personal computer when there's so much process scheduling noise Yeah, I have a gcp VM running on which I run the benchmarks
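The PR title above names the technique: use `write!` instead of `format!` to reduce string allocations. A minimal sketch of that idea follows. `ToyExpr`, `display_name_format`, `write_name`, and `display_name_write` are hypothetical names made up for illustration; this is not DataFusion's `Expr::display_name` code, only an assumed small expression tree that shows the difference.

```rust
use std::fmt::{self, Write};

/// Hypothetical stand-in for an expression tree; NOT DataFusion's `Expr`.
enum ToyExpr {
    Column(String),
    Literal(i64),
    Add(Box<ToyExpr>, Box<ToyExpr>),
}

/// Allocating version: each level of the tree builds its own temporary
/// `String` with `format!`, then the parent copies it into a new one.
fn display_name_format(expr: &ToyExpr) -> String {
    match expr {
        ToyExpr::Column(name) => name.clone(),
        ToyExpr::Literal(v) => format!("{v}"),
        ToyExpr::Add(l, r) => format!(
            "{} + {}",
            display_name_format(l),
            display_name_format(r)
        ),
    }
}

/// Streaming version: every level appends to one shared buffer with
/// `write!` / `write_str`, so no intermediate strings are created.
fn write_name<W: Write>(w: &mut W, expr: &ToyExpr) -> fmt::Result {
    match expr {
        ToyExpr::Column(name) => w.write_str(name),
        ToyExpr::Literal(v) => write!(w, "{v}"),
        ToyExpr::Add(l, r) => {
            write_name(w, l)?;
            w.write_str(" + ")?;
            write_name(w, r)
        }
    }
}

/// Convenience wrapper that allocates exactly one output buffer.
fn display_name_write(expr: &ToyExpr) -> Result<String, fmt::Error> {
    let mut out = String::new();
    write_name(&mut out, expr)?;
    Ok(out)
}
```

Both functions produce the same text; the second one only grows a single `String`, which is where the savings would come from on deep trees.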

Re: [PR] Implement `From<Arc<LogicalPlan>>` for `LogicalPlanBuilder` [datafusion]

2024-05-13 Thread via GitHub
alamb commented on PR #10466: URL: https://github.com/apache/datafusion/pull/10466#issuecomment-2107218784 Thanks @AbrarNitk ! I started the CI checks on this PR Normally I think we should add some test coverage so we don't accidentally break this in the future. This would I

Re: [I] Implement `LogicalPlanBuilder::from` for `Arc<LogicalPlan>` [datafusion]

2024-05-13 Thread via GitHub
alamb commented on issue #10465: URL: https://github.com/apache/datafusion/issues/10465#issuecomment-2107222678 Great idea @ClSlaid Thanks to @AbrarNitk we have a first version of `LogicalPlanBuilder::from(arc_input)` in https://github.com/apache/datafusion/pull/10466 🙏 I th

Re: [I] DISCUSSION: remove `CREATE EXTERNAL TABLE` syntax: `DELIMITER`, `WITH HEADER ROW` and `COMPRESSION` [datafusion]

2024-05-13 Thread via GitHub
alamb closed issue #10414: DISCUSSION: remove `CREATE EXTERNAL TABLE` syntax: `DELIMITER`, `WITH HEADER ROW` and `COMPRESSION` URL: https://github.com/apache/datafusion/issues/10414 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

Re: [PR] Make `CREATE EXTERNAL TABLE` format options consistent, remove special syntax for `HEADER ROW`, `DELIMITER` and `COMPRESSION` [datafusion]

2024-05-13 Thread via GitHub
alamb commented on PR #10404: URL: https://github.com/apache/datafusion/pull/10404#issuecomment-2107232010 Thanks again for this work @berkaysynnada and the guidance @ozankabak

Re: [PR] Make `CREATE EXTERNAL TABLE` format options consistent, remove special syntax for `HEADER ROW`, `DELIMITER` and `COMPRESSION` [datafusion]

2024-05-13 Thread via GitHub
alamb merged PR #10404: URL: https://github.com/apache/datafusion/pull/10404

Re: [I] Overwritten Format Configs by CreateExternalTable Options [datafusion]

2024-05-13 Thread via GitHub
alamb closed issue #9945: Overwritten Format Configs by CreateExternalTable Options URL: https://github.com/apache/datafusion/issues/9945

[I] Document committer / PMC process [datafusion]

2024-05-13 Thread via GitHub
alamb opened a new issue, #10479: URL: https://github.com/apache/datafusion/issues/10479 ### Is your feature request related to a problem or challenge? As part of governing DataFusion in the open and via the Apache Way, we should make sure that as much as possible is done in the open.

[I] Create presentation for DataFusion SIGMOD 2024 paper [datafusion]

2024-05-13 Thread via GitHub
alamb opened a new issue, #10480: URL: https://github.com/apache/datafusion/issues/10480 ### Is your feature request related to a problem or challenge? @JayjeetAtGithub @Dandandan @yjshen @ozankabak @sunchao and @viirya wrote and submitted a paper to the [SIGMOD 2024 conference](

[I] Keynote presentation for SiMoD workshop at SIGMOD 2024 [datafusion]

2024-05-13 Thread via GitHub
alamb opened a new issue, #10481: URL: https://github.com/apache/datafusion/issues/10481 I am giving an invited keynote talk at a workshop colocated with SIGMOD 2024 on Friday Jun 14, 2024 (after the main conference). I need to prepare slides for this and figured people in th

Re: [I] Keynote presentation for SiMoD workshop at SIGMOD 2024 [datafusion]

2024-05-13 Thread via GitHub
alamb commented on issue #10481: URL: https://github.com/apache/datafusion/issues/10481#issuecomment-2107276815 Here are some notes I have on what I want to talk about interfaces and then paradoxically allowed us to narrow the scope of potential optimizations (e.g. compute kernels) an

[I] DataFusion weekly project plan (Andrew Lamb) - May 13, 2024 [datafusion]

2024-05-13 Thread via GitHub
alamb opened a new issue, #10482: URL: https://github.com/apache/datafusion/issues/10482 Follow on to https://github.com/apache/datafusion/issues/10395 My (personal) North ⭐ : 1000 projects are built using DataFusion 📈 **It would be great for other contributors to DataFusion wh

Re: [I] DataFusion weekly project plan (Andrew Lamb) - May 6, 2024 [datafusion]

2024-05-13 Thread via GitHub
alamb commented on issue #10395: URL: https://github.com/apache/datafusion/issues/10395#issuecomment-2107312167 Next week: https://github.com/apache/datafusion/issues/10482

Re: [I] DataFusion weekly project plan (Andrew Lamb) - May 6, 2024 [datafusion]

2024-05-13 Thread via GitHub
alamb closed issue #10395: DataFusion weekly project plan (Andrew Lamb) - May 6, 2024 URL: https://github.com/apache/datafusion/issues/10395

Re: [I] API in ParquetExec to pass in RowSelections to `ParquetExec` (enable custom indexes, finer grained pushdown) [datafusion]

2024-05-13 Thread via GitHub
alamb commented on issue #9929: URL: https://github.com/apache/datafusion/issues/9929#issuecomment-2107313160 I hope to work on this issue this week

Re: [PR] Add cast array test to sqllogictest [datafusion]

2024-05-13 Thread via GitHub
alamb commented on PR #10474: URL: https://github.com/apache/datafusion/pull/10474#issuecomment-2107373702 @jonahgao perhaps you would like to try merging this PR as a test that we have all the permissions setup correctly ?

Re: [PR] Add `simplify` method to aggregate function [datafusion]

2024-05-13 Thread via GitHub
alamb merged PR #10354: URL: https://github.com/apache/datafusion/pull/10354

Re: [I] Add a `AggregateUDFImpl::simplfy()` API [datafusion]

2024-05-13 Thread via GitHub
alamb closed issue #9526: Add a `AggregateUDFImpl::simplfy()` API URL: https://github.com/apache/datafusion/issues/9526

Re: [PR] Add `simplify` method to aggregate function [datafusion]

2024-05-13 Thread via GitHub
alamb commented on PR #10354: URL: https://github.com/apache/datafusion/pull/10354#issuecomment-2107376306 Thanks again @jayzhan211 and @milenkovicm 🙏

Re: [I] Stop copying LogicalPlan and Exprs in `SingleDistinctToGroupBy` [datafusion]

2024-05-13 Thread via GitHub
appletreeisyellow commented on issue #10295: URL: https://github.com/apache/datafusion/issues/10295#issuecomment-2107378437 I'd like to take this one if no one has worked on it yet

Re: [I] Stop copying LogicalPlan and Exprs in `SingleDistinctToGroupBy` [datafusion]

2024-05-13 Thread via GitHub
alamb commented on issue #10295: URL: https://github.com/apache/datafusion/issues/10295#issuecomment-2107381824 Thanks @appletreeisyellow -- that would be great. No one has started as far as I can tell.

Re: [I] Stop copying LogicalPlan and Exprs in `SingleDistinctToGroupBy` [datafusion]

2024-05-13 Thread via GitHub
alamb commented on issue #10295: URL: https://github.com/apache/datafusion/issues/10295#issuecomment-2107386884 I took a quick look at https://github.com/apache/datafusion/blob/58cc4e1289451b30adca4721fd6eb5a36b26a2cd/datafusion/optimizer/src/single_distinct_to_groupby.rs#L59 Looks to

Re: [I] Stop copying LogicalPlan and Exprs in `SingleDistinctToGroupBy` [datafusion]

2024-05-13 Thread via GitHub
appletreeisyellow commented on issue #10295: URL: https://github.com/apache/datafusion/issues/10295#issuecomment-2107390722 @alamb Thank you for the guidance!

[PR] Improved ergonomy for `CREATE EXTERNAL TABLE OPTIONS`: Don't require quotations for simple namespaced keys like `foo.bar` [datafusion]

2024-05-13 Thread via GitHub
ozankabak opened a new pull request, #10483: URL: https://github.com/apache/datafusion/pull/10483 ## Which issue does this PR close? Quick follow-on to #10404. ## Rationale for this change Now that we migrated to a consistent options syntax for external tables, we should

Re: [PR] Add cast array test to sqllogictest [datafusion]

2024-05-13 Thread via GitHub
jonahgao commented on PR #10474: URL: https://github.com/apache/datafusion/pull/10474#issuecomment-2107415886 > @jonahgao perhaps you would like to try merging this PR as a test that we have all the permissions setup correctly ? Sure.

Re: [PR] Improved ergonomy for `CREATE EXTERNAL TABLE OPTIONS`: Don't require quotations for simple namespaced keys like `foo.bar` [datafusion]

2024-05-13 Thread via GitHub
ozankabak commented on code in PR #10483: URL: https://github.com/apache/datafusion/pull/10483#discussion_r1598371311 ## datafusion/sql/src/parser.rs: ## @@ -462,7 +462,18 @@ impl<'a> DFParser<'a> { pub fn parse_option_key(&mut self) -> Result { let next_token = se

Re: [PR] Add cast array test to sqllogictest [datafusion]

2024-05-13 Thread via GitHub
jonahgao merged PR #10474: URL: https://github.com/apache/datafusion/pull/10474

Re: [I] bug: `CAST()` causes internal error [datafusion]

2024-05-13 Thread via GitHub
jonahgao closed issue #10464: bug: `CAST()` causes internal error URL: https://github.com/apache/datafusion/issues/10464

Re: [PR] Add cast array test to sqllogictest [datafusion]

2024-05-13 Thread via GitHub
jonahgao commented on PR #10474: URL: https://github.com/apache/datafusion/pull/10474#issuecomment-2107424445 Thanks @viirya and thank you @alamb for letting me experience this.

Re: [PR] feat: allow `array_slice` to take an optional stride parameter [datafusion]

2024-05-13 Thread via GitHub
jayzhan211 commented on code in PR #10469: URL: https://github.com/apache/datafusion/pull/10469#discussion_r1598377661 ## datafusion/functions-array/src/macros.rs: ## @@ -106,4 +105,26 @@ macro_rules! make_udf_function { } } }; +// This pattern doe

Re: [PR] feat: allow `array_slice` to take an optional stride parameter [datafusion]

2024-05-13 Thread via GitHub
jayzhan211 commented on code in PR #10469: URL: https://github.com/apache/datafusion/pull/10469#discussion_r1598379495 ## datafusion/proto/tests/cases/roundtrip_logical_plan.rs: ## @@ -581,7 +581,7 @@ async fn roundtrip_expr_api() -> Result<()> { make_array(vec![lit

Re: [PR] Improved ergonomy for `CREATE EXTERNAL TABLE OPTIONS`: Don't require quotations for simple namespaced keys like `foo.bar` [datafusion]

2024-05-13 Thread via GitHub
ozankabak commented on PR #10483: URL: https://github.com/apache/datafusion/pull/10483#issuecomment-2107445623 @berkaysynnada PTAL

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-13 Thread via GitHub
andygrove commented on PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#issuecomment-2107449703 Thanks @vidyasankarv. I plan on carefully reviewing this later today.

Re: [PR] Improved ergonomy for `CREATE EXTERNAL TABLE OPTIONS`: Don't require quotations for simple namespaced keys like `foo.bar` [datafusion]

2024-05-13 Thread via GitHub
berkaysynnada commented on code in PR #10483: URL: https://github.com/apache/datafusion/pull/10483#discussion_r1598395560 ## datafusion/sqllogictest/test_files/create_external_table.slt: ## @@ -201,7 +201,20 @@ CREATE EXTERNAL TABLE IF NOT EXISTS region ( r_name VARCHAR

Re: [PR] Implement `From<Arc<LogicalPlan>>` for `LogicalPlanBuilder` [datafusion]

2024-05-13 Thread via GitHub
AbrarNitk commented on PR #10466: URL: https://github.com/apache/datafusion/pull/10466#issuecomment-2107454163 Hey @alamb, I am glad you have checked PR. I am new to `datafusion` and `arrow`, will keep learning and also keep sending the PRs :). Thank you 🙏

Re: [PR] Improved ergonomy for `CREATE EXTERNAL TABLE OPTIONS`: Don't require quotations for simple namespaced keys like `foo.bar` [datafusion]

2024-05-13 Thread via GitHub
berkaysynnada commented on PR #10483: URL: https://github.com/apache/datafusion/pull/10483#issuecomment-2107470174 This PR addresses the last remaining gap, ensuring that UX achieves the desired balance of consistency and flexibility.

[PR] Move `Count` to `functions-aggregate` [datafusion]

2024-05-13 Thread via GitHub
jayzhan211 opened a new pull request, #10484: URL: https://github.com/apache/datafusion/pull/10484 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested

Re: [PR] Implement `From<Arc<LogicalPlan>>` for `LogicalPlanBuilder` [datafusion]

2024-05-13 Thread via GitHub
alamb commented on PR #10466: URL: https://github.com/apache/datafusion/pull/10466#issuecomment-2107515505 > Hey @alamb, I am glad you have checked PR. I am new to `datafusion` and `arrow`, will keep learning and also keep sending the PRs :). Thank you 🙏 Thank you and welcome to the c

Re: [PR] Add `Expr::try_as_col`, deprecate `Expr::try_into_col` (speed up optimizer) [datafusion]

2024-05-13 Thread via GitHub
alamb commented on PR #10448: URL: https://github.com/apache/datafusion/pull/10448#issuecomment-2107536096 > Most places seem to still be cloning it, but I guess baby steps 😅 Yes indeed -- baby steps. Thank you for the review @tustvold Another thing that would likely help non t

Re: [PR] Add `Expr::try_as_col`, deprecate `Expr::try_into_col` (speed up optimizer) [datafusion]

2024-05-13 Thread via GitHub
alamb merged PR #10448: URL: https://github.com/apache/datafusion/pull/10448
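The PR title contrasts `Expr::try_as_col` with the deprecated `Expr::try_into_col`. The sketch below illustrates the general borrow-versus-clone distinction behind that kind of change. `ToyExpr` and `Column` here are hypothetical toy types, and the assumption that the older accessor returns an owned (cloned) column while the newer one returns a reference is an illustration, not a copy of DataFusion's definitions.

```rust
/// Hypothetical column type; NOT DataFusion's `Column`.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    name: String,
}

/// Hypothetical expression type; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum ToyExpr {
    Column(Column),
    Literal(i64),
}

impl ToyExpr {
    /// Owned accessor (assumed shape of the older API): returns a fresh
    /// `Column`, which means cloning the name on every call.
    fn try_into_col(&self) -> Result<Column, String> {
        match self {
            ToyExpr::Column(c) => Ok(c.clone()),
            other => Err(format!("not a column: {other:?}")),
        }
    }

    /// Borrowing accessor (assumed shape of the newer API): hands back a
    /// reference into the expression, so nothing is copied.
    fn try_as_col(&self) -> Option<&Column> {
        match self {
            ToyExpr::Column(c) => Some(c),
            _ => None,
        }
    }
}
```

A caller that only needs to inspect the column name can use the borrowing form and skip the allocation; callers that need ownership can still clone explicitly at the point where it is required.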

Re: [PR] Add `Expr::try_as_col`, deprecate `Expr::try_into_col` (speed up optimizer) [datafusion]

2024-05-13 Thread via GitHub
alamb commented on code in PR #10448: URL: https://github.com/apache/datafusion/pull/10448#discussion_r1598451810 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -369,8 +369,18 @@ impl LogicalPlan { // The join keys in using-join must be columns.

Re: [PR] feat: Add logging to explain reasons for Comet not being able to run a query stage natively [datafusion-comet]

2024-05-13 Thread via GitHub
andygrove commented on PR #397: URL: https://github.com/apache/datafusion-comet/pull/397#issuecomment-2107544060 @viirya @parthchandra This PR is ready for another review. I simplified the logic and removed the check to see if the top level operator is native or not. Instead, if we have rec

Re: [PR] Stop copying LogicalPlan and Exprs in `EliminateCrossJoin` (4% faster planning) [datafusion]

2024-05-13 Thread via GitHub
alamb commented on code in PR #10431: URL: https://github.com/apache/datafusion/pull/10431#discussion_r1598446148 ## datafusion/optimizer/src/eliminate_cross_join.rs: ## @@ -39,102 +41,147 @@ impl EliminateCrossJoin { } } -/// Attempt to reorder join to eliminate cross j

Re: [PR] Implement `From<Arc<LogicalPlan>>` for `LogicalPlanBuilder` [datafusion]

2024-05-13 Thread via GitHub
alamb commented on code in PR #10466: URL: https://github.com/apache/datafusion/pull/10466#discussion_r1598470896 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1138,6 +1139,31 @@ impl LogicalPlanBuilder { )?)) } } + +/// Converts a `Arc` into `LogicalPlan

Re: [PR] Implement `From<Arc<LogicalPlan>>` for `LogicalPlanBuilder` [datafusion]

2024-05-13 Thread via GitHub
alamb merged PR #10466: URL: https://github.com/apache/datafusion/pull/10466

Re: [I] Implement `LogicalPlanBuilder::from` for `Arc<LogicalPlan>` [datafusion]

2024-05-13 Thread via GitHub
alamb closed issue #10465: Implement `LogicalPlanBuilder::from` for `Arc<LogicalPlan>` URL: https://github.com/apache/datafusion/issues/10465

[I] Convert internal representation of LogicalPlanBuilder from `LogicalPlan` to `Arc<LogicalPlan>` [datafusion]

2024-05-13 Thread via GitHub
alamb opened a new issue, #10485: URL: https://github.com/apache/datafusion/issues/10485 Great idea @ClSlaid Thanks to @AbrarNitk we have a first version of `LogicalPlanBuilder::from(arc_input)` in https://github.com/apache/datafusion/pull/10466 🙏 I think we sh

Re: [I] Convert internal representation of LogicalPlanBuilder from `LogicalPlan` to `Arc<LogicalPlan>` [datafusion]

2024-05-13 Thread via GitHub
alamb commented on issue #10485: URL: https://github.com/apache/datafusion/issues/10485#issuecomment-2107578844 I think this is a good first issue as it is pretty clear what the ask is

Re: [I] Implement `LogicalPlanBuilder::from` for `Arc<LogicalPlan>` [datafusion]

2024-05-13 Thread via GitHub
alamb commented on issue #10465: URL: https://github.com/apache/datafusion/issues/10465#issuecomment-2107579559 Filed https://github.com/apache/datafusion/issues/10485 to track the internal representation change
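The two linked items (a `From` conversion that accepts an `Arc`, and storing the plan as an `Arc` inside the builder) can be sketched roughly as below. `ToyPlan` and `ToyPlanBuilder` are hypothetical stand-ins invented for this example; the sketch assumes nothing about how `LogicalPlanBuilder` is actually laid out beyond what the issue titles state.

```rust
use std::sync::Arc;

/// Hypothetical plan node; NOT DataFusion's `LogicalPlan`.
#[derive(Debug, PartialEq)]
enum ToyPlan {
    Scan(String),
    Limit { input: Arc<ToyPlan>, n: usize },
}

/// Hypothetical builder that keeps its plan behind an `Arc`, so an
/// already-shared plan can be adopted without a deep copy.
struct ToyPlanBuilder {
    plan: Arc<ToyPlan>,
}

impl From<ToyPlan> for ToyPlanBuilder {
    /// Owned plan: wrap it once.
    fn from(plan: ToyPlan) -> Self {
        Self { plan: Arc::new(plan) }
    }
}

impl From<Arc<ToyPlan>> for ToyPlanBuilder {
    /// Shared plan: reuse the existing allocation as-is.
    fn from(plan: Arc<ToyPlan>) -> Self {
        Self { plan }
    }
}

impl ToyPlanBuilder {
    /// Adds a limit node whose input is the current plan pointer.
    fn limit(self, n: usize) -> Self {
        Self {
            plan: Arc::new(ToyPlan::Limit { input: self.plan, n }),
        }
    }

    /// Returns the finished plan.
    fn build(self) -> Arc<ToyPlan> {
        self.plan
    }
}
```

With this layout, `ToyPlanBuilder::from(Arc::clone(&shared))` only bumps a reference count, and each builder step moves the pointer into the new parent node rather than copying the subtree.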

Re: [I] Incorrect conversion of pyarrow interval value to datafusion literal [datafusion-python]

2024-05-13 Thread via GitHub
timsaucer commented on issue #665: URL: https://github.com/apache/datafusion-python/issues/665#issuecomment-2107580388 TODO: If the PR https://github.com/apache/datafusion-python/pull/666 merges in before this issue is corrected, the following examples in the `examples/tpch` folder will n

Re: [PR] Add examples from TPC-H [datafusion-python]

2024-05-13 Thread via GitHub
timsaucer commented on code in PR #666: URL: https://github.com/apache/datafusion-python/pull/666#discussion_r1598479222 ## examples/tpch/q01_pricing_summary_report.py: ## @@ -0,0 +1,90 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor licen

Re: [PR] Add examples from TPC-H [datafusion-python]

2024-05-13 Thread via GitHub
timsaucer commented on code in PR #666: URL: https://github.com/apache/datafusion-python/pull/666#discussion_r1598479642 ## examples/tpch/q05_local_supplier_volume.py: ## @@ -0,0 +1,102 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor licen

Re: [PR] Add examples from TPC-H [datafusion-python]

2024-05-13 Thread via GitHub
timsaucer commented on code in PR #666: URL: https://github.com/apache/datafusion-python/pull/666#discussion_r1598480165 ## examples/tpch/README.md: ## @@ -0,0 +1,57 @@ + + +# DataFusion Python Examples for TPC-H + +These examples reproduce the problems listed in the Transaction

Re: [PR] Stop copying LogicalPlan and Exprs in `ReplaceDistinctWithAggregate` [datafusion]

2024-05-13 Thread via GitHub
alamb commented on code in PR #10460: URL: https://github.com/apache/datafusion/pull/10460#discussion_r1598480059 ## datafusion/optimizer/src/replace_distinct_aggregate.rs: ## @@ -88,60 +94,72 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { input,

Re: [PR] Add examples from TPC-H [datafusion-python]

2024-05-13 Thread via GitHub
timsaucer commented on PR #666: URL: https://github.com/apache/datafusion-python/pull/666#issuecomment-2107585400 I've added to the main readme in the examples folder, so I think this PR is good to go pending review.

Re: [I] Convert internal representation of LogicalPlanBuilder from `LogicalPlan` to `Arc<LogicalPlan>` [datafusion]

2024-05-13 Thread via GitHub
iiiancampbell commented on issue #10485: URL: https://github.com/apache/datafusion/issues/10485#issuecomment-2107618781 Hi @alamb , would be keen to contribute to this as my first issue.

Re: [I] Convert internal representation of LogicalPlanBuilder from `LogicalPlan` to `Arc<LogicalPlan>` [datafusion]

2024-05-13 Thread via GitHub
iiiancampbell commented on issue #10485: URL: https://github.com/apache/datafusion/issues/10485#issuecomment-2107634998 take

[I] Default window frames to not match PostgreSQL [datafusion-python]

2024-05-13 Thread via GitHub
timsaucer opened a new issue, #688: URL: https://github.com/apache/datafusion-python/issues/688 **Describe the bug** When no window frame is specified in the python implementation, we default to unbounded preceding to current row. If we are to follow [PostgreSQL implementation ](https:

Re: [I] Default window frames to not match PostgreSQL [datafusion-python]

2024-05-13 Thread via GitHub
timsaucer commented on issue #688: URL: https://github.com/apache/datafusion-python/issues/688#issuecomment-2107643363 This seems like a very easy fix. I think we just match on order_by.is_some() and pick the appropriate default. To close, I think at a minimum we would want (1) a unit test

[I] Support spark base64 function [datafusion-comet]

2024-05-13 Thread via GitHub
leoluan2009 opened a new issue, #419: URL: https://github.com/apache/datafusion-comet/issues/419 ### What is the problem the feature request solves? Support spark base64 function ### Describe the potential solution _No response_ ### Additional context _No re

Re: [PR] Add examples from TPC-H [datafusion-python]

2024-05-13 Thread via GitHub
andygrove commented on code in PR #666: URL: https://github.com/apache/datafusion-python/pull/666#discussion_r1598526974 ## examples/tpch/convert_data_to_parquet.py: ## @@ -0,0 +1,142 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

[PR] Support spark base64 function [datafusion-comet]

2024-05-13 Thread via GitHub
leoluan2009 opened a new pull request, #420: URL: https://github.com/apache/datafusion-comet/pull/420 ## Which issue does this PR close? Closes #419 . ## Rationale for this change ## What changes are included in this PR? ## How are these cha

Re: [PR] Add examples from TPC-H [datafusion-python]

2024-05-13 Thread via GitHub
andygrove merged PR #666: URL: https://github.com/apache/datafusion-python/pull/666

Re: [I] [DISCUSSION] We need a Hero for datafusion-python [datafusion-python]

2024-05-13 Thread via GitHub
andygrove closed issue #440: [DISCUSSION] We need a Hero for datafusion-python URL: https://github.com/apache/datafusion-python/issues/440

Re: [PR] feat: Support spark base64 function [datafusion-comet]

2024-05-13 Thread via GitHub
leoluan2009 commented on PR #420: URL: https://github.com/apache/datafusion-comet/pull/420#issuecomment-2107656632 @viirya Help to start CI, thanks

Re: [I] Default window frames to not match PostgreSQL [datafusion-python]

2024-05-13 Thread via GitHub
timsaucer commented on issue #688: URL: https://github.com/apache/datafusion-python/issues/688#issuecomment-2107665680 Based on a recommendation on Discord, we may want to use `WindowFrame::new()`, which already has the logic for checking whether order_by exists.
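The fix discussed in this thread amounts to choosing the default window frame based on whether an `ORDER BY` is present. A rough sketch of that selection logic is below. All types and the `default_frame` function are hypothetical and written only for illustration; they are not the `WindowFrame` API. The sketch assumes the PostgreSQL-style behaviour that, without `ORDER BY`, the default frame effectively covers the whole partition, modelled here as `ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`.

```rust
/// Hypothetical frame unit; NOT a DataFusion type.
#[derive(Debug, PartialEq)]
enum FrameUnits {
    Rows,
    Range,
}

/// Hypothetical frame bound; NOT a DataFusion type.
#[derive(Debug, PartialEq)]
enum FrameBound {
    UnboundedPreceding,
    CurrentRow,
    UnboundedFollowing,
}

/// Hypothetical window frame description.
#[derive(Debug, PartialEq)]
struct ToyWindowFrame {
    units: FrameUnits,
    start: FrameBound,
    end: FrameBound,
}

/// Picks a default frame when the user did not specify one.
fn default_frame(has_order_by: bool) -> ToyWindowFrame {
    if has_order_by {
        // With ORDER BY: from the partition start up to the current row
        // (RANGE includes the current row's peers).
        ToyWindowFrame {
            units: FrameUnits::Range,
            start: FrameBound::UnboundedPreceding,
            end: FrameBound::CurrentRow,
        }
    } else {
        // Without ORDER BY (assumption stated above): the whole partition.
        ToyWindowFrame {
            units: FrameUnits::Rows,
            start: FrameBound::UnboundedPreceding,
            end: FrameBound::UnboundedFollowing,
        }
    }
}
```

In a binding layer, `has_order_by` would come from something like `order_by.is_some()`, as the earlier comment in the thread suggests.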

Re: [PR] Minor: Simplify conjunction and disjunction, improve docs [datafusion]

2024-05-13 Thread via GitHub
jayzhan211 commented on code in PR #10446: URL: https://github.com/apache/datafusion/pull/10446#discussion_r1598536147 ## datafusion/expr/src/utils.rs: ## @@ -1107,20 +1107,49 @@ fn split_binary_impl<'a>( /// assert_eq!(conjunction(split), Some(expr)); /// ``` pub fn conjunct

Re: [PR] feat: allow `array_slice` to take an optional stride parameter [datafusion]

2024-05-13 Thread via GitHub
jonahgao commented on code in PR #10469: URL: https://github.com/apache/datafusion/pull/10469#discussion_r1598536574 ## datafusion/functions-array/src/macros.rs: ## @@ -106,4 +105,26 @@ macro_rules! make_udf_function { } } }; +// This pattern does

[I] Query using ARRAY_AGG(DISTINCT) causes panic [datafusion]

2024-05-13 Thread via GitHub
bellwether-softworks opened a new issue, #10486: URL: https://github.com/apache/datafusion/issues/10486 ### Describe the bug Beginning in v37.0.0, a previously-working query is found to result in a panic: ``` panicked at /Users/username/.cargo/registry/src/index.crates.io-6

[PR] Updated builder.rs [datafusion]

2024-05-13 Thread via GitHub
iiiancampbell opened a new pull request, #10487: URL: https://github.com/apache/datafusion/pull/10487 Converted internal representation of LogicalPlanBuilder from LogicalPlan to Arc<LogicalPlan> #10485 ## Which issue does this PR close? Closes #10485 . ## Are these changes te

Re: [I] Query using ARRAY_AGG(DISTINCT) causes panic [datafusion]

2024-05-13 Thread via GitHub
jayzhan211 commented on issue #10486: URL: https://github.com/apache/datafusion/issues/10486#issuecomment-2107714207 I added the assertion because I don't know if there is any case that has len > 1. After moving the assertion, I think it should work as usual. It would be nice if you h

Re: [PR] Add cast array test to sqllogictest [datafusion]

2024-05-13 Thread via GitHub
viirya commented on PR #10474: URL: https://github.com/apache/datafusion/pull/10474#issuecomment-2107725114 Thank you @alamb @jonahgao

Re: [I] Query using ARRAY_AGG(DISTINCT) causes panic [datafusion]

2024-05-13 Thread via GitHub
bellwether-softworks commented on issue #10486: URL: https://github.com/apache/datafusion/issues/10486#issuecomment-2107727362 @jayzhan211 I appreciate your concern regarding the complex example case; I attempted to create a simpler contrived example, but was unable to trigger the panic doi

Re: [I] chore: Rename some columnar shuffle configs for code consistently [datafusion-comet]

2024-05-13 Thread via GitHub
viirya closed issue #417: chore: Rename some columnar shuffle configs for code consistently URL: https://github.com/apache/datafusion-comet/issues/417

Re: [PR] chore: Rename some columnar shuffle configs for code consistently [datafusion-comet]

2024-05-13 Thread via GitHub
viirya commented on PR #418: URL: https://github.com/apache/datafusion-comet/pull/418#issuecomment-2107737430 Thanks @leoluan2009 @andygrove

Re: [PR] chore: Rename some columnar shuffle configs for code consistently [datafusion-comet]

2024-05-13 Thread via GitHub
viirya merged PR #418: URL: https://github.com/apache/datafusion-comet/pull/418

Re: [I] Query using ARRAY_AGG(DISTINCT) causes panic [datafusion]

2024-05-13 Thread via GitHub
jayzhan211 commented on issue #10486: URL: https://github.com/apache/datafusion/issues/10486#issuecomment-2107751222 > @jayzhan211 I appreciate your concern regarding the complex example case; I attempted to create a simpler contrived example, but was unable to trigger the panic doing so. I

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-13 Thread via GitHub
comphead commented on PR #10304: URL: https://github.com/apache/datafusion/pull/10304#issuecomment-2107801927 @viirya @alamb can I get a review on this PR please?

Re: [PR] Improved ergonomy for `CREATE EXTERNAL TABLE OPTIONS`: Don't require quotations for simple namespaced keys like `foo.bar` [datafusion]

2024-05-13 Thread via GitHub
comphead commented on code in PR #10483: URL: https://github.com/apache/datafusion/pull/10483#discussion_r1598606826 ## datafusion/sql/src/parser.rs: ## @@ -462,7 +462,18 @@ impl<'a> DFParser<'a> { pub fn parse_option_key(&mut self) -> Result { let next_token = sel

Re: [PR] Minor: Simplify conjunction and disjunction, improve docs [datafusion]

2024-05-13 Thread via GitHub
comphead merged PR #10446: URL: https://github.com/apache/datafusion/pull/10446

Re: [PR] Minor: Improve documentation for `catalog.has_header` config option [datafusion]

2024-05-13 Thread via GitHub
comphead merged PR #10452: URL: https://github.com/apache/datafusion/pull/10452

Re: [PR] Stop copying LogicalPlan and Exprs in `EliminateCrossJoin` (4% faster planning) [datafusion]

2024-05-13 Thread via GitHub
comphead commented on code in PR #10431: URL: https://github.com/apache/datafusion/pull/10431#discussion_r1598621244 ## datafusion/optimizer/src/eliminate_cross_join.rs: ## @@ -39,102 +41,147 @@ impl EliminateCrossJoin { } } -/// Attempt to reorder join to eliminate cros

Re: [PR] Stop copying LogicalPlan and Exprs in `EliminateCrossJoin` (4% faster planning) [datafusion]

2024-05-13 Thread via GitHub
comphead commented on code in PR #10431: URL: https://github.com/apache/datafusion/pull/10431#discussion_r1598631511 ## datafusion/optimizer/src/eliminate_cross_join.rs: ## @@ -144,49 +191,89 @@ impl OptimizerRule for EliminateCrossJoin { } } +fn rewrite_children( +o

Re: [PR] Stop copying LogicalPlan and Exprs in `EliminateCrossJoin` (4% faster planning) [datafusion]

2024-05-13 Thread via GitHub
comphead commented on code in PR #10431: URL: https://github.com/apache/datafusion/pull/10431#discussion_r1598634008 ## datafusion/optimizer/src/eliminate_cross_join.rs: ## @@ -39,102 +41,147 @@ impl EliminateCrossJoin { } } -/// Attempt to reorder join to eliminate cros

Re: [PR] Stop copying LogicalPlan and Exprs in `EliminateCrossJoin` (4% faster planning) [datafusion]

2024-05-13 Thread via GitHub
comphead commented on code in PR #10431: URL: https://github.com/apache/datafusion/pull/10431#discussion_r1598634408 ## datafusion/optimizer/src/eliminate_cross_join.rs: ## @@ -39,102 +41,147 @@ impl EliminateCrossJoin { } } -/// Attempt to reorder join to eliminate cros

Re: [PR] Stop copying LogicalPlan and Exprs in `EliminateCrossJoin` (4% faster planning) [datafusion]

2024-05-13 Thread via GitHub
comphead commented on code in PR #10431: URL: https://github.com/apache/datafusion/pull/10431#discussion_r1598637244 ## datafusion/optimizer/src/eliminate_cross_join.rs: ## @@ -39,102 +41,147 @@ impl EliminateCrossJoin { } } -/// Attempt to reorder join to eliminate cros

Re: [PR] feat: allow `array_slice` to take an optional stride parameter [datafusion]

2024-05-13 Thread via GitHub
Michael-J-Ward commented on PR #10469: URL: https://github.com/apache/datafusion/pull/10469#issuecomment-2107941566 @jayzhan211, Should I follow this as a template for making UDF arguments optional?

Re: [PR] Minor: Simplify conjunction and disjunction, improve docs [datafusion]

2024-05-13 Thread via GitHub
alamb commented on code in PR #10446: URL: https://github.com/apache/datafusion/pull/10446#discussion_r1598656772 ## datafusion/expr/src/utils.rs: ## @@ -1107,20 +1107,49 @@ fn split_binary_impl<'a>( /// assert_eq!(conjunction(split), Some(expr)); /// ``` pub fn conjunction(f

Re: [I] Default window frames to not match PostgreSQL [datafusion-python]

2024-05-13 Thread via GitHub
timsaucer commented on issue #688: URL: https://github.com/apache/datafusion-python/issues/688#issuecomment-2107973643 From @Michael-J-Ward > Doing a little archaeology on that: > > This is the PR where window_frame switched from None to WindowFrame::new(order_by.is_some());

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-13 Thread via GitHub
alamb commented on PR #10304: URL: https://github.com/apache/datafusion/pull/10304#issuecomment-2107980838 I will review this today

Re: [I] [DISCUSSION] We need a Hero for datafusion-python [datafusion-python]

2024-05-13 Thread via GitHub
alamb commented on issue #440: URL: https://github.com/apache/datafusion-python/issues/440#issuecomment-2107989366 I think github got a little excited about closing this

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-13 Thread via GitHub
andygrove commented on code in PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#discussion_r1598662597 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -1444,13 +1483,136 @@ fn parse_str_to_time_only_timestamp(value: &str) -> CometResult> { Ok(S

[I] [DISCUSSION] We need a Hero for datafusion-python [datafusion-python]

2024-05-13 Thread via GitHub
alamb opened a new issue, #440: URL: https://github.com/apache/datafusion-python/issues/440 ## What this project could be I think this project needs someone who wants to make a world class python dataframe library and user experience take the helm. I will argue why I think this is a

Re: [I] Convert internal representation of LogicalPlanBuilder from `LogicalPlan` to `Arc` [datafusion]

2024-05-13 Thread via GitHub
alamb commented on issue #10485: URL: https://github.com/apache/datafusion/issues/10485#issuecomment-2107993007 Awesome -- thank you @iiiancampbell 🎉

Re: [PR] Converted internal representation of LogicalPlanBuilder from LogicalPlan to Arc #10485 [datafusion]

2024-05-13 Thread via GitHub
alamb commented on PR #10487: URL: https://github.com/apache/datafusion/pull/10487#issuecomment-2107995895 Thanks @iiiancampbell ! I triggered the CI to start

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-13 Thread via GitHub
viirya commented on PR #10304: URL: https://github.com/apache/datafusion/pull/10304#issuecomment-2108010944 I'll take another look today.

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-13 Thread via GitHub
andygrove commented on code in PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#discussion_r1598670410 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -563,9 +563,33 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelper
