Re: [PR] feat: Optimize for CASE WHEN .. THEN column ELSE null END [datafusion-comet]

2024-07-16 Thread via GitHub
codecov-commenter commented on PR #672: URL: https://github.com/apache/datafusion-comet/pull/672#issuecomment-2230340896 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/672?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

Re: [PR] feat: support group by unnest [datafusion]

2024-07-16 Thread via GitHub
jonahgao commented on code in PR #11469: URL: https://github.com/apache/datafusion/pull/11469#discussion_r1678992594 ## datafusion/sql/src/select.rs: ## @@ -354,6 +358,118 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { .build() } +fn try_process_aggreg

[PR] Prototype implementing DataFusion functions / operators using arrow-udf liibrary [datafusion]

2024-07-16 Thread via GitHub
xinlifoobar opened a new pull request, #11488: URL: https://github.com/apache/datafusion/pull/11488 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes teste

Re: [PR] Remove element's nullability of array_agg function [datafusion]

2024-07-16 Thread via GitHub
jayzhan211 commented on PR #11447: URL: https://github.com/apache/datafusion/pull/11447#issuecomment-2230463798 @eejbyfeldt I see. I agree the motivation here is not really a good reason, but it is my tradeoff about how we can move on for aggregate function. I think the tradeoff h

Re: [PR] feat: support group by unnest [datafusion]

2024-07-16 Thread via GitHub
JasonLi-cn commented on code in PR #11469: URL: https://github.com/apache/datafusion/pull/11469#discussion_r1679085120 ## datafusion/sql/src/select.rs: ## @@ -297,6 +298,9 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { input: LogicalPlan, select_exprs: Vec,

Re: [PR] Move `GetField` to `ExprPlanner` [datafusion]

2024-07-16 Thread via GitHub
jayzhan211 commented on code in PR #11487: URL: https://github.com/apache/datafusion/pull/11487#discussion_r1679111880 ## datafusion/sql/src/expr/identifier.rs: ## @@ -135,16 +138,17 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { let nested_name = nested_

Re: [I] Prototype implementing DataFusion functions / operators using `arrow-udf` liibrary [datafusion]

2024-07-16 Thread via GitHub
xinlifoobar commented on issue #11413: URL: https://github.com/apache/datafusion/issues/11413#issuecomment-2230502588 Sorry it takes longer than I expected to make this works end-to-end. From my perspective, Good points: - Provide uniform way to implement functions against re

Re: [PR] Add parser option enable_options_value_normalization [datafusion]

2024-07-16 Thread via GitHub
xinlifoobar commented on PR #11330: URL: https://github.com/apache/datafusion/pull/11330#issuecomment-2230528653 > > Hi again @xinlifoobar. The idea in my mind was something like this: [synnada-ai@341b484](https://github.com/synnada-ai/datafusion-upstream/commit/341b48458c09f25d2a5873fcee2fc

Re: [PR] Initial support for `StringView`, merge changes from `string-view` development branch [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11402: URL: https://github.com/apache/datafusion/pull/11402#issuecomment-2230546227 @thinkharderdev or @avantgardnerio -- would either of you have a few moments to review this PR? We want to merge the work we have so far for StringView into main (that depends on arro

Re: [PR] minor: non-overlapping `repart_time` and `send_time` metrics [datafusion]

2024-07-16 Thread via GitHub
alamb merged PR #11440: URL: https://github.com/apache/datafusion/pull/11440 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] minor: non-overlapping `repart_time` and `send_time` metrics [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11440: URL: https://github.com/apache/datafusion/pull/11440#issuecomment-2230547259 Thanks @korowa -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

Re: [PR] Move `GetField` to `ExprPlanner` [datafusion]

2024-07-16 Thread via GitHub
dharanad commented on code in PR #11487: URL: https://github.com/apache/datafusion/pull/11487#discussion_r1679209951 ## datafusion/sql/src/expr/identifier.rs: ## @@ -135,16 +138,17 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { let nested_name = nested_na

Re: [PR] Docs: Document creating new extension APIs [datafusion]

2024-07-16 Thread via GitHub
alamb merged PR #11425: URL: https://github.com/apache/datafusion/pull/11425 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Docs: Document creating new extension APIs [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11425: URL: https://github.com/apache/datafusion/pull/11425#issuecomment-2230645608 Thanks again @ozankabak and @wjones127 If anyone else has thoughts or suggestions we can make another PR -- This is an automated message from the Apache Git Service. To respon

Re: [PR] Add SessionStateBuilder and extract out the registration of defaults [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11403: URL: https://github.com/apache/datafusion/pull/11403#issuecomment-2230646217 Thanks @Omega359 and @jayzhan211 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

Re: [I] DataFusion repo got 40MB larger [datafusion]

2024-07-16 Thread via GitHub
findepi commented on issue #10422: URL: https://github.com/apache/datafusion/issues/10422#issuecomment-2230671934 if we want to fix this, this will require a force push to `main` branch and also doing something about tags that reference the commit that includes `docs/docs.tgz` file (`39.0.0

Re: [PR] Move `GetField` to `ExprPlanner` [datafusion]

2024-07-16 Thread via GitHub
jayzhan211 commented on code in PR #11487: URL: https://github.com/apache/datafusion/pull/11487#discussion_r1679242010 ## datafusion/sql/src/expr/identifier.rs: ## @@ -135,16 +138,17 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { let nested_name = nested_

Re: [I] Optimize CASE WHEN for "expression or null" case [datafusion]

2024-07-16 Thread via GitHub
andygrove commented on issue #11484: URL: https://github.com/apache/datafusion/issues/11484#issuecomment-2230755785 I am implementing this in Comet in https://github.com/apache/datafusion-comet/pull/672 and plan on upstreaming to DataFusion soon. The optimization is resulting in a > 10x spe

Re: [I] [Epic] Extract catalog functionality from the core to make it more modular [datafusion]

2024-07-16 Thread via GitHub
jayzhan211 commented on issue #10782: URL: https://github.com/apache/datafusion/issues/10782#issuecomment-2230783653 Current dependencies for TableProvider that actually need instead of the whole SessionState QueryPlanner + ExecutionProps + SessionConfig + RuntimeEnv + FileFormat

Re: [I] Add support for correlated subquery [datafusion]

2024-07-16 Thread via GitHub
aalexandrov commented on issue #6140: URL: https://github.com/apache/datafusion/issues/6140#issuecomment-2230808207 > BTW you can also make a single word comment take and we have a bot that will assign you. Oh, good to know—I will take a look at the available commands next time.

Re: [I] Prototype implementing DataFusion functions / operators using `arrow-udf` liibrary [datafusion]

2024-07-16 Thread via GitHub
xinlifoobar commented on issue #11413: URL: https://github.com/apache/datafusion/issues/11413#issuecomment-2230827097 Btw... the `ColumnarValue` introduced into datafusion 2 years ago. Considering: - `ColumnarValue` are always paired use with `SchemaRef` - RecordBatches are one or more

Re: [PR] Move `MAKE_MAP` to ExprPlanner [datafusion]

2024-07-16 Thread via GitHub
jayzhan211 commented on code in PR #11452: URL: https://github.com/apache/datafusion/pull/11452#discussion_r1679348352 ## datafusion/functions-array/src/make_array.rs: ## @@ -131,6 +138,77 @@ impl ScalarUDFImpl for MakeArray { } } +#[derive(Debug)] +pub struct MakeArrayS

Re: [I] Add support for `newlines_in_values` to `CsvOptions` [datafusion]

2024-07-16 Thread via GitHub
2010YOUY01 commented on issue #11472: URL: https://github.com/apache/datafusion/issues/11472#issuecomment-2230833594 > I haven't dug into the implementation, but I imagine it becomes harder to find the right split point for multi-threaded reading That is correct, DataFusion parallel C

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-16 Thread via GitHub
andygrove commented on PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#issuecomment-2230835502 @parthchandra @kazuyukitanimura @huaxingao @viirya This is ready for review now -- This is an automated message from the Apache Git Service. To respond to the message, please l

Re: [PR] feat: Optimize for CASE WHEN .. THEN column ELSE null END [datafusion-comet]

2024-07-16 Thread via GitHub
andygrove commented on PR #672: URL: https://github.com/apache/datafusion-comet/pull/672#issuecomment-2230838439 @parthchandra @kazuyukitanimura @huaxingao @viirya This is ready for review now -- This is an automated message from the Apache Git Service. To respond to the message, please l

Re: [PR] Move `MAKE_MAP` to ExprPlanner [datafusion]

2024-07-16 Thread via GitHub
jayzhan211 commented on code in PR #11452: URL: https://github.com/apache/datafusion/pull/11452#discussion_r1679354355 ## datafusion/functions-array/src/make_array.rs: ## @@ -131,6 +138,77 @@ impl ScalarUDFImpl for MakeArray { } } +#[derive(Debug)] +pub struct MakeArrayS

Re: [PR] WIP: Update the parquet code prune_pages_in_one_row_group to use the `StatisticsExtractor` [datafusion]

2024-07-16 Thread via GitHub
alamb commented on code in PR #11483: URL: https://github.com/apache/datafusion/pull/11483#discussion_r1679347823 ## datafusion/core/src/datasource/physical_plan/parquet/page_filter.rs: ## @@ -103,30 +96,52 @@ use super::metrics::ParquetFileMetrics; /// /// So we can entirely

Re: [PR] Move `MAKE_MAP` to ExprPlanner [datafusion]

2024-07-16 Thread via GitHub
jayzhan211 commented on code in PR #11452: URL: https://github.com/apache/datafusion/pull/11452#discussion_r1679372260 ## datafusion/functions-array/src/make_array.rs: ## @@ -131,6 +138,77 @@ impl ScalarUDFImpl for MakeArray { } } +#[derive(Debug)] +pub struct MakeArrayS

Re: [PR] Move `MAKE_MAP` to ExprPlanner [datafusion]

2024-07-16 Thread via GitHub
jayzhan211 commented on code in PR #11452: URL: https://github.com/apache/datafusion/pull/11452#discussion_r1679386760 ## datafusion/functions-array/src/make_array.rs: ## @@ -131,6 +138,77 @@ impl ScalarUDFImpl for MakeArray { } } +#[derive(Debug)] +pub struct MakeArrayS

Re: [PR] Move `MAKE_MAP` to ExprPlanner [datafusion]

2024-07-16 Thread via GitHub
jayzhan211 commented on code in PR #11452: URL: https://github.com/apache/datafusion/pull/11452#discussion_r1679399236 ## datafusion/functions-array/src/make_array.rs: ## @@ -131,6 +138,77 @@ impl ScalarUDFImpl for MakeArray { } } +#[derive(Debug)] +pub struct MakeArrayS

Re: [PR] Move `MAKE_MAP` to ExprPlanner [datafusion]

2024-07-16 Thread via GitHub
jayzhan211 commented on code in PR #11452: URL: https://github.com/apache/datafusion/pull/11452#discussion_r1679399236 ## datafusion/functions-array/src/make_array.rs: ## @@ -131,6 +138,77 @@ impl ScalarUDFImpl for MakeArray { } } +#[derive(Debug)] +pub struct MakeArrayS

[PR] feat: support UDWFs in Substrait [datafusion]

2024-07-16 Thread via GitHub
Blizzara opened a new pull request, #11489: URL: https://github.com/apache/datafusion/pull/11489 ## Which issue does this PR close? Closes #. ## Rationale for this change Previously Substrait consumer would, for window functions, look at: 1. UDAFs 2. built-i

Re: [PR] fix: Spark-4.0 widening type support [datafusion-comet]

2024-07-16 Thread via GitHub
andygrove commented on code in PR #604: URL: https://github.com/apache/datafusion-comet/pull/604#discussion_r1679419996 ## native/core/src/parquet/read/values.rs: ## @@ -439,14 +466,61 @@ macro_rules! make_int_variant_impl { make_int_variant_impl!(Int8Type, copy_i32_to_i8, 1)

Re: [PR] Update the parquet page pruning code to use the `StatisticsExtractor` [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11483: URL: https://github.com/apache/datafusion/pull/11483#issuecomment-2230928354 @liukun4515 or @Ted-Jiang I wonder if you have time to review this code? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Git

Re: [PR] fix: Spark-4.0 widening type support [datafusion-comet]

2024-07-16 Thread via GitHub
andygrove commented on code in PR #604: URL: https://github.com/apache/datafusion-comet/pull/604#discussion_r1679430275 ## native/core/src/parquet/read/values.rs: ## @@ -285,6 +259,59 @@ impl PlainDecoding for Int32DateType { } } +impl PlainDecoding for Int32TimestampMic

Re: [I] Add support for `newlines_in_values` to `CsvOptions` [datafusion]

2024-07-16 Thread via GitHub
connec commented on issue #11472: URL: https://github.com/apache/datafusion/issues/11472#issuecomment-2230939423 I see, so an approach could be to: 1. Add `newlines_in_values: bool` to [`arrow_csv::reader::Format`](https://docs.rs/arrow-csv/latest/arrow_csv/reader/struct.Format.html).

Re: [I] Add `coop_budget` to `datafusion_common::config::ExecutionOptions` [datafusion]

2024-07-16 Thread via GitHub
Rachelint commented on issue #11451: URL: https://github.com/apache/datafusion/issues/11451#issuecomment-2230941223 > > But I still don't quite understand about `I think this is probably not a big issue if you are setting the partition parallelism to the number` mentioned above... Mind expl

[I] Add has_side_effects to PhysicalExpr [datafusion]

2024-07-16 Thread via GitHub
andygrove opened a new issue, #11490: URL: https://github.com/apache/datafusion/issues/11490 ### Is your feature request related to a problem or challenge? For expressions such as `IF(predicate, true_expr, false_expr)` or `CASE WHEN predicate THEN true_expr ELSE false_expr END`, we ge

Re: [PR] Move `GetField` to `ExprPlanner` [datafusion]

2024-07-16 Thread via GitHub
dharanad commented on code in PR #11487: URL: https://github.com/apache/datafusion/pull/11487#discussion_r1679485647 ## datafusion/sql/src/expr/identifier.rs: ## @@ -135,16 +138,17 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { let nested_name = nested_na

Re: [PR] Minor: rename `row_groups.rs` to `row_group_filter.rs` [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11481: URL: https://github.com/apache/datafusion/pull/11481#issuecomment-2231016127 Thanks @jonahgao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] Minor: rename `row_groups.rs` to `row_group_filter.rs` [datafusion]

2024-07-16 Thread via GitHub
alamb merged PR #11481: URL: https://github.com/apache/datafusion/pull/11481 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] Improve performance for grouping by variable length columns (strings) [datafusion]

2024-07-16 Thread via GitHub
XiangpengHao commented on issue #9403: URL: https://github.com/apache/datafusion/issues/9403#issuecomment-2231018750 Want to share some of my findings here. My approach is very similar to #10976, except that it will emit `StringViewArray` and uses `ArrowBytesViewMap` in #11421 to buil

Re: [PR] Support alternate formats for unparsing `datetime` to `timestamp` and `interval` [datafusion]

2024-07-16 Thread via GitHub
alamb commented on code in PR #11466: URL: https://github.com/apache/datafusion/pull/11466#discussion_r1679493595 ## datafusion/sql/src/unparser/expr.rs: ## @@ -1108,6 +1102,123 @@ impl Unparser<'_> { } } +fn interval_scalar_to_sql(&self, v: &ScalarValue) ->

Re: [PR] Support alternate formats for unparsing `datetime` to `timestamp` and `interval` [datafusion]

2024-07-16 Thread via GitHub
alamb merged PR #11466: URL: https://github.com/apache/datafusion/pull/11466 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] upgrade sqlparser 0.47 -> 0.48 [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11453: URL: https://github.com/apache/datafusion/pull/11453#issuecomment-2231030080 I plan to file the follow on ticket / description and then merge this later today -- This is an automated message from the Apache Git Service. To respond to the message, please log o

[PR] Followup Support NULL literals in where clause [datafusion]

2024-07-16 Thread via GitHub
xinlifoobar opened a new pull request, #11491: URL: https://github.com/apache/datafusion/pull/11491 ## Which issue does this PR close? Follow up #11266. This is to use `type_coercion` for null literary to avoid hard coding in physical planner. ## Rationale for this chan

Re: [I] Internal Error for an INNER JOIN query (SQLancer) [datafusion]

2024-07-16 Thread via GitHub
xinlifoobar commented on issue #11412: URL: https://github.com/apache/datafusion/issues/11412#issuecomment-2231057944 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

Re: [I] Improve performance for grouping by variable length columns (strings) [datafusion]

2024-07-16 Thread via GitHub
XiangpengHao commented on issue #9403: URL: https://github.com/apache/datafusion/issues/9403#issuecomment-2231064706 One crazy idea I've been thinking: if the string view is loaded from dictionary-encoded parquet, then the underlying buffers are unique (i.e., no duplicated values), then the

Re: [PR] chore: Add criterion benchmark for CaseExpr [datafusion]

2024-07-16 Thread via GitHub
andygrove merged PR #11482: URL: https://github.com/apache/datafusion/pull/11482 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

Re: [PR] Plan `LATERAL` subqueries [datafusion]

2024-07-16 Thread via GitHub
jonahgao commented on code in PR #11456: URL: https://github.com/apache/datafusion/pull/11456#discussion_r1679583531 ## datafusion/sql/src/relation/mod.rs: ## @@ -143,4 +147,49 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { Ok(plan) } } + +pub(c

Re: [PR] Support SortMergeJoin spilling [datafusion]

2024-07-16 Thread via GitHub
comphead commented on code in PR #11218: URL: https://github.com/apache/datafusion/pull/11218#discussion_r1679586344 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1503,12 +1579,39 @@ fn get_buffered_columns( buffered_batch_idx: usize, buffered_indice

[PR] chore: Change suffix on some expressions from Exec to Expr [datafusion-comet]

2024-07-16 Thread via GitHub
andygrove opened a new pull request, #673: URL: https://github.com/apache/datafusion-comet/pull/673 ## Which issue does this PR close? N/A ## Rationale for this change By convention, DataFusion uses a suffix of `Expr` for implementatins of `PhysicalExpr`,

Re: [PR] Refactor: more clearly delineate btwn writer options vs session configuration [datafusion]

2024-07-16 Thread via GitHub
alamb commented on code in PR #11444: URL: https://github.com/apache/datafusion/pull/11444#discussion_r1679637196 ## datafusion/common/src/config.rs: ## @@ -454,6 +465,87 @@ config_namespace! { } } +#[cfg(feature = "parquet")] Review Comment: Can you please move thi

Re: [I] Add has_side_effects to PhysicalExpr [datafusion]

2024-07-16 Thread via GitHub
alamb commented on issue #11490: URL: https://github.com/apache/datafusion/issues/11490#issuecomment-2231284784 Maybe we could consider moving more of the physical expr implementations into `ScalarUDFImpl` user defined functions rather than extending more built in code 🤔 -- This is an a

Re: [I] Add `coop_budget` to `datafusion_common::config::ExecutionOptions` [datafusion]

2024-07-16 Thread via GitHub
thinkharderdev commented on issue #11451: URL: https://github.com/apache/datafusion/issues/11451#issuecomment-2231287150 > Thanks, got it. And as I understand, it seems the reason why executors underutilized is when the cpu work is required, but the IO is in flight and not ready? > >

[I] `SanityCheckPlan` Error during planning: ... does not satisfy parent order requirements: ... [datafusion]

2024-07-16 Thread via GitHub
alamb opened a new issue, #11492: URL: https://github.com/apache/datafusion/issues/11492 ### Describe the bug When run this query I get an error ```sql > SELECT (SELECT cpu from t2 ORDER BY cpu) UNION ALL (SELECT 'bar' as cpu from t2) ORDER BY cpu; SanityCheckPlan

Re: [I] `SanityCheckPlan` Error during planning: ... does not satisfy parent order requirements: ... [datafusion]

2024-07-16 Thread via GitHub
alamb commented on issue #11492: URL: https://github.com/apache/datafusion/issues/11492#issuecomment-2231296275 When I was debugging this in our environment, I think the bug seems to be that the UnionExec is not correctly computing its output sort order when only one of the inputs has a con

Re: [PR] Minor: fix some new warnings [datafusion]

2024-07-16 Thread via GitHub
comphead commented on code in PR #11467: URL: https://github.com/apache/datafusion/pull/11467#discussion_r1679687518 ## datafusion/sqllogictest/test_files/sort_merge_join.slt: ## @@ -238,16 +238,17 @@ SELECT * FROM t1 FULL JOIN t2 ON t1_id = t2_id 44 d 4 44 x 3 NULL NULL NULL

Re: [PR] Add Optimizer Sanity Checker, improve sortedness equivalence properties [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11196: URL: https://github.com/apache/datafusion/pull/11196#issuecomment-2231310445 We found an issue (that I think is exposed by this change, not caused by it) while upgrading InfluxDB 3.0: https://github.com/apache/datafusion/issues/11492 -- This is an automated m

[PR] chore: Fix some regressions with Spark 3.5.1 [datafusion-comet]

2024-07-16 Thread via GitHub
andygrove opened a new pull request, #674: URL: https://github.com/apache/datafusion-comet/pull/674 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/617 ## Rationale for this change ## What changes are included i

[PR] Test + workaround for `SanityCheckPlan` error [datafusion]

2024-07-16 Thread via GitHub
alamb opened a new pull request, #11493: URL: https://github.com/apache/datafusion/pull/11493 ## Which issue does this PR close? This is a workaround + some tests for https://github.com/apache/datafusion/issues/11492 I don't think this should be merged, I am just posting it her

Re: [PR] Test + workaround for `SanityCheckPlan` error [datafusion]

2024-07-16 Thread via GitHub
alamb commented on code in PR #11493: URL: https://github.com/apache/datafusion/pull/11493#discussion_r1679710719 ## datafusion/sqllogictest/test_files/union.slt: ## @@ -602,3 +605,38 @@ physical_plan 09)--ProjectionExec: expr=[1 as count, MAX(Int64(10))@0 as n] 10)Aggrega

Re: [PR] Test + workaround for `SanityCheckPlan` error [datafusion]

2024-07-16 Thread via GitHub
alamb commented on code in PR #11493: URL: https://github.com/apache/datafusion/pull/11493#discussion_r1679719549 ## datafusion/core/src/physical_optimizer/sanity_checker.rs: ## @@ -125,6 +127,14 @@ pub fn check_plan_sanity( plan.required_input_ordering().iter(),

Re: [I] Decide whether we want to standardize which `assert_eq!` parameter ("left" or "right") should be used for "expected" value [datafusion]

2024-07-16 Thread via GitHub
findepi commented on issue #11270: URL: https://github.com/apache/datafusion/issues/11270#issuecomment-2231390487 I agree @Omega359 this looks more like RustRover issue. It seems that RR has some limitations under which they are working (https://github.com/intellij-rust/intellij-rust/pull/3

[PR] Create Comet docker file [datafusion-comet]

2024-07-16 Thread via GitHub
comphead opened a new pull request, #675: URL: https://github.com/apache/datafusion-comet/pull/675 ## Which issue does this PR close? Closes #643 . ## Rationale for this change Create a Comet dockerfile to create a Spark based image with integrated Comet jar

Re: [PR] Create Comet docker file [datafusion-comet]

2024-07-16 Thread via GitHub
andygrove commented on code in PR #675: URL: https://github.com/apache/datafusion-comet/pull/675#discussion_r1679801559 ## kube/Dockerfile: ## @@ -0,0 +1,26 @@ +FROM apache/spark:3.4.2 + +USER root + +# Installing JDK11 as the image comes with JRE +RUN apt update \ +&& apt i

Re: [PR] Create Comet docker file [datafusion-comet]

2024-07-16 Thread via GitHub
andygrove commented on code in PR #675: URL: https://github.com/apache/datafusion-comet/pull/675#discussion_r1679802182 ## kube/Dockerfile: ## @@ -0,0 +1,26 @@ +FROM apache/spark:3.4.2 Review Comment: Need to add ASF header -- This is an automated message from the Apache

Re: [PR] chore: Fix some regressions with Spark 3.5.1 [datafusion-comet]

2024-07-16 Thread via GitHub
andygrove commented on PR #674: URL: https://github.com/apache/datafusion-comet/pull/674#issuecomment-2231469119 @parthchandra @kazuyukitanimura This PR is ready for review now and fixes almost all of the ignored tests for Spark 3.5.1. I will follow up with a separate PR for the final two t

Re: [I] Decide whether we want to standardize which `assert_eq!` parameter ("left" or "right") should be used for "expected" value [datafusion]

2024-07-16 Thread via GitHub
Omega359 commented on issue #11270: URL: https://github.com/apache/datafusion/issues/11270#issuecomment-2231561358 That seems similar to the assertJ Java assertion library which I've used in the past. Fluent api's like that can be more verbose but in general I find them a lot easier to use.

Re: [I] LEFT/RIGHT OUTER JOIN failed to detect ambiguous column reference (SQLancer-NoREC) [datafusion]

2024-07-16 Thread via GitHub
aalexandrov commented on issue #11408: URL: https://github.com/apache/datafusion/issues/11408#issuecomment-2231587943 Not sure if this is should be classified as a bug or just as an error message that could be improved. I tested on Postgres, MySQL, and SQL Server and all fail because

Re: [I] Support `to_json` function on `struct` type [datafusion]

2024-07-16 Thread via GitHub
dharanad closed issue #11441: Support `to_json` function on `struct` type URL: https://github.com/apache/datafusion/issues/11441 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [I] Support `to_json` function on `struct` type [datafusion]

2024-07-16 Thread via GitHub
dharanad commented on issue #11441: URL: https://github.com/apache/datafusion/issues/11441#issuecomment-2231607181 > I wonder if this would belong in this repo instead: https://github.com/datafusion-contrib/datafusion-functions-json I think `datafusion-functions-json` is the place. Wi

Re: [PR] Create Comet docker file [datafusion-comet]

2024-07-16 Thread via GitHub
viirya commented on code in PR #675: URL: https://github.com/apache/datafusion-comet/pull/675#discussion_r1679935475 ## kube/Dockerfile: ## @@ -0,0 +1,26 @@ +FROM apache/spark:3.4.2 + +USER root + +# Installing JDK11 as the image comes with JRE +RUN apt update \ +&& apt inst

Re: [I] `Unnest` query does not allow one expression being referenced multiple times [datafusion]

2024-07-16 Thread via GitHub
duongcongtoai commented on issue #11198: URL: https://github.com/apache/datafusion/issues/11198#issuecomment-2231643752 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

Re: [I] `Unnest` query does not allow one expression being referenced multiple times [datafusion]

2024-07-16 Thread via GitHub
duongcongtoai commented on issue #11198: URL: https://github.com/apache/datafusion/issues/11198#issuecomment-2231644659 also different error types for the following queries: ``` select unnest(column1) + unnest(column1) from unnest_table; External error: query failed: DataFusion error

Re: [PR] Create Comet docker file [datafusion-comet]

2024-07-16 Thread via GitHub
andygrove commented on code in PR #675: URL: https://github.com/apache/datafusion-comet/pull/675#discussion_r1679954544 ## kube/Dockerfile: ## @@ -0,0 +1,26 @@ +FROM apache/spark:3.4.2 + +USER root + +# Installing JDK11 as the image comes with JRE +RUN apt update \ +&& apt i

Re: [PR] chore: Change suffix on some expressions from Exec to Expr [datafusion-comet]

2024-07-16 Thread via GitHub
andygrove merged PR #673: URL: https://github.com/apache/datafusion-comet/pull/673 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@da

Re: [PR] Create Comet docker file [datafusion-comet]

2024-07-16 Thread via GitHub
comphead commented on code in PR #675: URL: https://github.com/apache/datafusion-comet/pull/675#discussion_r1679965630 ## kube/Dockerfile: ## @@ -0,0 +1,26 @@ +FROM apache/spark:3.4.2 + +USER root + +# Installing JDK11 as the image comes with JRE +RUN apt update \ +&& apt in

[I] Move spill related functions to `spill.rs` [datafusion]

2024-07-16 Thread via GitHub
comphead opened a new issue, #11496: URL: https://github.com/apache/datafusion/issues/11496 maybe eventually the spill related untilities could go into `physical-plan/src/spill.rs` though I see this PR simply follows the existing pattern so I think it is find _Originall

Re: [PR] Support SortMergeJoin spilling [datafusion]

2024-07-16 Thread via GitHub
comphead commented on code in PR #11218: URL: https://github.com/apache/datafusion/pull/11218#discussion_r1679979879 ## datafusion/physical-plan/src/lib.rs: ## @@ -852,6 +852,30 @@ pub fn spill_record_batches( Ok(writer.num_rows) } +/// Spill the `RecordBatch` to disk as

Re: [PR] Plan `LATERAL` subqueries [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11456: URL: https://github.com/apache/datafusion/pull/11456#issuecomment-2231716055 This PR is very much on my list to review, but I fear I may not get to it today (I have a bunch of PRs waiting upstream in arrow-rs I need to attend to) -- This is an automated messa

Re: [PR] Initial support for `StringView`, merge changes from `string-view` development branch [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11402: URL: https://github.com/apache/datafusion/pull/11402#issuecomment-2231718027 Thank you @thinkharderdev cc @XiangpengHao I will now make a string-view2 branch for your next sequence of PRs -- This is an automated message from the Apache Git Service. To

Re: [PR] Initial support for `StringView`, merge changes from `string-view` development branch [datafusion]

2024-07-16 Thread via GitHub
alamb merged PR #11402: URL: https://github.com/apache/datafusion/pull/11402 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Add ArrowBytesViewMap and ArrowBytesViewSet [datafusion]

2024-07-16 Thread via GitHub
alamb closed pull request #11421: Add ArrowBytesViewMap and ArrowBytesViewSet URL: https://github.com/apache/datafusion/pull/11421 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

Re: [PR] add a config to force using string view in benchmark [datafusion]

2024-07-16 Thread via GitHub
alamb closed pull request #11475: add a config to force using string view in benchmark URL: https://github.com/apache/datafusion/pull/11475 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

Re: [PR] Create Comet docker file [datafusion-comet]

2024-07-16 Thread via GitHub
comphead commented on code in PR #675: URL: https://github.com/apache/datafusion-comet/pull/675#discussion_r1679993406 ## kube/Dockerfile: ## @@ -0,0 +1,26 @@ +FROM apache/spark:3.4.2 + +USER root + +# Installing JDK11 as the image comes with JRE +RUN apt update \ +&& apt in

Re: [I] [Epic] Implement support for `StringView` in DataFusion [datafusion]

2024-07-16 Thread via GitHub
alamb commented on issue #10918: URL: https://github.com/apache/datafusion/issues/10918#issuecomment-2231728026 Update: we merged the basic support for StringView in https://github.com/apache/datafusion/pull/11402 I have created a branch to collect any changes that rely on the pre-re

Re: [I] DataFusion repo got 40MB larger [datafusion]

2024-07-16 Thread via GitHub
alamb commented on issue #10422: URL: https://github.com/apache/datafusion/issues/10422#issuecomment-2231733219 I think the branch protections would need to be updated temporarily too -- I don't think we can force push to main I also think if we force push to main all outstanding PRs

Re: [I] DataFusion repo got 40MB larger [datafusion]

2024-07-16 Thread via GitHub
comphead commented on issue #10422: URL: https://github.com/apache/datafusion/issues/10422#issuecomment-2231739945 I think it was a precommit github hook checking large files, please allow me some time to investigate it -- This is an automated message from the Apache Git Service. To respo

Re: [I] [Epic] Extract catalog functionality from the core to make it more modular [datafusion]

2024-07-16 Thread via GitHub
alamb commented on issue #10782: URL: https://github.com/apache/datafusion/issues/10782#issuecomment-2231748616 > These 3 has no circular dependencies which mean we can/should move them out of core before moving catalog-common out of core. > PhysicalOptimizer I agree -- I think

Re: [I] [Epic] Extract catalog functionality from the core to make it more modular [datafusion]

2024-07-16 Thread via GitHub
alamb commented on issue #10782: URL: https://github.com/apache/datafusion/issues/10782#issuecomment-2231749097 Shall we file a ticket / epic to pull physical optimizer out? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] feat: support group by unnest [datafusion]

2024-07-16 Thread via GitHub
alamb commented on code in PR #11469: URL: https://github.com/apache/datafusion/pull/11469#discussion_r1680028460 ## datafusion/sql/src/select.rs: ## @@ -354,6 +358,118 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { .build() } +fn try_process_aggregate

[I] This function seems to have some code repetition with function [try_process_unnest](https://github.com/apache/datafusion/blob/f204869ff55bb3e39cf23fc0a34ebd5021e6773f/datafusion/sql/src/select.rs#

2024-07-16 Thread via GitHub
alamb opened a new issue, #11498: URL: https://github.com/apache/datafusion/issues/11498 This function seems to have some code repetition with function [try_process_unnest](https://github.com/apache/datafusion/blob/f204869ff55bb3e39cf23fc0a34ebd5021e6773f/datafusion/sql/src/sel

Re: [PR] feat: support group by unnest [datafusion]

2024-07-16 Thread via GitHub
alamb commented on code in PR #11469: URL: https://github.com/apache/datafusion/pull/11469#discussion_r1680032211 ## datafusion/sql/src/select.rs: ## @@ -297,6 +298,9 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { input: LogicalPlan, select_exprs: Vec,

Re: [PR] Replace to_lowercase with to_string in sql example [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11486: URL: https://github.com/apache/datafusion/pull/11486#issuecomment-2231779637 Thank you @lewiszlw and @comphead -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] Minor: Make execute_input_stream Accessible for Any Sinking Operators [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11449: URL: https://github.com/apache/datafusion/pull/11449#issuecomment-2231780732 Let's move the code around in a subsequent PR if we think that is useful -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

Re: [PR] Replace to_lowercase with to_string in sql example [datafusion]

2024-07-16 Thread via GitHub
alamb merged PR #11486: URL: https://github.com/apache/datafusion/pull/11486 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] feat: switch to using proper Substrait types for IntervalYearMonth and IntervalDayTime [datafusion]

2024-07-16 Thread via GitHub
alamb merged PR #11471: URL: https://github.com/apache/datafusion/pull/11471 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] feat: switch to using proper Substrait types for IntervalYearMonth and IntervalDayTime [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11471: URL: https://github.com/apache/datafusion/pull/11471#issuecomment-2231779975 Thanks again @Blizzara -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [PR] Minor: Make execute_input_stream Accessible for Any Sinking Operators [datafusion]

2024-07-16 Thread via GitHub
alamb merged PR #11449: URL: https://github.com/apache/datafusion/pull/11449 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

  1   2   >