Re: [PR] chore: Make parquet reader options Comet options instead of Hadoop options [datafusion-comet]

2024-09-25 Thread via GitHub
andygrove commented on code in PR #968: URL: https://github.com/apache/datafusion-comet/pull/968#discussion_r1776037634 ## common/src/test/java/org/apache/comet/parquet/TestFileReader.java: ## @@ -615,12 +617,12 @@ public void testColumnIndexReadWrite() throws Exception { @Te

Re: [PR] chore: Make parquet reader options Comet options instead of Hadoop options [datafusion-comet]

2024-09-25 Thread via GitHub
andygrove commented on code in PR #968: URL: https://github.com/apache/datafusion-comet/pull/968#discussion_r1776038555 ## common/src/main/java/org/apache/comet/parquet/ReadOptions.java: ## @@ -173,14 +147,24 @@ public Builder(Configuration conf) { this.conf = conf;

Re: [PR] parquet: Add support for user-provided metadata loaders [datafusion]

2024-09-25 Thread via GitHub
alamb commented on code in PR #12593: URL: https://github.com/apache/datafusion/pull/12593#discussion_r1776029106 ## datafusion/core/src/datasource/physical_plan/parquet/reader.rs: ## @@ -57,9 +58,49 @@ pub trait ParquetFileReaderFactory: Debug + Send + Sync + 'static {

Re: [PR] Simplify `update_skip_aggregation_probe` method [datafusion]

2024-09-25 Thread via GitHub
Rachelint commented on code in PR #12332: URL: https://github.com/apache/datafusion/pull/12332#discussion_r1776203466 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1024,11 +1017,7 @@ impl GroupedHashAggregateStream { /// Note: currently spilling is not supp

Re: [PR] Simplify `update_skip_aggregation_probe` method [datafusion]

2024-09-25 Thread via GitHub
Rachelint commented on code in PR #12332: URL: https://github.com/apache/datafusion/pull/12332#discussion_r1776203466 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1024,11 +1017,7 @@ impl GroupedHashAggregateStream { /// Note: currently spilling is not supp

Re: [PR] Simplify `update_skip_aggregation_probe` method [datafusion]

2024-09-25 Thread via GitHub
Rachelint commented on code in PR #12332: URL: https://github.com/apache/datafusion/pull/12332#discussion_r1776203466 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1024,11 +1017,7 @@ impl GroupedHashAggregateStream { /// Note: currently spilling is not supp

Re: [PR] Simplify `update_skip_aggregation_probe` method [datafusion]

2024-09-25 Thread via GitHub
Rachelint commented on code in PR #12332: URL: https://github.com/apache/datafusion/pull/12332#discussion_r1776203466 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1024,11 +1017,7 @@ impl GroupedHashAggregateStream { /// Note: currently spilling is not supp

Re: [PR] Simplify `update_skip_aggregation_probe` method [datafusion]

2024-09-25 Thread via GitHub
Rachelint commented on code in PR #12332: URL: https://github.com/apache/datafusion/pull/12332#discussion_r1776203466 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1024,11 +1017,7 @@ impl GroupedHashAggregateStream { /// Note: currently spilling is not supp

Re: [PR] docs: improve the documentation for Aggregate code [datafusion]

2024-09-25 Thread via GitHub
jayzhan211 commented on code in PR #12617: URL: https://github.com/apache/datafusion/pull/12617#discussion_r1776419786 ## datafusion/physical-plan/src/aggregates/group_values/column_wise.rs: ## @@ -36,7 +36,7 @@ use datafusion_physical_expr::binary_map::OutputType; use hashbr

Re: [PR] docs: improve the documentation for Aggregate code [datafusion]

2024-09-25 Thread via GitHub
jayzhan211 commented on code in PR #12617: URL: https://github.com/apache/datafusion/pull/12617#discussion_r1776419786 ## datafusion/physical-plan/src/aggregates/group_values/column_wise.rs: ## @@ -36,7 +36,7 @@ use datafusion_physical_expr::binary_map::OutputType; use hashbr

Re: [PR] parquet: Add support for user-provided metadata loaders [datafusion]

2024-09-25 Thread via GitHub
progval commented on PR #12593: URL: https://github.com/apache/datafusion/pull/12593#issuecomment-2375108812 I agree, it's confusing. I actually started from [advanced_parquet_index.rs](https://github.com/apache/datafusion/blob/91c8a47f2001b29bb4fdf6f41e18a65bbf0752de/datafusion-examp

Re: [PR] Improve SanityChecker error message [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12595: URL: https://github.com/apache/datafusion/pull/12595#issuecomment-2375104664 Thank you for the review @comphead -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] feat(function): add greatest function [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12474: URL: https://github.com/apache/datafusion/pull/12474#issuecomment-2375106042 It is great! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

Re: [PR] Improve performance of `trim` for string view (10%) [datafusion]

2024-09-25 Thread via GitHub
alamb merged PR #12395: URL: https://github.com/apache/datafusion/pull/12395 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Improve performance of `trim` for string view (10%) [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12395: URL: https://github.com/apache/datafusion/pull/12395#issuecomment-2375239941 🚀 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe

Re: [I] Improve performance of *TRIM functions for StringViewArray [datafusion]

2024-09-25 Thread via GitHub
alamb closed issue #12387: Improve performance of *TRIM functions for StringViewArray URL: https://github.com/apache/datafusion/issues/12387 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #11943: URL: https://github.com/apache/datafusion/pull/11943#issuecomment-2375243686 Marking as draft as I don't think this is waiting on review and I am trying to keep the review backlog under control -- This is an automated message from the Apache Git Service. To r

Re: [I] Automatically generate function documentation from comments / code [datafusion]

2024-09-25 Thread via GitHub
Omega359 commented on issue #12432: URL: https://github.com/apache/datafusion/issues/12432#issuecomment-2375243708 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

Re: [PR] Refactor PrimitiveGroupValueBuilder to use BooleanBuilder [datafusion]

2024-09-25 Thread via GitHub
jayzhan211 commented on code in PR #12623: URL: https://github.com/apache/datafusion/pull/12623#discussion_r1776133521 ## datafusion/physical-plan/src/aggregates/group_values/group_value_row.rs: ## @@ -121,37 +145,64 @@ impl ArrayRowEq for PrimitiveGroupValueBuilder { }

Re: [PR] Refactor PrimitiveGroupValueBuilder to use BooleanBuilder [datafusion]

2024-09-25 Thread via GitHub
jayzhan211 commented on code in PR #12623: URL: https://github.com/apache/datafusion/pull/12623#discussion_r1776133521 ## datafusion/physical-plan/src/aggregates/group_values/group_value_row.rs: ## @@ -121,37 +145,64 @@ impl ArrayRowEq for PrimitiveGroupValueBuilder { }

Re: [PR] Refactor PrimitiveGroupValueBuilder to use BooleanBuilder [datafusion]

2024-09-25 Thread via GitHub
jayzhan211 commented on code in PR #12623: URL: https://github.com/apache/datafusion/pull/12623#discussion_r1776133521 ## datafusion/physical-plan/src/aggregates/group_values/group_value_row.rs: ## @@ -121,37 +145,64 @@ impl ArrayRowEq for PrimitiveGroupValueBuilder { }

Re: [PR] Improve SanityChecker error message [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12595: URL: https://github.com/apache/datafusion/pull/12595#issuecomment-2375104297 > lgtm thanks @alamb wondering why no tests are failing 🤔 Its because this error only happens due to bugs (like https://github.com/apache/datafusion/issues/12446) so in theory it

Re: [PR] Improve SanityChecker error message [datafusion]

2024-09-25 Thread via GitHub
alamb merged PR #12595: URL: https://github.com/apache/datafusion/pull/12595 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Simplify `update_skip_aggregation_probe` method [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12332: URL: https://github.com/apache/datafusion/pull/12332#issuecomment-2375250760 I took the liberty of merging up from main to resolve a conflict -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

Re: [PR] Rename aggregation modules, GroupColumn [datafusion]

2024-09-25 Thread via GitHub
jayzhan211 commented on PR #12619: URL: https://github.com/apache/datafusion/pull/12619#issuecomment-2375455499 Since the change is not released, I think there is no API change -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

Re: [PR] docs: improve the documentation for Aggregate code [datafusion]

2024-09-25 Thread via GitHub
jayzhan211 commented on code in PR #12617: URL: https://github.com/apache/datafusion/pull/12617#discussion_r1776115903 ## datafusion/physical-plan/src/aggregates/group_values/column_wise.rs: ## @@ -36,7 +36,7 @@ use datafusion_physical_expr::binary_map::OutputType; use hashbr

Re: [PR] Refactor PrimitiveGroupValueBuilder to use BooleanBuilder [datafusion]

2024-09-25 Thread via GitHub
jayzhan211 commented on code in PR #12623: URL: https://github.com/apache/datafusion/pull/12623#discussion_r1776133521 ## datafusion/physical-plan/src/aggregates/group_values/group_value_row.rs: ## @@ -121,37 +145,64 @@ impl ArrayRowEq for PrimitiveGroupValueBuilder { }

[PR] modify register table functions args [datafusion]

2024-09-25 Thread via GitHub
JasonLi-cn opened a new pull request, #12630: URL: https://github.com/apache/datafusion/pull/12630 ## Which issue does this PR close? Closes #. ## Rationale for this change Changing `&str` into `impl Into` to be consistent with the `register_table` function.

[PR] Compare schema as logically equivalent to workaround disappearing metadata [datafusion]

2024-09-25 Thread via GitHub
itsjunetime opened a new pull request, #12631: URL: https://github.com/apache/datafusion/pull/12631 ## Which issue does this PR close? This works around some problems noted in #12560. It is not a full fix, but just unblocks some other work for me. ## Rationale for this change

Re: [PR] Refactor PrimitiveGroupValueBuilder to use BooleanBuilder [datafusion]

2024-09-25 Thread via GitHub
jayzhan211 commented on code in PR #12623: URL: https://github.com/apache/datafusion/pull/12623#discussion_r1776454559 ## datafusion/physical-plan/src/aggregates/group_values/group_value_row.rs: ## @@ -121,37 +145,64 @@ impl ArrayRowEq for PrimitiveGroupValueBuilder { }

Re: [PR] Introduce `Scalar` type for ColumnarValue [datafusion]

2024-09-25 Thread via GitHub
findepi commented on code in PR #12536: URL: https://github.com/apache/datafusion/pull/12536#discussion_r1776481456 ## datafusion/expr-common/src/columnar_value.rs: ## @@ -89,7 +91,7 @@ pub enum ColumnarValue { /// Array of values Array(ArrayRef), /// A single val

Re: [PR] chore: Update benchmarks results based on 0.3.0-rc1 [datafusion-comet]

2024-09-25 Thread via GitHub
andygrove merged PR #969: URL: https://github.com/apache/datafusion-comet/pull/969 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@da

[PR] Minor: improve documentation to StringView trim [datafusion]

2024-09-25 Thread via GitHub
alamb opened a new pull request, #12629: URL: https://github.com/apache/datafusion/pull/12629 ## Which issue does this PR close? Follow on to https://github.com/apache/datafusion/pull/12395 from @Rachelint ## Rationale for this change Part of reviewing https://github

Re: [PR] parquet: Add support for user-provided metadata loaders [datafusion]

2024-09-25 Thread via GitHub
progval commented on PR #12593: URL: https://github.com/apache/datafusion/pull/12593#issuecomment-2375197430 I remembered. The issue is that in this code: https://github.com/apache/arrow-rs/blob/62825b27e98e6719cb66258535c75c7490ddba44/parquet/src/arrow/async_reader/mod.rs#L212-L228

[PR] feat: make register_csv accept a list of paths [datafusion-python]

2024-09-25 Thread via GitHub
mesejo opened a new pull request, #883: URL: https://github.com/apache/datafusion-python/pull/883 # Which issue does this PR close? N/A # Rationale for this change The method `read_csv` accepts a list of paths as input. It seems fitting that `register_csv` does the same. Moreove

Re: [I] Implement Spark-compatible CAST from String to Decimal [datafusion-comet]

2024-09-25 Thread via GitHub
justahuman1 commented on issue #325: URL: https://github.com/apache/datafusion-comet/issues/325#issuecomment-2375536003 Hi @sujithjay, let me know if you are still working on this. I am happy to continue #615 and add you as a co-author. Let me know, thank you -- This is an automated mess

Re: [I] Different semantics of casting from int64 to timestamp between Comet and Spark [datafusion-comet]

2024-09-25 Thread via GitHub
justahuman1 commented on issue #146: URL: https://github.com/apache/datafusion-comet/issues/146#issuecomment-2375538517 I can take this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [PR] Improve performance of `trim` for string view (10%) [datafusion]

2024-09-25 Thread via GitHub
alamb commented on code in PR #12395: URL: https://github.com/apache/datafusion/pull/12395#discussion_r1775902607 ## datafusion/functions/src/string/common.rs: ## @@ -72,65 +94,126 @@ pub(crate) fn general_trim( }; if use_string_view { -string_view_trim::(fun

Re: [PR] Upgrade dependencies [datafusion-ballista]

2024-09-25 Thread via GitHub
andygrove commented on PR #1059: URL: https://github.com/apache/datafusion-ballista/pull/1059#issuecomment-2375151146 Would it make sense just to upgrade to DF 41 in this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

Re: [I] Replace `OnceLock` with `LazyLock` [datafusion]

2024-09-25 Thread via GitHub
Omega359 commented on issue #11687: URL: https://github.com/apache/datafusion/issues/11687#issuecomment-2375213450 Didn't LazyLock become stable in 1.80.0 ? https://blog.rust-lang.org/2024/07/25/Rust-1.80.0.html I just tried to use it in main and it wouldn't compile... -- This is a

Re: [PR] parquet: Add support for user-provided metadata loaders [datafusion]

2024-09-25 Thread via GitHub
progval commented on PR #12593: URL: https://github.com/apache/datafusion/pull/12593#issuecomment-2375216464 To give an idea of the difference. Metrics with the new `load_metadata` cached: > time_elapsed_opening=272.811692949s, time_elapsed_processing=118.675115374s, time_elapsed_sc

Re: [PR] parquet: Add support for user-provided metadata loaders [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12593: URL: https://github.com/apache/datafusion/pull/12593#issuecomment-2375450135 I wonder if we could change the "automatically load page index if needed" to "error if page index is needed but it is not loaded" 🤔 That might be a less surprising behavior -- This

Re: [PR] Rename aggregation modules, GroupColumn [datafusion]

2024-09-25 Thread via GitHub
jayzhan211 merged PR #12619: URL: https://github.com/apache/datafusion/pull/12619 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dat

Re: [I] implement `generate_series` function [datafusion]

2024-09-25 Thread via GitHub
Weijun-H commented on issue #753: URL: https://github.com/apache/datafusion/issues/753#issuecomment-2373507496 Thanks @Abdullahsab3, this issue is completed via #8140 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [I] implement `generate_series` function [datafusion]

2024-09-25 Thread via GitHub
Weijun-H closed issue #753: implement `generate_series` function URL: https://github.com/apache/datafusion/issues/753 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubsc

Re: [PR] Refactor to support recursive unnest in physical plan [datafusion]

2024-09-25 Thread via GitHub
jayzhan211 commented on code in PR #11577: URL: https://github.com/apache/datafusion/pull/11577#discussion_r1774864131 ## datafusion/sql/src/utils.rs: ## @@ -309,6 +315,269 @@ pub(crate) fn transform_bottom_unnests( .collect::>()) } +/* +This is only usedful when use

Re: [PR] Disallow duplicated qualified field names [datafusion]

2024-09-25 Thread via GitHub
eejbyfeldt commented on code in PR #12608: URL: https://github.com/apache/datafusion/pull/12608#discussion_r1774864320 ## datafusion/sql/src/planner.rs: ## @@ -197,9 +197,9 @@ impl PlannerContext { /// extends the FROM schema, returning the existing one, if any pub f

Re: [PR] Disallow duplicated qualified field names [datafusion]

2024-09-25 Thread via GitHub
eejbyfeldt commented on code in PR #12608: URL: https://github.com/apache/datafusion/pull/12608#discussion_r1774864320 ## datafusion/sql/src/planner.rs: ## @@ -197,9 +197,9 @@ impl PlannerContext { /// extends the FROM schema, returning the existing one, if any pub f

Re: [PR] Refactor to support recursive unnest in physical plan [datafusion]

2024-09-25 Thread via GitHub
jayzhan211 commented on code in PR #11577: URL: https://github.com/apache/datafusion/pull/11577#discussion_r1774868527 ## datafusion/sql/src/utils.rs: ## @@ -309,6 +315,269 @@ pub(crate) fn transform_bottom_unnests( .collect::>()) } +/* +This is only usedful when use

Re: [I] suggest add synax `select from generate_series()` [datafusion]

2024-09-25 Thread via GitHub
Abdullahsab3 commented on issue #10069: URL: https://github.com/apache/datafusion/issues/10069#issuecomment-2373531351 I think switching `generate_series` to a UDTF will break the API. I think in most cases it's an acceptable API change. However there is (at least for me) 1 particular use c

Re: [PR] fix: Panic/correctness issue in variance GroupsAccumulator [datafusion]

2024-09-25 Thread via GitHub
jayzhan211 commented on code in PR #12615: URL: https://github.com/apache/datafusion/pull/12615#discussion_r1774881870 ## datafusion/functions-aggregate/src/variance.rs: ## @@ -470,7 +470,7 @@ impl VarianceGroupsAccumulator { if let StatsType::Sample = self.stats_type

Re: [I] DataFrame parse_sql_expr does not handle aliases [datafusion]

2024-09-25 Thread via GitHub
Eason0729 commented on issue #12518: URL: https://github.com/apache/datafusion/issues/12518#issuecomment-2373280775 I just finish it in my [fork](https://github.com/Eason0729/datafusion). It's not meant to be a pull request, just to showcase what I would like to change. Here is

Re: [I] implement `generate_series` function [datafusion]

2024-09-25 Thread via GitHub
Abdullahsab3 commented on issue #753: URL: https://github.com/apache/datafusion/issues/753#issuecomment-2373483718 I believe `generate_series` is now included in Datafusion -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

[I] Panic/correctness issue in variance GroupsAccumulator [datafusion]

2024-09-25 Thread via GitHub
eejbyfeldt opened a new issue, #12616: URL: https://github.com/apache/datafusion/issues/12616 ### Describe the bug When using the sample variance and there is no values in a group we will do a `-1` on a 0 usize. This leads to wrong results in release mode and panic in debug. #

Re: [I] parquet: Refine `time_elapsed_opening` metric [datafusion]

2024-09-25 Thread via GitHub
alamb closed issue #12584: parquet: Refine `time_elapsed_opening` metric URL: https://github.com/apache/datafusion/issues/12584 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] parquet: Add finer metrics on operations covered by `time_elapsed_opening` [datafusion]

2024-09-25 Thread via GitHub
alamb merged PR #12585: URL: https://github.com/apache/datafusion/pull/12585 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] Panic/correctness issue in variance GroupsAccumulator [datafusion]

2024-09-25 Thread via GitHub
alamb closed issue #12616: Panic/correctness issue in variance GroupsAccumulator URL: https://github.com/apache/datafusion/issues/12616 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

Re: [PR] fix: Panic/correctness issue in variance GroupsAccumulator [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12615: URL: https://github.com/apache/datafusion/pull/12615#issuecomment-2373688683 Thanks @eejbyfeldt and @jayzhan211 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] fix: Panic/correctness issue in variance GroupsAccumulator [datafusion]

2024-09-25 Thread via GitHub
alamb merged PR #12615: URL: https://github.com/apache/datafusion/pull/12615 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] parquet: Add finer metrics on operations covered by `time_elapsed_opening` [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12585: URL: https://github.com/apache/datafusion/pull/12585#issuecomment-2373687122 Thanks again @progval -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

Re: [PR] Refactor to support recursive unnest in physical plan [datafusion]

2024-09-25 Thread via GitHub
alamb merged PR #11577: URL: https://github.com/apache/datafusion/pull/11577 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] `Unnest` query does not allow one expression being referenced multiple times [datafusion]

2024-09-25 Thread via GitHub
alamb closed issue #11198: `Unnest` query does not allow one expression being referenced multiple times URL: https://github.com/apache/datafusion/issues/11198 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [I] Recursive `unnest` misbehave if used together with another unnest on different column [datafusion]

2024-09-25 Thread via GitHub
alamb closed issue #11689: Recursive `unnest` misbehave if used together with another unnest on different column URL: https://github.com/apache/datafusion/issues/11689 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

Re: [PR] Refactor to support recursive unnest in physical plan [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #11577: URL: https://github.com/apache/datafusion/pull/11577#issuecomment-2373691899 I think we can handle the follow ons / comments above as a follow on PR. Let's merge this one to keep things flowing. @duongcongtoai let me know if you plan to work on the sugge

Re: [PR] implement kurtosis udaf [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12613: URL: https://github.com/apache/datafusion/pull/12613#issuecomment-2373694964 Thank you @jatin510 . As @Weijun-H says, is there any chance you can retarget this PR to https://github.com/datafusion-contrib/datafusion-functions-extra instead? -- This is

Re: [PR] Use original value when comparing with dictionary column in unparser [datafusion]

2024-09-25 Thread via GitHub
alamb merged PR #12610: URL: https://github.com/apache/datafusion/pull/12610 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Fix sort node deserialization from proto [datafusion]

2024-09-25 Thread via GitHub
andygrove commented on PR #12626: URL: https://github.com/apache/datafusion/pull/12626#issuecomment-2374580198 Also, could you add/update a unit test so that we prevent future regressions? -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] Upgrade dependencies [datafusion-ballista]

2024-09-25 Thread via GitHub
andygrove commented on PR #1059: URL: https://github.com/apache/datafusion-ballista/pull/1059#issuecomment-2374588304 > I've just looked at this, I think the never completing queries just try to return too many rows and they take too long. That is because the logical optimizer pushes down

Re: [PR] feat(function): add greatest function [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12474: URL: https://github.com/apache/datafusion/pull/12474#issuecomment-2374592664 I think the tests I was talking about are described here: https://github.com/apache/datafusion/tree/4a3c09a9316fb8940aeb1c5b9b48bc3b7259d5d4/datafusion/sqllogictest#readme

[I] Add Scala & Python support to benchmarking [datafusion-comet]

2024-09-25 Thread via GitHub
holdenk opened a new issue, #967: URL: https://github.com/apache/datafusion-comet/issues/967 ### What is the problem the feature request solves? For folks considering adopting Arrow Datafusion Comet who have Dataset code in Scala & Python it would probably be useful to be able to run

Re: [PR] Require `Debug` for `TableProvider`, `TableProviderFactory` and `PartitionStream` [datafusion]

2024-09-25 Thread via GitHub
alamb merged PR #12557: URL: https://github.com/apache/datafusion/pull/12557 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Minor: Encapsulate type check in GroupValuesColumn, avoid panic [datafusion]

2024-09-25 Thread via GitHub
comphead commented on code in PR #12620: URL: https://github.com/apache/datafusion/pull/12620#discussion_r1775591280 ## datafusion/physical-plan/src/aggregates/group_values/column_wise.rs: ## @@ -78,6 +79,38 @@ impl GroupValuesColumn { random_state: Default::default

Re: [PR] Require `Debug` for `PhysicalOptimizerRule` [datafusion]

2024-09-25 Thread via GitHub
AnthonyZhOon commented on PR #12624: URL: https://github.com/apache/datafusion/pull/12624#issuecomment-2374597934 Okay, clippy lint should be fixed with the revert @alamb , odd that rust-analyzer misled me on the types at the time maybe it was an incomplete compile -- This is an automate

Re: [PR] Require `Debug` for `TableProvider`, `TableProviderFactory` and `PartitionStream` [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12557: URL: https://github.com/apache/datafusion/pull/12557#issuecomment-2374596246 Thank you @timsaucer 🙏 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [I] Add Scala & Python support to benchmarking [datafusion-comet]

2024-09-25 Thread via GitHub
andygrove commented on issue #967: URL: https://github.com/apache/datafusion-comet/issues/967#issuecomment-2374618914 Thanks @holdenk. That would be great. Since you mentioned Python, you may be interested in https://github.com/apache/datafusion-comet/issues/957 as well. -- This is an au

Re: [PR] Improve performance of `trim` for string view [datafusion]

2024-09-25 Thread via GitHub
Rachelint commented on PR #12395: URL: https://github.com/apache/datafusion/pull/12395#issuecomment-2374621671 @alamb I found we indeed don't need the unsafe pointer arithmetic to get the `start_offset`, and I have swithed to a safe way here. Thanks much for suggestion! -- This is an aut

Re: [I] Is it possible to support PyArrow backed UDFs in Comet natively? [datafusion-comet]

2024-09-25 Thread via GitHub
andygrove commented on issue #957: URL: https://github.com/apache/datafusion-comet/issues/957#issuecomment-2374623643 > It seems to me that re-using how spark handle python UDFs would be easier than implementing it from scratch using datafusion. But I'm not 100% sure. Yes, that is th

Re: [I] Improve performance of *TRIM functions for StringViewArray [datafusion]

2024-09-25 Thread via GitHub
Rachelint commented on issue #12387: URL: https://github.com/apache/datafusion/issues/12387#issuecomment-2374635119 Finally, after reminding from @alamb , I found we can get the index in safe way, detail can see: https://github.com/Rachelint/arrow-datafusion/blob/f8174626e47d147e90c6715

Re: [PR] Fix sort node deserialization from proto [datafusion]

2024-09-25 Thread via GitHub
andygrove commented on code in PR #12626: URL: https://github.com/apache/datafusion/pull/12626#discussion_r1775552730 ## datafusion/proto/src/logical_plan/mod.rs: ## @@ -490,7 +490,14 @@ impl AsLogicalPlan for LogicalPlanNode { into_logical_plan!(sort.input,

Re: [PR] Minor: Encapsulate type check in GroupValuesColumn, avoid panic [datafusion]

2024-09-25 Thread via GitHub
comphead commented on code in PR #12620: URL: https://github.com/apache/datafusion/pull/12620#discussion_r1775604205 ## datafusion/physical-plan/src/aggregates/group_values/column_wise.rs: ## @@ -78,6 +79,38 @@ impl GroupValuesColumn { random_state: Default::default

Re: [PR] Disallow duplicated qualified field names [datafusion]

2024-09-25 Thread via GitHub
comphead commented on code in PR #12608: URL: https://github.com/apache/datafusion/pull/12608#discussion_r1775609041 ## datafusion/expr/src/utils.rs: ## @@ -260,7 +252,11 @@ pub fn grouping_set_to_exprlist(group_expr: &[Expr]) -> Result> { } Ok(grouping_set.di

Re: [I] Remove the unsafe codes in `trim` by using `Pattern` [datafusion]

2024-09-25 Thread via GitHub
Rachelint closed issue #12597: Remove the unsafe codes in `trim` by using `Pattern` URL: https://github.com/apache/datafusion/issues/12597 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

Re: [I] Remove the unsafe codes in `trim` by using `Pattern` [datafusion]

2024-09-25 Thread via GitHub
Rachelint commented on issue #12597: URL: https://github.com/apache/datafusion/issues/12597#issuecomment-2374636629 I found we can get the index in safe way. This issue can be closed. https://github.com/Rachelint/arrow-datafusion/blob/f8174626e47d147e90c6715f5052ccfa269f0493/datafusio

Re: [I] suggest add synax `select from generate_series()` [datafusion]

2024-09-25 Thread via GitHub
Abdullahsab3 commented on issue #10069: URL: https://github.com/apache/datafusion/issues/10069#issuecomment-2374742174 That might work, but I wonder how it will be inferred and whether it will cause any conflicts. Some examples that can be ambiguous on top of my head: ```sql select ge

Re: [I] Implement native version of ColumnarToRow [datafusion-comet]

2024-09-25 Thread via GitHub
parthchandra commented on issue #708: URL: https://github.com/apache/datafusion-comet/issues/708#issuecomment-2374753686 Posting a reply in case it helps associate the issue somehow. Anyhow, confirming that I am indeed working on this. In that context, I am initially planning to only ad

Re: [I] Implement native version of ColumnarToRow [datafusion-comet]

2024-09-25 Thread via GitHub
andygrove commented on issue #708: URL: https://github.com/apache/datafusion-comet/issues/708#issuecomment-2374680096 @parthchandra I know that you are working on this so I tried assigning the issue to you, but your name does not show up for me for some reason. I wondered if pinging you he

Re: [I] Implement Common Subexpression Elimination optimizer rule [datafusion-comet]

2024-09-25 Thread via GitHub
eejbyfeldt commented on issue #942: URL: https://github.com/apache/datafusion-comet/issues/942#issuecomment-2374777297 Spark has it, but not at the plan level. Instead they do it as part of their code generation: https://github.com/apache/spark/blob/v3.5.3/sql/catalyst/src/main/scala/org/a

[I] Benchmarking Configurations for SparkSubmit [datafusion-benchmarks]

2024-09-25 Thread via GitHub
radhikabajaj123 opened a new issue, #14: URL: https://github.com/apache/datafusion-benchmarks/issues/14 Hello, I am using the following configurations for benchmarking: `--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \ --deplo

[PR] Fix sort node deserialization from proto [datafusion]

2024-09-25 Thread via GitHub
palaska opened a new pull request, #12626: URL: https://github.com/apache/datafusion/pull/12626 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [PR] Upgrade dependencies [datafusion-ballista]

2024-09-25 Thread via GitHub
palaska commented on PR #1059: URL: https://github.com/apache/datafusion-ballista/pull/1059#issuecomment-2374527947 > I have been testing with TPC-H, and many queries work. However, some queries, such as queries 2, 7, and 8, never complete, and I do not see any errors logged. > > I

Re: [PR] Require `Debug` for `PhysicalOptimizerRule` [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12624: URL: https://github.com/apache/datafusion/pull/12624#issuecomment-2374513355 Looks like there are some small CI failures to address @AnthonyZhOon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

Re: [I] suggest add synax `select from generate_series()` [datafusion]

2024-09-25 Thread via GitHub
alamb commented on issue #10069: URL: https://github.com/apache/datafusion/issues/10069#issuecomment-2374529536 FWIW I think we should consider keeping the existing `generate_series` function and add a new user defined table function with the same name -- This is an automated message from

Re: [PR] Add Dictionary String (UTF8) type to String sqllogictests [datafusion]

2024-09-25 Thread via GitHub
alamb commented on PR #12621: URL: https://github.com/apache/datafusion/pull/12621#issuecomment-2374666023 🚀 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe

Re: [PR] Add Dictionary String (UTF8) type to String sqllogictests [datafusion]

2024-09-25 Thread via GitHub
alamb merged PR #12621: URL: https://github.com/apache/datafusion/pull/12621 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] avro_to_arrow: Support in memory apache_avro Value's [datafusion]

2024-09-25 Thread via GitHub
alamb commented on issue #7690: URL: https://github.com/apache/datafusion/issues/7690#issuecomment-2374710722 > To solve this, I would like to split the AvroArrowArrayReader into a reader and a convertor(Vec to RecordBatch). This sounds like a very reasonable idea to me. I thin

Re: [PR] Added array_any_value function [datafusion]

2024-09-25 Thread via GitHub
Abdullahsab3 commented on code in PR #12329: URL: https://github.com/apache/datafusion/pull/12329#discussion_r1775650911 ## docs/source/user-guide/sql/scalar_functions.md: ## @@ -2131,6 +2132,7 @@ to_unixtime(expression[, ..., format_n]) - [empty](#empty) - [flatten](#flatten)

[PR] Update scalar_functions.md [datafusion]

2024-09-25 Thread via GitHub
Abdullahsab3 opened a new pull request, #12627: URL: https://github.com/apache/datafusion/pull/12627 Remove extra space in reference to list_any_value ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are incl

[I] I'm tempted to add this image to the readme 😅 [datafusion]

2024-09-25 Thread via GitHub
rluvaton opened a new issue, #12628: URL: https://github.com/apache/datafusion/issues/12628 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

Re: [PR] feat(function): add greatest function [datafusion]

2024-09-25 Thread via GitHub
rluvaton commented on PR #12474: URL: https://github.com/apache/datafusion/pull/12474#issuecomment-2374805365 I see, thank you, how is your experience with debugging this kind of test? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] fix: Correct results for grouping sets when columns contain nulls [datafusion]

2024-09-25 Thread via GitHub
eejbyfeldt commented on code in PR #12571: URL: https://github.com/apache/datafusion/pull/12571#discussion_r1775707690 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -3512,6 +3512,18 @@ SELECT MIN(value), MAX(value) FROM integers_with_nulls 1 5 +# grouping_s

  1   2   >