Re: [PR] Test for reading read statistics from parquet files without statistics and boolean & struct data type [datafusion]

2024-05-22 Thread via GitHub
alamb merged PR #10608: URL: https://github.com/apache/datafusion/pull/10608 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Test for reading read statistics from parquet files without statistics and boolean & struct data type [datafusion]

2024-05-22 Thread via GitHub
alamb commented on PR #10608: URL: https://github.com/apache/datafusion/pull/10608#issuecomment-2125584421 Since this was test only code I just merged it in to keep things moving -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] Extract `Date32` parquet statistics as `Date32Array` rather than `Int32Array` [datafusion]

2024-05-22 Thread via GitHub
alamb commented on code in PR #10593: URL: https://github.com/apache/datafusion/pull/10593#discussion_r1610537274 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -75,6 +75,12 @@ macro_rules! get_statistic { *scale,

Re: [PR] Extract `Date32` parquet statistics as `Date32Array` rather than `Int32Array` [datafusion]

2024-05-22 Thread via GitHub
alamb commented on PR #10593: URL: https://github.com/apache/datafusion/pull/10593#issuecomment-2125592185 This file had some merge conflicts so I took the liberty of merging up from main and addressing the comment https://github.com/apache/datafusion/pull/10593#discussion_r1608835566 in 4

Re: [PR] Extract `Date32` parquet statistics as `Date32Array` rather than `Int32Array` [datafusion]

2024-05-22 Thread via GitHub
alamb commented on code in PR #10593: URL: https://github.com/apache/datafusion/pull/10593#discussion_r1610537274 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -75,6 +75,12 @@ macro_rules! get_statistic { *scale,

[I] Improve performance of extracting statistics from parquet files [datafusion]

2024-05-22 Thread via GitHub
alamb opened a new issue, #10626: URL: https://github.com/apache/datafusion/issues/10626 ### Is your feature request related to a problem or challenge? Part of https://github.com/apache/datafusion/issues/10453 @Lordworms added a benchmark for extracting statistics from parquet f

Re: [I] Implement a benchmark for extracting arrow statistics from parquet [datafusion]

2024-05-22 Thread via GitHub
alamb closed issue #10606: Implement a benchmark for extracting arrow statistics from parquet URL: https://github.com/apache/datafusion/issues/10606 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [PR] adding benchmark for extracting arrow statistics from parquet [datafusion]

2024-05-22 Thread via GitHub
alamb merged PR #10610: URL: https://github.com/apache/datafusion/pull/10610 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Add reference visitor `TreeNode` APIs [datafusion]

2024-05-22 Thread via GitHub
peter-toth commented on PR #10543: URL: https://github.com/apache/datafusion/pull/10543#issuecomment-2125641924 I've rebased the PR on the latest `main`. The first commit is exactly the same as the previous state of this PR was, the second commit changes `TreeNode:apply()` and `TreeNode:vis

Re: [PR] feat: correlation support [datafusion-comet]

2024-05-22 Thread via GitHub
huaxingao commented on code in PR #456: URL: https://github.com/apache/datafusion-comet/pull/456#discussion_r1610574040 ## spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala: ## @@ -1212,6 +1212,157 @@ class CometAggregateSuite extends CometTestBase with Adapt

Re: [PR] feat: extend `unnest` to support Struct datatype [datafusion]

2024-05-22 Thread via GitHub
duongcongtoai commented on code in PR #10429: URL: https://github.com/apache/datafusion/pull/10429#discussion_r1610578088 ## datafusion/sqllogictest/test_files/unnest.slt: ## @@ -288,6 +308,18 @@ select unnest(array_remove(column1, 12)) from unnest_table; 5 6 +## unnest stru

[PR] fix: use total ordering in the min & max accumulator for floats [datafusion]

2024-05-22 Thread via GitHub
westonpace opened a new pull request, #10627: URL: https://github.com/apache/datafusion/pull/10627 ## Which issue does this PR close? Closes #8031 ## Rationale for this change See the motivating issue. ## What changes are included in this PR? The `MinAccumul

Re: [PR] Draft: Add pyi stubs for type hinting [datafusion-python]

2024-05-22 Thread via GitHub
timsaucer closed pull request #709: Draft: Add pyi stubs for type hinting URL: https://github.com/apache/datafusion-python/pull/709 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

Re: [PR] Introduce expr builder for aggregate function [datafusion]

2024-05-22 Thread via GitHub
alamb commented on code in PR #10560: URL: https://github.com/apache/datafusion/pull/10560#discussion_r1610577152 ## datafusion-examples/examples/udaf_expr.rs: ## @@ -0,0 +1,31 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

Re: [I] PlaceholderRowExec shown when select from union results. [datafusion]

2024-05-22 Thread via GitHub
alamb commented on issue #10613: URL: https://github.com/apache/datafusion/issues/10613#issuecomment-2125679429 > This is like an optimization for empty tables. The result above is when both t3 and t4 are empty. After insert some values, the plan displayed correctly. I believe that i

[I] Examples of using `TreeNode` APIs to walk and manipulate LogicalPlans [datafusion]

2024-05-22 Thread via GitHub
alamb opened a new issue, #10628: URL: https://github.com/apache/datafusion/issues/10628 ### Is your feature request related to a problem or challenge? The usecase of walking `LogicalPlan` to either analyze the plan or rewrite it has come up multiple times recently Specifically

Re: [PR] Add example for building an external secondary index for parquet files [datafusion]

2024-05-22 Thread via GitHub
alamb commented on PR #10549: URL: https://github.com/apache/datafusion/pull/10549#issuecomment-2125713470 This PR is now ready for review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

Re: [PR] Add example for building an external secondary index for parquet files [datafusion]

2024-05-22 Thread via GitHub
alamb commented on code in PR #10549: URL: https://github.com/apache/datafusion/pull/10549#discussion_r1610626756 ## datafusion-examples/examples/parquet_index.rs: ## @@ -0,0 +1,727 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] Add example for building an external secondary index for parquet files [datafusion]

2024-05-22 Thread via GitHub
alamb commented on code in PR #10549: URL: https://github.com/apache/datafusion/pull/10549#discussion_r1610627390 ## datafusion-examples/examples/parquet_index.rs: ## @@ -0,0 +1,727 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] Add example for building an external secondary index for parquet files [datafusion]

2024-05-22 Thread via GitHub
alamb commented on code in PR #10549: URL: https://github.com/apache/datafusion/pull/10549#discussion_r1610626756 ## datafusion-examples/examples/parquet_index.rs: ## @@ -0,0 +1,727 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] Minor: Move group accumulator for aggregate function to physical-expr-common, and add ahash physical-expr-common [datafusion]

2024-05-22 Thread via GitHub
alamb commented on PR #10574: URL: https://github.com/apache/datafusion/pull/10574#issuecomment-2125721583 🚀 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe

Re: [PR] feat: RewriteCycle API for short-circuiting optimizer loops [datafusion]

2024-05-22 Thread via GitHub
alamb commented on PR #10386: URL: https://github.com/apache/datafusion/pull/10386#issuecomment-2125722281 Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look -- This is an automated message from the A

Re: [PR] feat: extend `unnest` to support Struct datatype [datafusion]

2024-05-22 Thread via GitHub
alamb merged PR #10429: URL: https://github.com/apache/datafusion/pull/10429 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] Support unnest for struct data type [datafusion]

2024-05-22 Thread via GitHub
alamb closed issue #10264: Support unnest for struct data type URL: https://github.com/apache/datafusion/issues/10264 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubsc

Re: [PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-22 Thread via GitHub
alamb merged PR #10573: URL: https://github.com/apache/datafusion/pull/10573 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] Make SQL strings generated from `Expr`s "prettier" [datafusion]

2024-05-22 Thread via GitHub
alamb closed issue #10557: Make SQL strings generated from `Expr`s "prettier" URL: https://github.com/apache/datafusion/issues/10557 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

Re: [I] Make SQL strings generated from `Expr`s "prettier" [datafusion]

2024-05-22 Thread via GitHub
alamb commented on issue #10557: URL: https://github.com/apache/datafusion/issues/10557#issuecomment-2125733405 FWI https://github.com/apache/datafusion/pull/10573 is merged! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] test: add more tests for statistics reading [datafusion]

2024-05-22 Thread via GitHub
alamb commented on code in PR #10592: URL: https://github.com/apache/datafusion/pull/10592#discussion_r1610641830 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -624,20 +624,281 @@ async fn test_dates_64_diff_rg_sizes() { .run("date64"); } +// BUG: +// https

Re: [PR] improve monotonicity api [datafusion]

2024-05-22 Thread via GitHub
alamb commented on PR #10117: URL: https://github.com/apache/datafusion/pull/10117#issuecomment-2125736201 Thanks everyone for the communication. I just want everyone working on this project to have a good experience -- I am glad to hear that it was. Thanks again @tinfoil-knight and @berkay

Re: [I] Excessive memory consumption on sorting [datafusion]

2024-05-22 Thread via GitHub
alamb commented on issue #10511: URL: https://github.com/apache/datafusion/issues/10511#issuecomment-2125748590 🤔 that certainly seems like it is doing a Top(K) with 14 cores -- so I would expect this would hold at most 20 * 14 batches * 20 is the `k` * 14 is the number of file gr

Re: [PR] fix: use total ordering in the min & max accumulator for floats [datafusion]

2024-05-22 Thread via GitHub
westonpace commented on PR #10627: URL: https://github.com/apache/datafusion/pull/10627#issuecomment-2125749341 I had to change the `describe` test because the float / double columns have nans (or maybe -inf?). This means that the `min`, which used to be a value, is now nan (which doesn't

Re: [PR] adding benchmark for extracting arrow statistics from parquet [datafusion]

2024-05-22 Thread via GitHub
Lordworms commented on PR #10610: URL: https://github.com/apache/datafusion/pull/10610#issuecomment-2125776581 > Thank you so much @Lordworms 🙏 > > This is perfect. I ran it locally, did some profiling, and filed #10626 to improve things. > > cc @NGA-TRAN Sure, I would a

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-22 Thread via GitHub
parthchandra commented on code in PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#discussion_r1610670927 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -1444,13 +1483,136 @@ fn parse_str_to_time_only_timestamp(value: &str) -> CometResult> { O

Re: [PR] Pass BigQuery options to the ArrowSchema [datafusion]

2024-05-22 Thread via GitHub
davisp commented on PR #10590: URL: https://github.com/apache/datafusion/pull/10590#issuecomment-2125789754 @alamb That's an excellent point on the possible confusion between the two slightly different syntaxes for the same keyword in a statement. I obviously just went with what already exi

[PR] feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode [datafusion-comet]

2024-05-22 Thread via GitHub
viirya opened a new pull request, #460: URL: https://github.com/apache/datafusion-comet/pull/460 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes test

Re: [PR] feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode [datafusion-comet]

2024-05-22 Thread via GitHub
viirya commented on code in PR #460: URL: https://github.com/apache/datafusion-comet/pull/460#discussion_r1610733007 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -724,89 +700,80 @@ class CometSparkSessionExtensions } // We sho

Re: [PR] Pass BigQuery options to the ArrowSchema [datafusion]

2024-05-22 Thread via GitHub
davisp commented on PR #10590: URL: https://github.com/apache/datafusion/pull/10590#issuecomment-2125922609 @alamb I've updated things to use `METADTA(foo='bar')` though I'm not 100% on whether I've done it acceptably as I've basically only just allowed the use of `METADATA` in place of `OP

Re: [I] Incorrect statistics read for `i8` `i16` columns in parquet [datafusion]

2024-05-22 Thread via GitHub
Lordworms commented on issue #10585: URL: https://github.com/apache/datafusion/issues/10585#issuecomment-2125926141 I have detected the location which causes the bug, seems like in arrow-rs, when getting DataType::Int8, we always return a Int32 Array Instead https://github.com/apache/data

Re: [PR] Pass BigQuery options to the ArrowSchema [datafusion]

2024-05-22 Thread via GitHub
davisp commented on PR #10590: URL: https://github.com/apache/datafusion/pull/10590#issuecomment-2125933413 Speaking of the Arrow metadata constraints, I'm still pondering how best to try and maintain some semblance of type information here. Looking through the dependency tree it seems like

[PR] fix Incorrect statistics read for i8 i16 columns in parquet [datafusion]

2024-05-22 Thread via GitHub
Lordworms opened a new pull request, #10629: URL: https://github.com/apache/datafusion/pull/10629 ## Which issue does this PR close? Closes #10585 ## Rationale for this change ## What changes are included in this PR? ## Are these changes te

Re: [I] Incorrect statistics read for `i8` `i16` columns in parquet [datafusion]

2024-05-22 Thread via GitHub
Lordworms commented on issue #10585: URL: https://github.com/apache/datafusion/issues/10585#issuecomment-2125939100 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [I] Incorrect statistics read for struct array in parquet [datafusion]

2024-05-22 Thread via GitHub
Lordworms commented on issue #10609: URL: https://github.com/apache/datafusion/issues/10609#issuecomment-2125961249 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-22 Thread via GitHub
goldmedal commented on PR #10573: URL: https://github.com/apache/datafusion/pull/10573#issuecomment-2125974338 Thanks again @alamb @backkem @phillipleblanc @lewiszlw @comphead :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [I] Make it easier to create WindowFunctions with the Expr API [datafusion]

2024-05-22 Thread via GitHub
shanretoo commented on issue #6747: URL: https://github.com/apache/datafusion/issues/6747#issuecomment-2125995127 Looks good. It is clearer to understand the results in this way. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-22 Thread via GitHub
viirya commented on code in PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#discussion_r1610830921 ## .github/workflows/pr_build.yml: ## @@ -76,6 +76,50 @@ jobs: # upload test reports only for java 17 upload-test-reports: ${{ matrix.java_v

Re: [PR] Introduce expr builder for aggregate function [datafusion]

2024-05-22 Thread via GitHub
jayzhan211 commented on PR #10560: URL: https://github.com/apache/datafusion/pull/10560#issuecomment-2126011843 > Thank you @jayzhan211 -- this is looking really cool. I have some feeback on the API design > > One major thing I think that might be worth considering is now to get one

Re: [PR] Introduce expr builder for aggregate function [datafusion]

2024-05-22 Thread via GitHub
jayzhan211 commented on PR #10560: URL: https://github.com/apache/datafusion/pull/10560#issuecomment-2126012154 > which means both adding many new functions as well as that they won't work for user defined aggregate functions I think they can build the function with macro in function-

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-22 Thread via GitHub
viirya commented on code in PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#discussion_r1610831950 ## .github/workflows/pr_build.yml: ## @@ -76,6 +76,50 @@ jobs: # upload test reports only for java 17 upload-test-reports: ${{ matrix.java_v

Re: [PR] fix: Compute murmur3 hash with dictionary input correctly [datafusion-comet]

2024-05-22 Thread via GitHub
parthchandra commented on code in PR #433: URL: https://github.com/apache/datafusion-comet/pull/433#discussion_r1610819301 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1452,17 +1452,55 @@ class CometExpressionSuite extends CometTestBase with Adapt

Re: [I] PlaceholderRowExec shown when select from union results. [datafusion]

2024-05-22 Thread via GitHub
xinlifoobar closed issue #10613: PlaceholderRowExec shown when select from union results. URL: https://github.com/apache/datafusion/issues/10613 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-22 Thread via GitHub
viirya commented on code in PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#discussion_r1610846336 ## spark/src/main/spark-4.0/org/apache/comet/shims/CometExprShim.scala: ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

Re: [PR] Add support for Substrait List/EmptyList literals [datafusion]

2024-05-22 Thread via GitHub
jonahgao commented on code in PR #10615: URL: https://github.com/apache/datafusion/pull/10615#discussion_r1610851138 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1278,6 +1279,45 @@ pub(crate) fn from_substrait_literal(lit: &Literal) -> Result {

Re: [PR] Omit NULLS FIRST/LAST when unparsing ORDER BY clauses for MySQL [datafusion]

2024-05-22 Thread via GitHub
phillipleblanc commented on PR #10625: URL: https://github.com/apache/datafusion/pull/10625#issuecomment-2126149300 > I wonder if we should update the comments in `Dialect` so that we can upstream the trait once we have some more experience with what features are needed in DataFusion 🤔

[I] The `limit` info lost in the AggregateExec when ser/deser the physical plan [datafusion]

2024-05-22 Thread via GitHub
liukun4515 opened a new issue, #10630: URL: https://github.com/apache/datafusion/issues/10630 ### Describe the bug In the definition of `AggregateExec`, we have the `limit` argument to optimize the agg operation. ``` pub struct AggregateExec { . /// Set i

Re: [I] The `limit` info lost in the AggregateExec when ser/deser the physical plan [datafusion]

2024-05-22 Thread via GitHub
liukun4515 commented on issue #10630: URL: https://github.com/apache/datafusion/issues/10630#issuecomment-2126184432 cc @alamb I will fix this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] feat: Add random row generator in data generator [datafusion-comet]

2024-05-22 Thread via GitHub
advancedxy commented on code in PR #451: URL: https://github.com/apache/datafusion-comet/pull/451#discussion_r1610932255 ## spark/src/test/scala/org/apache/comet/DataGenerator.scala: ## @@ -95,4 +102,38 @@ class DataGenerator(r: Random) { Range(0, n).map(_ => r.nextLong()

Re: [PR] feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode [datafusion-comet]

2024-05-22 Thread via GitHub
viirya commented on code in PR #460: URL: https://github.com/apache/datafusion-comet/pull/460#discussion_r1610733007 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -724,89 +700,80 @@ class CometSparkSessionExtensions } // We sho

Re: [PR] feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode [datafusion-comet]

2024-05-22 Thread via GitHub
andygrove commented on code in PR #460: URL: https://github.com/apache/datafusion-comet/pull/460#discussion_r1610951350 ## docs/source/user-guide/tuning.md: ## @@ -39,22 +39,26 @@ It must be set before the Spark context is created. You can enable or disable Co at runtime by se

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-22 Thread via GitHub
viirya commented on code in PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#discussion_r1610976334 ## spark/src/test/spark-3.4-plus/org/apache/comet/exec/CometExec3_4PlusSuite.scala: ## @@ -29,7 +29,7 @@ import org.apache.comet.CometConf /** * This test suit

Re: [PR] Refactor parquet row group pruning into a struct (use new statistics API, part 1) [datafusion]

2024-05-22 Thread via GitHub
Ted-Jiang commented on PR #10607: URL: https://github.com/apache/datafusion/pull/10607#issuecomment-2126242974 > @Ted-Jiang I wonder if you might have time to review this PR at some point? Thanks ❤️ @alamb i will review this carefully later today -- This is an automated message f

Re: [PR] Introduce expr builder for aggregate function [datafusion]

2024-05-22 Thread via GitHub
jayzhan211 commented on code in PR #10560: URL: https://github.com/apache/datafusion/pull/10560#discussion_r1610984960 ## datafusion/functions-aggregate/src/expr_builder.rs: ## @@ -0,0 +1,89 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributo

Re: [PR] Omit NULLS FIRST/LAST when unparsing ORDER BY clauses for MySQL [datafusion]

2024-05-22 Thread via GitHub
backkem commented on PR #10625: URL: https://github.com/apache/datafusion/pull/10625#issuecomment-2126244362 Also, I think it's the idea to move the entire unparser to the SQLParser package eventually. -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [I] Make SQL strings generated from `Expr`s "prettier" [datafusion]

2024-05-22 Thread via GitHub
backkem commented on issue #10557: URL: https://github.com/apache/datafusion/issues/10557#issuecomment-2126248908 Do we split off a ticket reduce the nr of brackets emitted? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [I] PR build for Linux Java 11 with Spark 3.4 is not running [datafusion-comet]

2024-05-22 Thread via GitHub
kazuyukitanimura closed issue #389: PR build for Linux Java 11 with Spark 3.4 is not running URL: https://github.com/apache/datafusion-comet/issues/389 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [I] PR build for Linux Java 11 with Spark 3.4 is not running [datafusion-comet]

2024-05-22 Thread via GitHub
kazuyukitanimura commented on issue #389: URL: https://github.com/apache/datafusion-comet/issues/389#issuecomment-2126265633 Thanks. Looks like this is the history https://github.com/apache/datafusion-comet/pull/122 We can keep it as is for now. Closing -- This is an automated message

Re: [PR] Refactor parquet row group pruning into a struct (use new statistics API, part 1) [datafusion]

2024-05-22 Thread via GitHub
Ted-Jiang commented on code in PR #10607: URL: https://github.com/apache/datafusion/pull/10607#discussion_r1611031689 ## datafusion/core/src/datasource/physical_plan/parquet/row_groups.rs: ## @@ -38,42 +38,94 @@ use crate::physical_optimizer::pruning::{PruningPredicate, Pruning

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-22 Thread via GitHub
kazuyukitanimura commented on code in PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#discussion_r1611033707 ## .github/workflows/pr_build.yml: ## @@ -76,6 +76,50 @@ jobs: # upload test reports only for java 17 upload-test-reports: ${{ mat

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-22 Thread via GitHub
kazuyukitanimura commented on code in PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#discussion_r1611036092 ## .github/workflows/pr_build.yml: ## @@ -76,6 +76,50 @@ jobs: # upload test reports only for java 17 upload-test-reports: ${{ mat

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-22 Thread via GitHub
kazuyukitanimura commented on code in PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#discussion_r1611036640 ## spark/src/main/spark-4.0/org/apache/comet/shims/CometExprShim.scala: ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-22 Thread via GitHub
kazuyukitanimura commented on PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#issuecomment-2126285552 Thank you @viirya Please take a final look https://github.com/apache/datafusion-comet/pull/407/commits/32bc3148d542478e867372876b47b9e85eac1207 -- This is an automated m

Re: [I] The `limit` info lost in the AggregateExec when ser/deser the physical plan [datafusion]

2024-05-22 Thread via GitHub
liukun4515 commented on issue #10630: URL: https://github.com/apache/datafusion/issues/10630#issuecomment-2126290921 my sql is like: ``` select LO_SUPPKEY from SSB_1G.LINEORDER GROUP BY LO_SUPPKEY limit 20 offset 10 ``` The stand-alone physical pla

Re: [I] The `limit` info lost in the AggregateExec when ser/deser the physical plan [datafusion]

2024-05-22 Thread via GitHub
liukun4515 commented on issue #10630: URL: https://github.com/apache/datafusion/issues/10630#issuecomment-2126317286 But other issue i found in the `AggregateExec`, when we push the limit to the agg exec and will select the ``` if let Some(limit) = self.limit {

[I] Make TaskContext wrap SessionState [datafusion]

2024-05-22 Thread via GitHub
lewiszlw opened a new issue, #10631: URL: https://github.com/apache/datafusion/issues/10631 ### Is your feature request related to a problem or challenge? I have a user defined table function `FlattenTableFunc` whice needs reading data from other table provider. To serialize execution

Re: [PR] Add reference visitor `TreeNode` APIs [datafusion]

2024-05-23 Thread via GitHub
berkaysynnada commented on PR #10543: URL: https://github.com/apache/datafusion/pull/10543#issuecomment-2126385035 Thanks a lot, @peter-toth! This looks great to me. The new `children(&self) -> Vec<&Arc`> signature also uncovers implicit clones, and many of them seem to be redundant. Our Tr

Re: [PR] feat: Add random row generator in data generator [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove commented on code in PR #451: URL: https://github.com/apache/datafusion-comet/pull/451#discussion_r1611152052 ## spark/src/test/scala/org/apache/comet/DataGenerator.scala: ## @@ -95,4 +101,39 @@ class DataGenerator(r: Random) { Range(0, n).map(_ => r.nextLong())

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove commented on PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#issuecomment-2126435029 > So I have pushed this build removing the fuzz test now to see if the build passes. So I might need some help in trying to identify the issue with fuzz test. > Apologies for

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-23 Thread via GitHub
mhilton commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2126436509 > I'm curious as to why we wouldn't want to just make date_bin timezone aware? I'm not sure that it's obvious how a time-zone aware date_bin would behave. For example On

Re: [PR] feat: Use enum to represent CAST eval_mode in expr.proto [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove commented on PR #415: URL: https://github.com/apache/datafusion-comet/pull/415#issuecomment-2126437802 @prashantksharma Do you need any assistance with this PR? Let me know if you have questions -- This is an automated message from the Apache Git Service. To respond to the messa

Re: [PR] Add reference visitor `TreeNode` APIs [datafusion]

2024-05-23 Thread via GitHub
peter-toth commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1611205100 ## datafusion/physical-expr/src/expressions/case.rs: ## @@ -258,6 +259,10 @@ impl CaseExpr { } } +lazy_static! { Review Comment: We can't create tem

Re: [PR] Add reference visitor `TreeNode` APIs [datafusion]

2024-05-23 Thread via GitHub
berkaysynnada commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1611219929 ## datafusion/physical-expr/src/expressions/case.rs: ## @@ -258,6 +259,10 @@ impl CaseExpr { } } +lazy_static! { Review Comment: Can we skip push

Re: [PR] Add reference visitor `TreeNode` APIs [datafusion]

2024-05-23 Thread via GitHub
peter-toth commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1611241009 ## datafusion/physical-expr/src/expressions/case.rs: ## @@ -258,6 +259,10 @@ impl CaseExpr { } } +lazy_static! { Review Comment: We need to restore

Re: [PR] Add reference visitor `TreeNode` APIs [datafusion]

2024-05-23 Thread via GitHub
peter-toth commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1611241009 ## datafusion/physical-expr/src/expressions/case.rs: ## @@ -258,6 +259,10 @@ impl CaseExpr { } } +lazy_static! { Review Comment: We need to be able

Re: [PR] Add reference visitor `TreeNode` APIs [datafusion]

2024-05-23 Thread via GitHub
berkaysynnada commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1611257842 ## datafusion/physical-expr/src/expressions/case.rs: ## @@ -258,6 +259,10 @@ impl CaseExpr { } } +lazy_static! { Review Comment: `with_new_childr

[PR] Update substrait requirement from 0.33.3 to 0.34.0 [datafusion]

2024-05-23 Thread via GitHub
dependabot[bot] opened a new pull request, #10632: URL: https://github.com/apache/datafusion/pull/10632 Updates the requirements on [substrait](https://github.com/substrait-io/substrait-rs) to permit the latest version. Release notes Sourced from https://github.com/substrait-io/su

Re: [PR] Add reference visitor `TreeNode` APIs [datafusion]

2024-05-23 Thread via GitHub
peter-toth commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1611302863 ## datafusion/physical-expr/src/expressions/case.rs: ## @@ -258,6 +259,10 @@ impl CaseExpr { } } +lazy_static! { Review Comment: That's true, but pl

Re: [PR] Add reference visitor `TreeNode` APIs [datafusion]

2024-05-23 Thread via GitHub
peter-toth commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1611302863 ## datafusion/physical-expr/src/expressions/case.rs: ## @@ -258,6 +259,10 @@ impl CaseExpr { } } +lazy_static! { Review Comment: That's true, but pl

Re: [I] The `limit` info lost in the AggregateExec when ser/deser the physical plan [datafusion]

2024-05-23 Thread via GitHub
liukun4515 commented on issue #10630: URL: https://github.com/apache/datafusion/issues/10630#issuecomment-2126605415 > But other issue i found in the `AggregateExec`, when we push the limit to the agg exec and will select the > > ``` > if let Some(limit) = self.limit { >

Re: [PR] Add reference visitor `TreeNode` APIs [datafusion]

2024-05-23 Thread via GitHub
peter-toth commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1611302863 ## datafusion/physical-expr/src/expressions/case.rs: ## @@ -258,6 +259,10 @@ impl CaseExpr { } } +lazy_static! { Review Comment: That's true, but pl

Re: [I] The `limit` info lost in the AggregateExec when ser/deser the physical plan [datafusion]

2024-05-23 Thread via GitHub
alamb commented on issue #10630: URL: https://github.com/apache/datafusion/issues/10630#issuecomment-2126695507 > GroupedTopKAggregateStream I believe @avantgardnerio / @thinkharderdev worked on this code, as part of https://github.com/apache/datafusion/pull/7192 -- This

Re: [PR] Add `FileScanConfig::new()` API [datafusion]

2024-05-23 Thread via GitHub
alamb commented on PR #10623: URL: https://github.com/apache/datafusion/pull/10623#issuecomment-2126718060 Thank you @phillipleblanc > Is that something you generally do for all builder-type APIs, or it depends? I would say it depends. In this case I hadd a use case for `add`

Re: [PR] Add reference visitor `TreeNode` APIs [datafusion]

2024-05-23 Thread via GitHub
berkaysynnada commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1611413134 ## datafusion/physical-expr/src/expressions/case.rs: ## @@ -258,6 +259,10 @@ impl CaseExpr { } } +lazy_static! { Review Comment: In a world where

Re: [I] Improve performance of extracting statistics from parquet files [datafusion]

2024-05-23 Thread via GitHub
alamb commented on issue #10626: URL: https://github.com/apache/datafusion/issues/10626#issuecomment-2126752102 I was thinking about this last night -- what I would suggest for this is to make functions like this for each type of https://docs.rs/parquet/latest/parquet/file/statistics

[I] Make SQL strings generated from Exprs even "prettier" [datafusion]

2024-05-23 Thread via GitHub
alamb opened a new issue, #10633: URL: https://github.com/apache/datafusion/issues/10633 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

Re: [I] Make SQL strings generated from `Expr`s "prettier" [datafusion]

2024-05-23 Thread via GitHub
alamb commented on issue #10557: URL: https://github.com/apache/datafusion/issues/10557#issuecomment-2126771737 > Do we split off a ticket reduce the nr of brackets emitted? Excellent call -- I filed https://github.com/apache/datafusion/issues/10633 -- This is an automated message f

Re: [PR] Omit NULLS FIRST/LAST when unparsing ORDER BY clauses for MySQL [datafusion]

2024-05-23 Thread via GitHub
alamb commented on PR #10625: URL: https://github.com/apache/datafusion/pull/10625#issuecomment-2126775297 > Also, I think it's the idea to move the entire unparser to the SQLParser package eventually. I think this may be hard, as the Unparser relies in `Expr`s (which are a D

Re: [PR] Add reference visitor `TreeNode` APIs [datafusion]

2024-05-23 Thread via GitHub
peter-toth commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1611436607 ## datafusion/physical-expr/src/expressions/case.rs: ## @@ -258,6 +259,10 @@ impl CaseExpr { } } +lazy_static! { Review Comment: Ah ok, I got it now

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-23 Thread via GitHub
tustvold commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2126783877 > zones repeat an hour would the two hours count into a single bin This would be the intuitive thing for it to do IMO, if people don't want this, they should cast to a ti

Re: [PR] fix: use total ordering in the min & max accumulator for floats [datafusion]

2024-05-23 Thread via GitHub
alamb commented on PR #10627: URL: https://github.com/apache/datafusion/pull/10627#issuecomment-2126788799 > I had to change the `describe` test because the float / double columns have nans (or maybe -inf?). This means that the `min`, which used to be a value, is now nan (which doesn't seem

Re: [PR] Add reference visitor `TreeNode` APIs [datafusion]

2024-05-23 Thread via GitHub
peter-toth commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1611436607 ## datafusion/physical-expr/src/expressions/case.rs: ## @@ -258,6 +259,10 @@ impl CaseExpr { } } +lazy_static! { Review Comment: Ah ok, I got it now

Re: [PR] Minor: Add tests showing aggregate behavior for NaNs [datafusion]

2024-05-23 Thread via GitHub
alamb commented on code in PR #10634: URL: https://github.com/apache/datafusion/pull/10634#discussion_r1611461664 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -4374,6 +4374,42 @@ GROUP BY dummy text1, text1, text1 +# Tests for aggregating with NaN values +

<    1   2   3   4   5   6   7   8   9   10   >