Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove commented on code in PR #461: URL: https://github.com/apache/datafusion-comet/pull/461#discussion_r1612318862 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -644,14 +644,16 @@ impl Cast { | DataType::Float32 | Data

Re: [PR] Add `ParquetExec::builder` API [datafusion]

2024-05-23 Thread via GitHub
alamb closed pull request #10636: Add `ParquetExec::builder` API URL: https://github.com/apache/datafusion/pull/10636 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubsc

Re: [PR] test: parametrize test_array_functions [datafusion-python]

2024-05-23 Thread via GitHub
andygrove merged PR #678: URL: https://github.com/apache/datafusion-python/pull/678

Re: [PR] Use int64 for TPC-H keys and set input schema to not nullable [datafusion-python]

2024-05-23 Thread via GitHub
andygrove merged PR #714: URL: https://github.com/apache/datafusion-python/pull/714

Re: [PR] Fix Already Borrowed Panic When SessionContext Used in Multiple Threads [datafusion-python]

2024-05-23 Thread via GitHub
andygrove closed pull request #367: Fix Already Borrowed Panic When SessionContext Used in Multiple Threads URL: https://github.com/apache/datafusion-python/pull/367

Re: [PR] Fix Already Borrowed Panic When SessionContext Used in Multiple Threads [datafusion-python]

2024-05-23 Thread via GitHub
andygrove commented on PR #367: URL: https://github.com/apache/datafusion-python/pull/367#issuecomment-2128107098 I'll go ahead and close this since it has been open for a year and is in draft. @kylebrooks-8451 feel free to re-open if you resume work on this

[PR] Simplify `ParquetExec::new()` [datafusion]

2024-05-23 Thread via GitHub
alamb opened a new pull request, #10643: URL: https://github.com/apache/datafusion/pull/10643 ## Which issue does this PR close? Part of #10546 ## Rationale for this change While working on https://github.com/apache/datafusion/pull/10549 it was cumbersome to create a

Re: [PR] Simplify `ParquetExec::new()` [datafusion]

2024-05-23 Thread via GitHub
alamb closed pull request #10643: Simplify `ParquetExec::new()` URL: https://github.com/apache/datafusion/pull/10643

Re: [PR] Simplify `ParquetExec::new()` [datafusion]

2024-05-23 Thread via GitHub
alamb commented on PR #10643: URL: https://github.com/apache/datafusion/pull/10643#issuecomment-2128118789 The more I think about this, the more I like https://github.com/apache/datafusion/pull/10636 and deprecating the ParquetExec::new function...

Re: [PR] feat: Add random row generator in data generator [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove merged PR #451: URL: https://github.com/apache/datafusion-comet/pull/451

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-23 Thread via GitHub
appletreeisyellow commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2128168367 To make `date_bin` timezone-aware, there are some edge cases we need to consider when designing it: 1. **Daylight Saving Time (DST) Transitions:** - Spring F

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-23 Thread via GitHub
appletreeisyellow commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2128169898 > * Fall Back: When the clocks move backward, there is an "extra" hour. For example, in the US Central time zone, when DST ends at 2:00 AM, the clocks are set back to 1:00
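A minimal std-only Rust sketch of the binning arithmetic behind `date_bin` and of why DST complicates it. It assumes integer timestamps in one unit (seconds here) and a caller-supplied fixed UTC offset. `date_bin_fixed_offset` and its parameters are hypothetical and are not DataFusion's implementation. The fixed-offset variant converts back to UTC using the offset in effect at `ts_utc`. That is only correct when the bin start has the same offset, which is exactly what breaks for bins spanning a spring-forward or fall-back transition.

```rust
/// Floors `ts` to the start of its bin:
/// origin + floor((ts - origin) / stride) * stride.
/// `div_euclid` rounds toward negative infinity, so timestamps
/// before `origin` still land in the correct (earlier) bin.
fn date_bin(stride: i64, ts: i64, origin: i64) -> i64 {
    assert!(stride > 0, "stride must be positive");
    origin + (ts - origin).div_euclid(stride) * stride
}

/// Hypothetical timezone-aware variant: bins in local wall-clock time
/// using a single UTC offset (same unit as `ts_utc`). It is only correct
/// when the bin start has the same offset as `ts_utc`. A real
/// implementation would look up the offset at the bin start in a
/// timezone database instead of reusing this one.
fn date_bin_fixed_offset(stride: i64, ts_utc: i64, origin_local: i64, offset: i64) -> i64 {
    let local = ts_utc + offset; // UTC -> local wall-clock time
    date_bin(stride, local, origin_local) - offset // local bin start -> UTC
}
```

For example, with a one-day stride and a UTC-6 offset (US Central standard time), `date_bin_fixed_offset(86_400, 1_699_923_600, 0, -21_600)` maps 2023-11-14T01:00Z (19:00 local on Nov 13) to `1_699_855_200`, which is local midnight on Nov 13.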

Re: [I] Create a DataFusion blog [datafusion]

2024-05-23 Thread via GitHub
andygrove closed issue #10535: Create a DataFusion blog URL: https://github.com/apache/datafusion/issues/10535

Re: [I] Create a DataFusion blog [datafusion]

2024-05-23 Thread via GitHub
andygrove commented on issue #10535: URL: https://github.com/apache/datafusion/issues/10535#issuecomment-2128187298 This task is complete: https://datafusion.apache.org/blog/

[I] Add tests for casting between timestamp types [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove opened a new issue, #467: URL: https://github.com/apache/datafusion-comet/issues/467 ### What is the problem the feature request solves? We currently delegate to DataFusion when casting between timestamps (as discovered in https://github.com/apache/datafusion-comet/pull/461)

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove commented on code in PR #461: URL: https://github.com/apache/datafusion-comet/pull/461#discussion_r1612478631 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -622,14 +590,91 @@ impl Cast { self.eval_mode, from_type,

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove commented on PR #461: URL: https://github.com/apache/datafusion-comet/pull/461#issuecomment-2128254128 This is ready for review now @viirya @parthchandra @kazuyukitanimura @huaxingao

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove commented on code in PR #461: URL: https://github.com/apache/datafusion-comet/pull/461#discussion_r1612492325 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -503,41 +503,37 @@ impl Cast { fn cast_array(&self, array: ArrayRef) -> DataFusionResult {

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1612495192 ## core/src/execution/datafusion/expressions/scalar_funcs/hex.rs: ## @@ -0,0 +1,371 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or mo

[PR] Move Median to `functions-aggregate` [datafusion]

2024-05-23 Thread via GitHub
jayzhan211 opened a new pull request, #10644: URL: https://github.com/apache/datafusion/pull/10644 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested

Re: [PR] Avoid clone for LogicalPlan during optimizer passes [datafusion]

2024-05-23 Thread via GitHub
github-actions[bot] commented on PR #9768: URL: https://github.com/apache/datafusion/pull/9768#issuecomment-2128338041 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or th

Re: [PR] Replace logical plan from Arc to Box [datafusion]

2024-05-23 Thread via GitHub
github-actions[bot] commented on PR #9763: URL: https://github.com/apache/datafusion/pull/9763#issuecomment-2128338094 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or th

Re: [PR] feat: Add random row generator in data generator [datafusion-comet]

2024-05-23 Thread via GitHub
advancedxy commented on PR #451: URL: https://github.com/apache/datafusion-comet/pull/451#issuecomment-2128357191 Thanks everyone for reviewing.

Re: [I] feat: Implement ANSI support for UnaryMinus [datafusion-comet]

2024-05-23 Thread via GitHub
vaibhawvipul commented on issue #465: URL: https://github.com/apache/datafusion-comet/issues/465#issuecomment-2128357365 I am working on this.

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-23 Thread via GitHub
viirya commented on code in PR #461: URL: https://github.com/apache/datafusion-comet/pull/461#discussion_r1612605807 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -644,14 +644,16 @@ impl Cast { | DataType::Float32 | DataTyp

Re: [PR] Refactor parquet row group pruning into a struct (use new statistics API, part 1) [datafusion]

2024-05-23 Thread via GitHub
advancedxy commented on code in PR #10607: URL: https://github.com/apache/datafusion/pull/10607#discussion_r1612607959 ## datafusion/core/src/datasource/physical_plan/parquet/row_groups.rs: ## @@ -38,42 +38,100 @@ use crate::physical_optimizer::pruning::{PruningPredicate, Pruni

[PR] Fix incorrect statistics read for binary columns in parquet [datafusion]

2024-05-23 Thread via GitHub
xinlifoobar opened a new pull request, #10645: URL: https://github.com/apache/datafusion/pull/10645 ## Which issue does this PR close? Closes #10605 ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] Replace logical plan from Arc to Box [datafusion]

2024-05-23 Thread via GitHub
jayzhan211 closed pull request #9763: Replace logical plan from Arc to Box URL: https://github.com/apache/datafusion/pull/9763

Re: [PR] Minor: add runtime asserts to `RowGroup` [datafusion]

2024-05-23 Thread via GitHub
viirya merged PR #10641: URL: https://github.com/apache/datafusion/pull/10641

Re: [PR] Add initial README and scripts [datafusion-benchmarks]

2024-05-23 Thread via GitHub
viirya commented on code in PR #1: URL: https://github.com/apache/datafusion-benchmarks/pull/1#discussion_r1612654116 ## runners/datafusion-comet/tpcbench.py: ## @@ -0,0 +1,108 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agree

Re: [PR] Add initial README and scripts [datafusion-benchmarks]

2024-05-23 Thread via GitHub
viirya commented on code in PR #1: URL: https://github.com/apache/datafusion-benchmarks/pull/1#discussion_r1612654513 ## runners/datafusion-comet/tpcbench.py: ## @@ -0,0 +1,108 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agree

Re: [I] feat: Support ANSI mode for round [datafusion-comet]

2024-05-23 Thread via GitHub
vidyasankarv commented on issue #466: URL: https://github.com/apache/datafusion-comet/issues/466#issuecomment-2128653555 I will work on this one

Re: [I] June 2024 ASF Board Report [datafusion]

2024-05-24 Thread via GitHub
alamb commented on issue #10155: URL: https://github.com/apache/datafusion/issues/10155#issuecomment-2128812716 Draft report: https://docs.google.com/document/d/1h4yjvomQO0XdzxKuE4aBSWGNliFFmn8GADd8DlPuXBw/edit

Re: [PR] Minor: add runtime asserts to `RowGroup` [datafusion]

2024-05-24 Thread via GitHub
alamb commented on PR #10641: URL: https://github.com/apache/datafusion/pull/10641#issuecomment-2128826061 Thanks @viirya and @advancedxy

Re: [PR] Update cli Dockerfile to a newer ubuntu release, newer rust release [datafusion]

2024-05-24 Thread via GitHub
alamb commented on code in PR #10638: URL: https://github.com/apache/datafusion/pull/10638#discussion_r1613016952 ## datafusion-cli/Dockerfile: ## @@ -15,7 +15,7 @@ # specific language governing permissions and limitations # under the License. -FROM rust:1.73-bullseye as bui

Re: [PR] Update cli Dockerfile to a newer ubuntu release, newer rust release [datafusion]

2024-05-24 Thread via GitHub
alamb merged PR #10638: URL: https://github.com/apache/datafusion/pull/10638

Re: [I] Docker CLI build fails in WSL2 - "Ubuntu 22.04.4 LTS" [datafusion]

2024-05-24 Thread via GitHub
alamb closed issue #10472: Docker CLI build fails in WSL2 - "Ubuntu 22.04.4 LTS" URL: https://github.com/apache/datafusion/issues/10472

[PR] feat: add substrait support for Interval types and literals [datafusion]

2024-05-24 Thread via GitHub
waynexia opened a new pull request, #10646: URL: https://github.com/apache/datafusion/pull/10646 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? Support convert to/from three

Re: [PR] More properly handle nullability of types/literals in Substrait [datafusion]

2024-05-24 Thread via GitHub
jonahgao commented on code in PR #10640: URL: https://github.com/apache/datafusion/pull/10640#discussion_r1613068493 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -1686,7 +1691,7 @@ fn to_substrait_bounds(window_frame: &WindowFrame) -> Result<(Bound, Bound)> {

Re: [PR] More properly handle nullability of types/literals in Substrait [datafusion]

2024-05-24 Thread via GitHub
jonahgao commented on code in PR #10640: URL: https://github.com/apache/datafusion/pull/10640#discussion_r1613097509 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1175,6 +1180,47 @@ pub(crate) fn from_substrait_type(dt: &substrait::proto::Type) -> Result Result {

[PR] Improve `ParquetExec` and related documentation [datafusion]

2024-05-24 Thread via GitHub
alamb opened a new pull request, #10647: URL: https://github.com/apache/datafusion/pull/10647 ## Which issue does this PR close? Part of #10549 ## Rationale for this change While trying to make an example that uses ParquetExec, I found its documentation could be improved

Re: [PR] Pass BigQuery options to the ArrowSchema [datafusion]

2024-05-24 Thread via GitHub
metegenez commented on PR #10590: URL: https://github.com/apache/datafusion/pull/10590#issuecomment-2129065420 There is a robust method to define column-specific options in DataFusion table options. I believe there should be a single way to do this in DataFusion, but BigQuery is widely used

Re: [PR] Improve `ParquetExec` and related documentation [datafusion]

2024-05-24 Thread via GitHub
alamb commented on PR #10647: URL: https://github.com/apache/datafusion/pull/10647#issuecomment-2129087302 @thinkharderdev, @tustvold, @Ted-Jiang and @crepererum: if you have time, could you double-check that this correctly describes `ParquetExec` to your understanding?

Re: [PR] Pass BigQuery options to the ArrowSchema [datafusion]

2024-05-24 Thread via GitHub
ozankabak commented on PR #10590: URL: https://github.com/apache/datafusion/pull/10590#issuecomment-2129132050 @davisp what I meant was looking at how other dialects (and systems using those dialects, such as BigQuery) handle column-specific metadata and analyze pros and cons of various app

Re: [I] Examples of using `TreeNode` APIs to walk and manipulate LogicalPlans [datafusion]

2024-05-24 Thread via GitHub
alamb commented on issue #10628: URL: https://github.com/apache/datafusion/issues/10628#issuecomment-2129137066 Moving this out of Slack into GitHub so it can be found more easily. If your use case is to get the list of filters and tables that appear in a query, one way to do this is:

Re: [PR] More properly handle nullability of types/literals in Substrait [datafusion]

2024-05-24 Thread via GitHub
Blizzara commented on code in PR #10640: URL: https://github.com/apache/datafusion/pull/10640#discussion_r1613246909 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -1686,7 +1691,7 @@ fn to_substrait_bounds(window_frame: &WindowFrame) -> Result<(Bound, Bound)> {

Re: [I] Make TaskContext wrap SessionState [datafusion]

2024-05-24 Thread via GitHub
crepererum commented on issue #10631: URL: https://github.com/apache/datafusion/issues/10631#issuecomment-2129296188 I would suggest a rather larger refactoring? We have: - `SessionState` - `SessionConfig` - `SessionContext` - `TaskContext` - `RuntimeConfig` - `RuntimeEn

[PR] Convert first, last aggregate function to UDAF [datafusion]

2024-05-24 Thread via GitHub
mustafasrepo opened a new pull request, #10648: URL: https://github.com/apache/datafusion/pull/10648 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes test

Re: [PR] Improve `ParquetExec` and related documentation [datafusion]

2024-05-24 Thread via GitHub
crepererum commented on code in PR #10647: URL: https://github.com/apache/datafusion/pull/10647#discussion_r1613322378 ## datafusion/core/src/datasource/physical_plan/parquet/mod.rs: ## @@ -75,7 +75,79 @@ pub use metrics::ParquetFileMetrics; pub use schema_adapter::{SchemaAdapt

[PR] Factor out common datafusion types into another proto file [datafusion]

2024-05-24 Thread via GitHub
mustafasrepo opened a new pull request, #10649: URL: https://github.com/apache/datafusion/pull/10649 ## Which issue does this PR close? Closes #10477. ## Rationale for this change See [issue body](https://github.com/apache/datafusion/issues/10477#issue-22

[PR] Minor: Csv Options Clean-up [datafusion]

2024-05-24 Thread via GitHub
berkaysynnada opened a new pull request, #10650: URL: https://github.com/apache/datafusion/pull/10650 ## Which issue does this PR close? Closes #. ## Rationale for this change When the CSV header option is not specified in the options clause, it is set from

Re: [PR] Convert first, last aggregate function to UDAF [datafusion]

2024-05-24 Thread via GitHub
jayzhan211 commented on code in PR #10648: URL: https://github.com/apache/datafusion/pull/10648#discussion_r1613370248 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -161,6 +144,23 @@ impl AggregateUDFImpl for FirstValue { fn aliases(&self) -> &[String] {

Re: [I] Implement Spark-compatible CAST from String to Timestamp [datafusion-comet]

2024-05-24 Thread via GitHub
andygrove commented on issue #328: URL: https://github.com/apache/datafusion-comet/issues/328#issuecomment-2129430050 There is a follow-on issue to complete this work: https://github.com/apache/datafusion-comet/issues/376

Re: [I] Versions >32.0.0 on PyPI have broken substrait support [datafusion-python]

2024-05-24 Thread via GitHub
mbwhite commented on issue #646: URL: https://github.com/apache/datafusion-python/issues/646#issuecomment-2129449063 FYI: I tried v38.0.0 from pypi-test and the problem remains. Rebuilding the code locally with `maturin build --features substrait` and using the resulting wheel works fine.

Re: [PR] Start setting up new StreamTable config [datafusion]

2024-05-24 Thread via GitHub
matthewmturner commented on PR #10600: URL: https://github.com/apache/datafusion/pull/10600#issuecomment-2129456972 @alamb I have a working example now. I have an idea to update it to show more of the streaming nature (i.e. write to the FIFO and get batches multiple times) but won't have time f

[PR] Convert Sum to UDAF [datafusion]

2024-05-24 Thread via GitHub
jayzhan211 opened a new pull request, #10651: URL: https://github.com/apache/datafusion/pull/10651 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested

Re: [PR] More properly handle nullability of types/literals in Substrait [datafusion]

2024-05-24 Thread via GitHub
jonahgao merged PR #10640: URL: https://github.com/apache/datafusion/pull/10640

Re: [PR] More properly handle nullability of types/literals in Substrait [datafusion]

2024-05-24 Thread via GitHub
jonahgao commented on PR #10640: URL: https://github.com/apache/datafusion/pull/10640#issuecomment-2129492219 I plan to merge this PR now as it might conflict with #10646.

Re: [PR] Minor: Add tests showing aggregate behavior for NaNs [datafusion]

2024-05-24 Thread via GitHub
westonpace commented on code in PR #10634: URL: https://github.com/apache/datafusion/pull/10634#discussion_r1613461895 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -4374,6 +4374,42 @@ GROUP BY dummy text1, text1, text1 +# Tests for aggregating with NaN val

Re: [PR] fix: use total ordering in the min & max accumulator for floats [datafusion]

2024-05-24 Thread via GitHub
westonpace commented on PR #10627: URL: https://github.com/apache/datafusion/pull/10627#issuecomment-2129505697 > So that suggests to me it treats NaN as the largest floating point value If this is the case then there is divergence between postgres and arrow-rs. Which takes priority?

Re: [PR] fix: use total ordering in the min & max accumulator for floats [datafusion]

2024-05-24 Thread via GitHub
westonpace commented on PR #10627: URL: https://github.com/apache/datafusion/pull/10627#issuecomment-2129511762 > So that suggests to me it treats NaN as the largest floating point value This is confirmed in the latest version of the [postgres docs](https://www.postgresql.org/docs/cur
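To illustrate the two orderings being compared, here is a std-only Rust sketch (not the accumulator from the PR). `f64::max` and `f64::min` skip NaN. `f64::total_cmp` (IEEE 754 totalOrder) places a positive NaN above positive infinity, which matches treating NaN as the largest floating point value. One caveat: under `total_cmp`, a NaN with the sign bit set sorts below negative infinity, so the NaN's sign matters. The tests therefore build an explicitly positive NaN from its bits. The function names are illustrative only.

```rust
/// Max/min using IEEE 754 total ordering: a positive NaN compares
/// greater than +inf, so it wins `max` and loses `min`.
fn max_total(values: &[f64]) -> Option<f64> {
    values.iter().copied().max_by(|a, b| a.total_cmp(b))
}

fn min_total(values: &[f64]) -> Option<f64> {
    values.iter().copied().min_by(|a, b| a.total_cmp(b))
}

/// For contrast: folding with `f64::max` ignores NaN inputs
/// (it returns the other operand when one side is NaN).
fn max_ignoring_nan(values: &[f64]) -> Option<f64> {
    values.iter().copied().reduce(f64::max)
}
```

With `[1.0, NaN, 3.0]`, the total-order max is NaN while the `f64::max` fold returns `3.0`. That is the divergence the thread asks about.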

Re: [I] [EPIC] JIT support for `DataFusion` [datafusion]

2024-05-24 Thread via GitHub
faucct commented on issue #2703: URL: https://github.com/apache/datafusion/issues/2703#issuecomment-2129536338 Though the paper that you have mentioned admits that JIT-compilation is beneficial for OLTP workloads: > Besides OLAP performance, other factors also play an important role.

Re: [PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-24 Thread via GitHub
Omega359 commented on PR #10573: URL: https://github.com/apache/datafusion/pull/10573#issuecomment-2129537623 This shouldn't have passed checks. ``` + cargo fmt --all -- --check `cargo metadata` exited with an error: error: failed to load manifest for workspace member `/opt/dev/

Re: [PR] Move Median to `functions-aggregate` and Introduce Numeric signature [datafusion]

2024-05-24 Thread via GitHub
jayzhan211 commented on code in PR #10644: URL: https://github.com/apache/datafusion/pull/10644#discussion_r1613540288 ## datafusion/functions-aggregate/src/median.rs: ## @@ -15,71 +15,105 @@ // specific language governing permissions and limitations // under the License. -/

Re: [PR] Move Median to `functions-aggregate` and Introduce Numeric signature [datafusion]

2024-05-24 Thread via GitHub
jayzhan211 commented on code in PR #10644: URL: https://github.com/apache/datafusion/pull/10644#discussion_r1613550658 ## datafusion/functions-aggregate/Cargo.toml: ## @@ -39,6 +39,7 @@ path = "src/lib.rs" [dependencies] arrow = { workspace = true } +arrow-schema = { workspa

[I] bug: CAST timestamp to string ignores timezone prior to Spark 3.4 [datafusion-comet]

2024-05-24 Thread via GitHub
andygrove opened a new issue, #468: URL: https://github.com/apache/datafusion-comet/issues/468 ### Describe the bug In `CometExpressionSuite` we have two tests that are ignored for Spark 3.2 and 3.3. ```scala test("cast timestamp and timestamp_ntz to string") { // T

Re: [I] Bad CPU type in executable protoc-jar [datafusion-comet]

2024-05-24 Thread via GitHub
andygrove closed issue #227: Bad CPU type in executable protoc-jar URL: https://github.com/apache/datafusion-comet/issues/227

Re: [I] Bad CPU type in executable protoc-jar [datafusion-comet]

2024-05-24 Thread via GitHub
andygrove commented on issue #227: URL: https://github.com/apache/datafusion-comet/issues/227#issuecomment-2129653865 This is no longer an issue for me and we have not had other reports of this happening, so I will close this.

Re: [I] Examples of using `TreeNode` APIs to walk and manipulate LogicalPlans [datafusion]

2024-05-24 Thread via GitHub
cisaacson commented on issue #10628: URL: https://github.com/apache/datafusion/issues/10628#issuecomment-2129731243 This looks very good, pretty much what I implemented. The only question I have remaining is: If `TableScan` has `filters` why would that not catch all filters? What does

Re: [I] Examples of using `TreeNode` APIs to walk and manipulate LogicalPlans [datafusion]

2024-05-24 Thread via GitHub
alamb commented on issue #10628: URL: https://github.com/apache/datafusion/issues/10628#issuecomment-2129736024 > If TableScan has filters why would that not catch all filters? Depending on the value of https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.TableProviderFilte
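A toy, std-only model of the point above (this is not DataFusion code). The enum mirrors the three variants of `TableProviderFilterPushDown` (`Unsupported`, `Inexact`, `Exact`), and `split_filters` is a hypothetical helper. Our understanding of DataFusion's filter pushdown is as follows. `Exact` filters are recorded only on the `TableScan`. `Inexact` filters are recorded on the scan and also kept in a `Filter` node above it. `Unsupported` filters stay only in the `Filter` node. Under that model, reading `TableScan::filters` yields exactly the filters the provider accepted.

```rust
/// Mirrors the variants of DataFusion's `TableProviderFilterPushDown`
/// (toy model only).
#[derive(Clone, Copy, Debug, PartialEq)]
enum PushDown {
    Unsupported,
    Inexact,
    Exact,
}

/// Hypothetical helper: given each filter and the provider's answer,
/// returns (filters recorded on the TableScan, filters kept in a Filter node).
fn split_filters<'a>(filters: &[(&'a str, PushDown)]) -> (Vec<&'a str>, Vec<&'a str>) {
    let mut on_scan = Vec::new();
    let mut in_filter_node = Vec::new();
    for &(expr, support) in filters {
        match support {
            // Provider guarantees the filter is fully applied.
            PushDown::Exact => on_scan.push(expr),
            // Provider may use it (e.g. for pruning), but rows must be re-checked.
            PushDown::Inexact => {
                on_scan.push(expr);
                in_filter_node.push(expr);
            }
            // Provider cannot use it, so only the Filter node evaluates it.
            PushDown::Unsupported => in_filter_node.push(expr),
        }
    }
    (on_scan, in_filter_node)
}
```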

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-24 Thread via GitHub
Blizzara commented on PR #10531: URL: https://github.com/apache/datafusion/pull/10531#issuecomment-2129741689 @jonahgao @alamb this last one is ready now too :)

[I] Support convert LogicalPlan JOIN with `Using` constraint to SQL String [datafusion]

2024-05-24 Thread via GitHub
goldmedal opened a new issue, #10652: URL: https://github.com/apache/datafusion/issues/10652 ### Is your feature request related to a problem or challenge? Currently we only support converting a JOIN with an `ON` constraint to a SQL string. The SQL below can't be converted yet. ``` SELECT
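A hypothetical std-only Rust helper showing the join condition that a `USING` clause stands for. `JOIN r USING (a, b)` joins on the same predicate as `ON l.a = r.a AND l.b = r.b`. This is not DataFusion's unparser. The two forms are also not fully interchangeable, because `USING` merges each join column into a single output column, which changes the result of `SELECT *`.

```rust
/// Builds the `ON` predicate equivalent to `USING (cols...)` for two
/// relation names (assumed to be already quoted/aliased as needed).
fn using_to_on(left: &str, right: &str, cols: &[&str]) -> String {
    cols.iter()
        .map(|c| format!("{left}.{c} = {right}.{c}"))
        .collect::<Vec<_>>()
        .join(" AND ")
}
```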

Re: [I] Support convert LogicalPlan JOIN with `Using` constraint to SQL String [datafusion]

2024-05-24 Thread via GitHub
goldmedal commented on issue #10652: URL: https://github.com/apache/datafusion/issues/10652#issuecomment-2129764802 By the way, I saw there are other unimplemented plans in `plan.rs`: - Distinct - Union - Window - Extension (I guess we need to provide some method for `UserDefinedL

Re: [I] Examples of using `TreeNode` APIs to walk and manipulate LogicalPlans [datafusion]

2024-05-24 Thread via GitHub
cisaacson commented on issue #10628: URL: https://github.com/apache/datafusion/issues/10628#issuecomment-2129782772 Got it, that makes total sense. I only care about the accepted pushdown filters, so the `TableScan` will work for what we need.

Re: [PR] fix: Compute murmur3 hash with dictionary input correctly [datafusion-comet]

2024-05-24 Thread via GitHub
andygrove merged PR #433: URL: https://github.com/apache/datafusion-comet/pull/433
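A std-only Rust sketch of the idea in this PR's title, offered under stated assumptions. The premise is that hashing a dictionary-encoded column should give the same result as hashing its decoded values, so each key is dereferenced before hashing. `hash_int` follows the Murmur3 x86_32 steps that, as we understand it, Spark's `Murmur3_x86_32.hashInt` applies to a single 4-byte value. Spark's `hash()` uses seed 42. The helper names are hypothetical, and this is not Comet's implementation.

```rust
const C1: u32 = 0xcc9e_2d51;
const C2: u32 = 0x1b87_3593;

fn mix_k1(k: u32) -> u32 {
    k.wrapping_mul(C1).rotate_left(15).wrapping_mul(C2)
}

fn mix_h1(h: u32, k: u32) -> u32 {
    (h ^ k).rotate_left(13).wrapping_mul(5).wrapping_add(0xe654_6b64)
}

/// Final avalanche step; `len` is the input length in bytes.
fn fmix(mut h: u32, len: u32) -> u32 {
    h ^= len;
    h ^= h >> 16;
    h = h.wrapping_mul(0x85eb_ca6b);
    h ^= h >> 13;
    h = h.wrapping_mul(0xc2b2_ae35);
    h ^= h >> 16;
    h
}

/// Murmur3 x86_32 of one 4-byte int (Spark returns this as a signed int).
fn hash_int(value: i32, seed: u32) -> u32 {
    fmix(mix_h1(seed, mix_k1(value as u32)), 4)
}

/// Hashes a plain (decoded) column.
fn hash_plain(values: &[i32], seed: u32) -> Vec<u32> {
    values.iter().map(|&v| hash_int(v, seed)).collect()
}

/// Hashes a dictionary-encoded column: hash the value each key points at,
/// never the key itself, so the result matches `hash_plain` on decoded data.
fn hash_dictionary(keys: &[usize], dict: &[i32], seed: u32) -> Vec<u32> {
    keys.iter().map(|&k| hash_int(dict[k], seed)).collect()
}
```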

Re: [I] bug: hash expression is not consistent with Spark [datafusion-comet]

2024-05-24 Thread via GitHub
andygrove closed issue #427: bug: hash expression is not consistent with Spark URL: https://github.com/apache/datafusion-comet/issues/427

Re: [PR] Factor out common datafusion types into another proto file [datafusion]

2024-05-24 Thread via GitHub
comphead commented on code in PR #10649: URL: https://github.com/apache/datafusion/pull/10649#discussion_r1613680726 ## datafusion/proto-common/Cargo.toml: ## @@ -0,0 +1,54 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements

Re: [PR] Factor out common datafusion types into another proto file [datafusion]

2024-05-24 Thread via GitHub
comphead commented on code in PR #10649: URL: https://github.com/apache/datafusion/pull/10649#discussion_r1613684688 ## datafusion/proto-common/gen/src/main.rs: ## @@ -0,0 +1,57 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] Factor out common datafusion types into another proto file [datafusion]

2024-05-24 Thread via GitHub
comphead commented on code in PR #10649: URL: https://github.com/apache/datafusion/pull/10649#discussion_r1613687288 ## datafusion/proto-common/src/lib.rs: ## @@ -0,0 +1,62 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreemen

Re: [PR] Factor out common datafusion types into another proto file [datafusion]

2024-05-24 Thread via GitHub
comphead commented on code in PR #10649: URL: https://github.com/apache/datafusion/pull/10649#discussion_r1613685757 ## datafusion/proto-common/src/common.rs: ## @@ -0,0 +1,22 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agree

Re: [PR] Add tests for reading numeric limits in parquet statistics [datafusion]

2024-05-24 Thread via GitHub
comphead commented on code in PR #10642: URL: https://github.com/apache/datafusion/pull/10642#discussion_r1613714632 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -212,13 +218,13 @@ impl Test { let expected_null_counts = Arc::new(expected_null_counts) as A

Re: [PR] Add tests for reading numeric limits in parquet statistics [datafusion]

2024-05-24 Thread via GitHub
comphead commented on code in PR #10642: URL: https://github.com/apache/datafusion/pull/10642#discussion_r1613715175 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -212,13 +218,13 @@ impl Test { let expected_null_counts = Arc::new(expected_null_counts) as A

[PR] feat: Add support for RLike [datafusion-comet]

2024-05-24 Thread via GitHub
andygrove opened a new pull request, #469: URL: https://github.com/apache/datafusion-comet/pull/469 ## Which issue does this PR close? N/A ## Rationale for this change Regular expression support is usually important in ETL jobs, so we should start adding

Re: [I] [EPIC] JIT support for `DataFusion` [datafusion]

2024-05-24 Thread via GitHub
faucct commented on issue #2703: URL: https://github.com/apache/datafusion/issues/2703#issuecomment-2129907158 I think that compiling SQL-expressions to UDFs by hand would kinda kill the whole point of the framework, but it seems like most of the framework would be irrelevant for the in-mem

[PR] Support consuming Substrait with compound signature function names [datafusion]

2024-05-24 Thread via GitHub
Blizzara opened a new pull request, #10653: URL: https://github.com/apache/datafusion/pull/10653 Substrait 0.32.0+ requires functions to be specified using compound names, which include the function name as well as the arguments it takes. We don't necessarily need that information while con
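This is a minimal sketch of the consumer-side idea. It assumes compound names take the form `name:argtypes`, for example `add:i32_i32`. The consumer keeps only the part before the first `:` and looks up the function by that base name. `base_function_name` is a hypothetical helper, not DataFusion's API.

```rust
/// Hypothetical helper: strip the signature suffix from a Substrait
/// compound function name such as "add:i32_i32".
/// Plain names without a ':' are returned unchanged.
fn base_function_name(name: &str) -> &str {
    // `split_once` returns None when there is no ':' in the name.
    match name.split_once(':') {
        Some((base, _signature)) => base,
        None => name,
    }
}
```

Accepting both forms keeps plans from older producers (plain names) working while also consuming 0.32.0+ compound names.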

Re: [PR] Support consuming Substrait with compound signature function names [datafusion]

2024-05-24 Thread via GitHub
Blizzara commented on PR #10653: URL: https://github.com/apache/datafusion/pull/10653#issuecomment-2129941680 @alamb curious what you think of fixing just the consumer side first, without touching the producer - if that'd be okay, then I can add some unit tests to this PR?

Re: [PR] feat: Add support for RLike [datafusion-comet]

2024-05-24 Thread via GitHub
viirya commented on code in PR #469: URL: https://github.com/apache/datafusion-comet/pull/469#discussion_r1613739740 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1094,24 +1094,46 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde wit

Re: [PR] feat: Add support for RLike [datafusion-comet]

2024-05-24 Thread via GitHub
andygrove commented on code in PR #469: URL: https://github.com/apache/datafusion-comet/pull/469#discussion_r1613743037 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1094,24 +1094,46 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde

Re: [PR] Minor: Csv Options Clean-up [datafusion]

2024-05-24 Thread via GitHub
metegenez commented on code in PR #10650: URL: https://github.com/apache/datafusion/pull/10650#discussion_r1613754503 ## datafusion/core/src/datasource/file_format/csv.rs: ## @@ -301,13 +296,7 @@ impl CsvFormat { while let Some(chunk) = stream.next().await.transpose()

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-24 Thread via GitHub
tshauck commented on PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#issuecomment-2130054380 I think I made the updates requested in the latest round. I left the dictionary handling the same, but I can look into flattening the dictionary specific to hex if you guys think i
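This sketch follows Spark's documented `hex` behaviour: `hex(17)` returns `11` and `hex('Spark SQL')` returns `537061726B2053514C`. It also shows one way to handle dictionary input without flattening it. Only the distinct values are hexed and the keys are reused. Plain slices stand in for Arrow arrays, and all function names are hypothetical rather than Comet's implementation.

```rust
/// Hex of a 64-bit integer: the two's-complement bits printed as
/// uppercase hex, so negative inputs produce 16 digits.
fn hex_i64(v: i64) -> String {
    format!("{:X}", v as u64)
}

/// Hex of a byte string: two uppercase hex digits per byte.
fn hex_bytes(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02X}", b)).collect()
}

/// Dictionary-encoded input: apply hex to the (usually few) dictionary
/// values and keep the keys unchanged, so the result stays dictionary-encoded.
fn hex_dictionary(keys: &[u32], values: &[i64]) -> (Vec<u32>, Vec<String>) {
    (keys.to_vec(), values.iter().map(|&v| hex_i64(v)).collect())
}
```

Keeping the dictionary avoids hexing repeated values once per row. The trade-off is that downstream operators must also accept dictionary-encoded strings.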

Re: [I] bug: ABS should only overflow in ANSI mode [datafusion-comet]

2024-05-24 Thread via GitHub
planga82 commented on issue #464: URL: https://github.com/apache/datafusion-comet/issues/464#issuecomment-2130081236 I want to try this!!
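This sketch illustrates the behaviour the issue title asks for. It assumes Spark's non-ANSI `abs` wraps like Java's `Math.abs`, so `abs(i32::MIN)` returns `i32::MIN`, while ANSI mode reports the overflow. `spark_abs_i32` and its error text are hypothetical.

```rust
/// `i32::MIN` has no positive counterpart, so it is the only input
/// for which abs can overflow.
fn spark_abs_i32(v: i32, ansi_mode: bool) -> Result<i32, String> {
    if ansi_mode {
        // ANSI mode: surface the overflow as an error.
        v.checked_abs()
            .ok_or_else(|| format!("integer overflow in abs({v})"))
    } else {
        // Legacy mode: wrap around, so abs(i32::MIN) == i32::MIN.
        Ok(v.wrapping_abs())
    }
}
```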

Re: [I] Parquet Predicate Pushdown Does Not Handle Type Coercion [datafusion]

2024-05-24 Thread via GitHub
jeffreyssmith2nd commented on issue #7925: URL: https://github.com/apache/datafusion/issues/7925#issuecomment-2130106971 The case we're running into in InfluxDB when enabling timezones is slightly different. It is a parquet file with Timestamp without a timezone and then querying with eithe

[I] Error on `NULL["field_name"]`: The expression to get an indexed field is only valid for `List`, `Struct`, or `Map` types, got Null [datafusion]

2024-05-24 Thread via GitHub
alamb opened a new issue, #10654: URL: https://github.com/apache/datafusion/issues/10654 ### Describe the bug Expr::field is broken for ScalarValue::Null. After https://github.com/apache/datafusion/pull/10375 merged, `Expr::field` is broken when we try to do it on `ScalarValue::
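This is a type-level sketch of one plausible fix. It assumes a NULL base should resolve to a NULL result type instead of raising the error quoted in the issue title. `DataType` is a hypothetical, cut-down stand-in for Arrow's type, and the `List` and `Map` cases are omitted for brevity.

```rust
/// Hypothetical, simplified stand-in for Arrow's DataType.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Null,
    Int64,
    Utf8,
    Struct(Vec<(String, DataType)>),
}

/// Resolve the result type of `base["name"]`.
/// Assumption: indexing into NULL yields NULL rather than an error.
fn field_access_type(base: &DataType, name: &str) -> Result<DataType, String> {
    match base {
        DataType::Null => Ok(DataType::Null),
        DataType::Struct(fields) => fields
            .iter()
            .find(|(n, _)| n.as_str() == name)
            .map(|(_, t)| t.clone())
            .ok_or_else(|| format!("field {name} not found")),
        other => Err(format!(
            "The expression to get an indexed field is only valid for `List`, `Struct`, or `Map` types, got {other:?}"
        )),
    }
}
```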

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-24 Thread via GitHub
kazuyukitanimura commented on code in PR #461: URL: https://github.com/apache/datafusion-comet/pull/461#discussion_r1613863992 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -622,14 +590,89 @@ impl Cast { self.eval_mode, fro

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-24 Thread via GitHub
kazuyukitanimura commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1613906351 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1038,6 +1038,20 @@ class CometExpressionSuite extends CometTestBase with Ad

[I] Selecting struct field within field produces unexpected results [datafusion-python]

2024-05-24 Thread via GitHub
timsaucer opened a new issue, #715: URL: https://github.com/apache/datafusion-python/issues/715 **Describe the bug** When you have a column that is a struct of struct and you attempt to index into the lowest level, if there is a null at the first level of the struct you get an unexpected
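The issue text is truncated, so this sketch only shows the null propagation one would assume is expected. When selecting `outer.inner.value`, a null at any level of the struct should produce null for that row. The `Outer` and `Inner` types are hypothetical row models, with `None` standing for SQL NULL.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Inner {
    value: Option<i64>,
}

#[derive(Debug, Clone, PartialEq)]
struct Outer {
    inner: Option<Inner>,
}

/// Select outer.inner.value for every row; a null outer struct, a null
/// inner struct, or a null leaf all yield None for that row.
fn select_nested(rows: &[Option<Outer>]) -> Vec<Option<i64>> {
    rows.iter()
        .map(|row| {
            row.as_ref()
                .and_then(|o| o.inner.as_ref())
                .and_then(|i| i.value)
        })
        .collect()
}
```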

Re: [I] [EPIC] JIT support for `DataFusion` [datafusion]

2024-05-24 Thread via GitHub
alamb commented on issue #2703: URL: https://github.com/apache/datafusion/issues/2703#issuecomment-2130243583 FWIW there is a lot more to SQL evaluation than just the expression evaluation, so that might be a reason to use DataFusion even if you had to implement your own expressions 🤔

[I] Wrong error thrown when unnesting a list of struct [datafusion]

2024-05-24 Thread via GitHub
duongcongtoai opened a new issue, #10656: URL: https://github.com/apache/datafusion/issues/10656 ### Describe the bug Given this slt ``` statement ok CREATE TABLE temp AS VALUES ([struct(1,2)]) ; query ? select unnest(column1) as struct_elem from temp;
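As a rough illustration of what `unnest` on a list-of-struct column should produce, each list element becomes its own output row with the struct left intact. `Elem` and its fields `c0`/`c1` are hypothetical stand-ins for the struct in the slt. NULL lists are left out to keep the sketch simple.

```rust
/// Hypothetical struct element with two integer fields.
#[derive(Debug, Clone, PartialEq)]
struct Elem {
    c0: i64,
    c1: i64,
}

/// Each input row holds a list of structs; unnest emits one output row
/// per element, so an empty list contributes no rows.
fn unnest(rows: &[Vec<Elem>]) -> Vec<Elem> {
    rows.iter().flat_map(|list| list.iter().cloned()).collect()
}
```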

Re: [I] Wrong error thrown when unnesting a list of struct [datafusion]

2024-05-24 Thread via GitHub
duongcongtoai commented on issue #10656: URL: https://github.com/apache/datafusion/issues/10656#issuecomment-2130247278 I'll open a PR to fix this soon.
