Re: [PR] Convert variance sample to udaf [datafusion]

2024-06-04 Thread via GitHub
jayzhan211 commented on PR #10713: URL: https://github.com/apache/datafusion/pull/10713#issuecomment-2146791760 @yyin-dev There are some errors left to fix. You can try `./dev/rust_lint.sh`, `cargo test --test sqllogictests` and `cargo test --lib --tests --bins --features avro,json` to

Re: [PR] Handle empty rows for `array_sort` [datafusion]

2024-06-04 Thread via GitHub
jayzhan211 commented on PR #10786: URL: https://github.com/apache/datafusion/pull/10786#issuecomment-2146999858 Thanks @Dandandan ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

Re: [PR] Handle empty rows for `array_sort` [datafusion]

2024-06-04 Thread via GitHub
jayzhan211 merged PR #10786: URL: https://github.com/apache/datafusion/pull/10786

Re: [I] `array_sort` fails if input batch is empty [datafusion]

2024-06-04 Thread via GitHub
jayzhan211 closed issue #10772: `array_sort` fails if input batch is empty URL: https://github.com/apache/datafusion/issues/10772

Re: [I] Error `NamedStructField should be rewritten in OperatorToFunction with subquery` if query is wrapped in view [datafusion]

2024-06-04 Thread via GitHub
jonahgao commented on issue #10764: URL: https://github.com/apache/datafusion/issues/10764#issuecomment-2147028848 It appears to be successful on the current [main](https://github.com/apache/datafusion/commit/6a0a2dce90fde73d36a5451a59381cd357fae648) branch. ```sh DataFusion CLI v

Re: [PR] Fix extract parquet statistics from LargeBinary columns [datafusion]

2024-06-04 Thread via GitHub
alamb merged PR #10775: URL: https://github.com/apache/datafusion/pull/10775

Re: [I] Extract parquet statistics from `LargeBinary` columns [datafusion]

2024-06-04 Thread via GitHub
alamb closed issue #10753: Extract parquet statistics from `LargeBinary` columns URL: https://github.com/apache/datafusion/issues/10753

Re: [PR] Fix extract parquet statistics from Decimal256 columns [datafusion]

2024-06-04 Thread via GitHub
xinlifoobar closed pull request #10777: Fix extract parquet statistics from Decimal256 columns URL: https://github.com/apache/datafusion/pull/10777

Re: [PR] Extract parquet statistics from Time32 and Time64 columns [datafusion]

2024-06-04 Thread via GitHub
alamb merged PR #10771: URL: https://github.com/apache/datafusion/pull/10771

Re: [I] Extract parquet statistics from `Time32` and `Time64` columns [datafusion]

2024-06-04 Thread via GitHub
alamb closed issue #10751: Extract parquet statistics from `Time32` and `Time64` columns URL: https://github.com/apache/datafusion/issues/10751

Re: [PR] Extract parquet statistics from Time32 and Time64 columns [datafusion]

2024-06-04 Thread via GitHub
alamb commented on PR #10771: URL: https://github.com/apache/datafusion/pull/10771#issuecomment-2147201910 Thanks again @comphead and @Lordworms

Re: [PR] chore: fix `last_value` coercion [datafusion]

2024-06-04 Thread via GitHub
alamb commented on PR #10783: URL: https://github.com/apache/datafusion/pull/10783#issuecomment-2147202669 Thanks for the review @jayzhan211

Re: [I] Regression in `first_value` and `last_value` coercion [datafusion]

2024-06-04 Thread via GitHub
alamb closed issue #10781: Regression in `first_value` and `last_value` coercion URL: https://github.com/apache/datafusion/issues/10781

Re: [PR] chore: fix `last_value` coercion [datafusion]

2024-06-04 Thread via GitHub
alamb merged PR #10783: URL: https://github.com/apache/datafusion/pull/10783

Re: [PR] Fix extract parquet statistics from Decimal256 columns [datafusion]

2024-06-04 Thread via GitHub
alamb commented on PR #10777: URL: https://github.com/apache/datafusion/pull/10777#issuecomment-2147206147 > > I am slowly merging these PRs after resolving conflicts. I hope to be done by tomorrow > > Hey @alamb, because we are not in the same time zone, I couldn't resolve such issu

Re: [I] Merge `ScalarUDFImpl`'s `invoke_no_args` and `invoke` into one method [datafusion]

2024-06-04 Thread via GitHub
alamb commented on issue #10773: URL: https://github.com/apache/datafusion/issues/10773#issuecomment-2147221161 Thank you @lewiszlw I agree that your design with an additional `InvokeInfo` is more elegant / easier to use and if we were starting today without any existing code / user

Re: [PR] chore: Switch to stable Rust [datafusion-comet]

2024-06-04 Thread via GitHub
andygrove commented on PR #505: URL: https://github.com/apache/datafusion-comet/pull/505#issuecomment-2147291617 > LGTM. Thanks for taking this over. I was about to ask chao if he is still working on the #373. > > > > It looks like the CI is still failed, you may take a look a

Re: [I] Merge `ScalarUDFImpl`'s `invoke_no_args` and `invoke` into one method [datafusion]

2024-06-04 Thread via GitHub
jayzhan211 commented on issue #10773: URL: https://github.com/apache/datafusion/issues/10773#issuecomment-2147301957 > Thank you @lewiszlw > > I agree that your design with an additional `InvokeInfo` is more elegant / easier to use and if we were starting today without any existing co

Re: [PR] Document Committer and PMC process [datafusion]

2024-06-04 Thread via GitHub
Weijun-H commented on code in PR #10778: URL: https://github.com/apache/datafusion/pull/10778#discussion_r1625867310 ## docs/source/contributor-guide/inviting.md: ## @@ -0,0 +1,427 @@ + + +# Inviting New Committers and PMC Members + +This is a cookbook of the recommended DataFus

Re: [PR] Convert variance sample to udaf [datafusion]

2024-06-04 Thread via GitHub
yyin-dev commented on PR #10713: URL: https://github.com/apache/datafusion/pull/10713#issuecomment-2147379631 > @yyin-dev There are some errors left to fix > > You can try `./dev/rust_lint.sh`, `cargo test --test sqllogictests` and `cargo test --lib --tests --bins --features avro,json`

Re: [PR] chore: Create initial release process scripts for official ASF source release [datafusion-comet]

2024-06-04 Thread via GitHub
andygrove commented on PR #429: URL: https://github.com/apache/datafusion-comet/pull/429#issuecomment-2147381977 @viirya @comphead @parthchandra @kazuyukitanimura @huaxingao @advancedxy Could I get more feedback on this PR so that we can prepare for the upcoming 0.1.0 source release, which

[PR] docs: changes in documentation [datafusion-comet]

2024-06-04 Thread via GitHub
SemyonSinchenko opened a new pull request, #512: URL: https://github.com/apache/datafusion-comet/pull/512 ## Which issue does this PR close? Closes #503 Closes #191 ## Rationale for this change 1. Provide a way to build Comet from the source on an isolated environments w

[PR] bug: adding chr null test [datafusion-comet]

2024-06-04 Thread via GitHub
vaibhawvipul opened a new pull request, #513: URL: https://github.com/apache/datafusion-comet/pull/513 ## Which issue does this PR close? Closes #479 . ## Rationale for this change ## What changes are included in this PR? ## How are these ch

Re: [I] bug: null character not permitted in chr function [datafusion-comet]

2024-06-04 Thread via GitHub
vaibhawvipul commented on issue #479: URL: https://github.com/apache/datafusion-comet/issues/479#issuecomment-2147430371 @andygrove I am not able to reproduce it, can you please help me out? I have raised a draft PR where I have added a test case that uses null characters in the chr function

[PR] minor: Refactor some unparser methods to improve readability [datafusion]

2024-06-04 Thread via GitHub
devinjdangelo opened a new pull request, #10788: URL: https://github.com/apache/datafusion/pull/10788 ## Which issue does this PR close? none ## Rationale for this change I noticed while working on #10767 some unparser methods have grown unwieldy and difficult to underst

Re: [PR] Fix extract parquet statistics from Decimal256 columns [datafusion]

2024-06-04 Thread via GitHub
alamb merged PR #10777: URL: https://github.com/apache/datafusion/pull/10777

Re: [I] Extract parquet statistics from `Decimal256` columns [datafusion]

2024-06-04 Thread via GitHub
alamb closed issue #10755: Extract parquet statistics from `Decimal256` columns URL: https://github.com/apache/datafusion/issues/10755

Re: [PR] Fix extract parquet statistics from Decimal256 columns [datafusion]

2024-06-04 Thread via GitHub
alamb commented on PR #10777: URL: https://github.com/apache/datafusion/pull/10777#issuecomment-2147514819 Thanks @Weijun-H and @xinlifoobar

Re: [PR] feat: Implement ANSI support for UnaryMinus [datafusion-comet]

2024-06-04 Thread via GitHub
planga82 commented on PR #471: URL: https://github.com/apache/datafusion-comet/pull/471#issuecomment-2147513072 Hi @vaibhawvipul , I don't know why, but it seems that a test included in this PR is failing in the main branch in Spark 4.0 tests. Any thoughts on this? Thanks in advance!

Re: [PR] Speed up arrow_statistics test [datafusion]

2024-06-04 Thread via GitHub
alamb commented on PR #10735: URL: https://github.com/apache/datafusion/pull/10735#issuecomment-2147535236 The other statistics PRs have been merged and I resolved conflicts in this branch.

Re: [PR] chore: Switch to stable Rust [datafusion-comet]

2024-06-04 Thread via GitHub
andygrove merged PR #505: URL: https://github.com/apache/datafusion-comet/pull/505

Re: [I] Switch to stable Rust [datafusion-comet]

2024-06-04 Thread via GitHub
andygrove closed issue #142: Switch to stable Rust URL: https://github.com/apache/datafusion-comet/issues/142

Re: [PR] chore: Create initial release process scripts for official ASF source release [datafusion-comet]

2024-06-04 Thread via GitHub
advancedxy commented on code in PR #429: URL: https://github.com/apache/datafusion-comet/pull/429#discussion_r1625984926 ## core/Cargo.toml: ## @@ -16,17 +16,21 @@ # under the License. [package] -name = "comet" +name = "datafusion-comet" version = "0.1.0" +homepage = "https

Re: [PR] feat: Implement ANSI support for UnaryMinus [datafusion-comet]

2024-06-04 Thread via GitHub
vaibhawvipul commented on PR #471: URL: https://github.com/apache/datafusion-comet/pull/471#issuecomment-2147590729 > Hi @vaibhawvipul , > > I don't know why, but it seems that a test included in this PR is failing in the main branch in Spark 4.0 tests. Any thoughts on this? Thanks in

Re: [I] Plan Comet 0.1.0 Release [datafusion-comet]

2024-06-04 Thread via GitHub
advancedxy commented on issue #369: URL: https://github.com/apache/datafusion-comet/issues/369#issuecomment-2147596405 > I propose that we create the 0.1.0 source release as soon as we have updated the project to use the upcoming DataFusion 39.0.0 release which should be available around Jun

[PR] feat: Add specific fuzz tests for cast and try_cast and fix NPE found during fuzz testing [datafusion-comet]

2024-06-04 Thread via GitHub
andygrove opened a new pull request, #514: URL: https://github.com/apache/datafusion-comet/pull/514 ## Which issue does this PR close? N/A ## Rationale for this change Ongoing improvements to fuzz testing. ## What changes are included in this PR?

Re: [I] bug: null character not permitted in chr function [datafusion-comet]

2024-06-04 Thread via GitHub
advancedxy commented on issue #479: URL: https://github.com/apache/datafusion-comet/issues/479#issuecomment-2147607135 I think the invalid input is 0, so datafusion refuses ``` > select chr(0); Execution error: null character not permitted. ``` To be compatible with that,
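
The behavior reported above, where `chr(0)` is rejected with "null character not permitted.", can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not DataFusion's or Comet's actual implementation: the standalone `chr` function, its `i64` argument, and the second error message are hypothetical.

```rust
/// Hypothetical stand-in for a `chr`-style scalar function.
/// Only the "null character not permitted." message comes from the thread;
/// the signature and the other error text are assumptions.
fn chr(code: i64) -> Result<String, String> {
    // Code point 0 is the NUL character, which is refused outright.
    if code == 0 {
        return Err("null character not permitted.".to_string());
    }
    // Anything outside the u32 range, or inside the surrogate range,
    // is not a valid Unicode scalar value.
    let scalar = if code > 0 && code <= i64::from(u32::MAX) {
        char::from_u32(code as u32)
    } else {
        None
    };
    scalar
        .map(|c| c.to_string())
        .ok_or_else(|| format!("{} is not a valid Unicode scalar value", code))
}
```

A caller that needs different semantics for 0 would have to special-case that input before delegating to a function like this.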

Re: [PR] docs: changes in documentation [datafusion-comet]

2024-06-04 Thread via GitHub
andygrove commented on PR #512: URL: https://github.com/apache/datafusion-comet/pull/512#issuecomment-2147609628 The CI failures with Spark 4 can be fixed by merging latest from main branch

Re: [PR] feat: Add specific fuzz tests for cast and try_cast and fix NPE found during fuzz testing [datafusion-comet]

2024-06-04 Thread via GitHub
andygrove commented on code in PR #514: URL: https://github.com/apache/datafusion-comet/pull/514#discussion_r1626071766 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2169,10 +2169,10 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde

Re: [PR] Speed up arrow_statistics test [datafusion]

2024-06-04 Thread via GitHub
comphead merged PR #10735: URL: https://github.com/apache/datafusion/pull/10735

Re: [I] bug: null character not permitted in chr function [datafusion-comet]

2024-06-04 Thread via GitHub
vaibhawvipul commented on issue #479: URL: https://github.com/apache/datafusion-comet/issues/479#issuecomment-2147623128 > I think the invalid input is 0, so datafusion refuses > > ``` > > select chr(0); > Execution error: null character not permitted. > ``` > > To be

[I] Add documentation on building with Spark 4 [datafusion-comet]

2024-06-04 Thread via GitHub
andygrove opened a new issue, #515: URL: https://github.com/apache/datafusion-comet/issues/515 ### What is the problem the feature request solves? I can run individual test suites against Spark 3.4 with the following command: ```shell ./mvnw test -DwildcardSuites="org.apache

Re: [I] Add documentation on building with Spark 4 [datafusion-comet]

2024-06-04 Thread via GitHub
andygrove commented on issue #515: URL: https://github.com/apache/datafusion-comet/issues/515#issuecomment-2147645601 Hi @kazuyukitanimura. Would you mind handling this as part of the Spark 4 work you are working on?

Re: [PR] Profile spark3.5.1 and centos7 for compatible on spark 3.5.1 and centos7 old glic 2.7 [datafusion-comet]

2024-06-04 Thread via GitHub
jalkjaer commented on code in PR #491: URL: https://github.com/apache/datafusion-comet/pull/491#discussion_r1626111870 ## build_for_centos7.sh: ## @@ -0,0 +1,5 @@ +docker build -t comet_build_env_centos7:1.0 -f core/comet_build_env_centos7.dockerfile Review Comment: ```

Re: [PR] Profile spark3.5.1 and centos7 for compatible on spark 3.5.1 and centos7 old glic 2.7 [datafusion-comet]

2024-06-04 Thread via GitHub
jalkjaer commented on code in PR #491: URL: https://github.com/apache/datafusion-comet/pull/491#discussion_r1626112822 ## core/Cross.toml: ## @@ -0,0 +1,2 @@ +[target.x86_64-unknown-linux-gnu] Review Comment: ```suggestion # Licensed to the Apache Software Foundation (ASF

Re: [PR] Profile spark3.5.1 and centos7 for compatible on spark 3.5.1 and centos7 old glic 2.7 [datafusion-comet]

2024-06-04 Thread via GitHub
jalkjaer commented on code in PR #491: URL: https://github.com/apache/datafusion-comet/pull/491#discussion_r1626113598 ## core/comet_build_env_centos7.dockerfile: ## @@ -0,0 +1,36 @@ +FROM centos:7 Review Comment: ```suggestion # Licensed to the Apache Software Foundation

[I] `Int64` as default type for `make_array` function empty or null case [datafusion]

2024-06-04 Thread via GitHub
jayzhan211 opened a new issue, #10789: URL: https://github.com/apache/datafusion/issues/10789 ### Is your feature request related to a problem or challenge? I would like to set the default type for `make_array` from `null` to `i64`, so other array functions can have `List(I64)` by defa
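
The request above, defaulting to `i64` when `make_array` has no arguments or only nulls, can be illustrated with a small sketch. The `ElemType` enum and `make_array_elem_type` function are hypothetical names introduced here for illustration; DataFusion's real type coercion works on Arrow `DataType` values and is considerably more involved.

```rust
/// Hypothetical, simplified element types (not Arrow's `DataType`).
#[derive(Debug, Clone, PartialEq)]
enum ElemType {
    Null,
    Int64,
    Utf8,
}

/// Pick an element type for a `make_array`-style call.
/// Assumption: the first non-null argument type wins; if there is none
/// (empty input or all nulls), fall back to Int64 instead of Null.
fn make_array_elem_type(arg_types: &[ElemType]) -> ElemType {
    arg_types
        .iter()
        .find(|t| **t != ElemType::Null)
        .cloned()
        .unwrap_or(ElemType::Int64)
}
```

With a fallback like this, an empty `make_array()` would produce a list of `Int64` rather than a list of `Null`, which is the default the issue asks for.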

Re: [PR] chore: Create initial release process scripts for official ASF source release [datafusion-comet]

2024-06-04 Thread via GitHub
andygrove commented on code in PR #429: URL: https://github.com/apache/datafusion-comet/pull/429#discussion_r1626125134 ## dev/release/verify-release-candidate.sh: ## @@ -0,0 +1,153 @@ +#!/bin/bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contr

Re: [PR] chore: Create initial release process scripts for official ASF source release [datafusion-comet]

2024-06-04 Thread via GitHub
andygrove commented on code in PR #429: URL: https://github.com/apache/datafusion-comet/pull/429#discussion_r1626130754 ## core/Cargo.toml: ## @@ -16,17 +16,21 @@ # under the License. [package] -name = "comet" +name = "datafusion-comet" version = "0.1.0" +homepage = "https:

Re: [I] Plan Comet 0.1.0 Release [datafusion-comet]

2024-06-04 Thread via GitHub
viirya commented on issue #369: URL: https://github.com/apache/datafusion-comet/issues/369#issuecomment-2147710351 We plan to do a binary release, although it might not be ready in time for the 0.1.0 source release. Publishing to the Maven repo needs more work. Comet involves native code,

[PR] Int64 as default type for make_array function empty or null case [datafusion]

2024-06-04 Thread via GitHub
jayzhan211 opened a new pull request, #10790: URL: https://github.com/apache/datafusion/pull/10790 ## Which issue does this PR close? Closes #10789 . ## Rationale for this change ## What changes are included in this PR? Slt: 1. Convert Li

Re: [PR] Int64 as default type for make_array function empty or null case [datafusion]

2024-06-04 Thread via GitHub
jayzhan211 commented on code in PR #10790: URL: https://github.com/apache/datafusion/pull/10790#discussion_r1626157639 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -346,8 +346,8 @@ AS VALUES (arrow_cast(make_array([[1,2]], [[3, 4]]), 'FixedSizeList(2, List(List(Int

[I] `array_slice` panicked when called with empty args [datafusion]

2024-06-04 Thread via GitHub
jonahgao opened a new issue, #10791: URL: https://github.com/apache/datafusion/issues/10791 ### Describe the bug thread 'main' panicked at datafusion/functions-array/src/extract.rs:255:39: index out of bounds: the len is 0 but the index is 0 ### To Reproduce Run the qu
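
The panic message above ("the len is 0 but the index is 0") is what Rust produces when an empty slice is indexed with `args[0]`. A common way to avoid that class of failure is to use a checked accessor and return an error instead. The sketch below shows the general pattern only; the helper name `first_arg` and its error text are hypothetical, and the actual fix applied in DataFusion is not shown in this thread.

```rust
/// Hypothetical helper: fetch the first argument without panicking.
/// `args[0]` would panic on an empty slice; `first()` returns `None` instead,
/// which is converted into an error the caller can report.
fn first_arg<T>(args: &[T]) -> Result<&T, String> {
    args.first()
        .ok_or_else(|| "array_slice needs at least one argument".to_string())
}
```

Returning `Result` here lets a call with empty arguments surface as a regular query error rather than aborting the thread.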

Re: [PR] Int64 as default type for make_array function empty or null case [datafusion]

2024-06-04 Thread via GitHub
jayzhan211 commented on code in PR #10790: URL: https://github.com/apache/datafusion/pull/10790#discussion_r1626159757 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -2666,19 +2679,19 @@ select array_concat(make_array(), make_array(2, 3)); query ? select array_concat(m

Re: [PR] Int64 as default type for make_array function empty or null case [datafusion]

2024-06-04 Thread via GitHub
jayzhan211 commented on code in PR #10790: URL: https://github.com/apache/datafusion/pull/10790#discussion_r1626166147 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -2030,6 +2030,13 @@ NULL [, 51, 52, 54, 55, 56, 57, 58, 59, 60] [61, 62, 63, 64, 65, 66, 67, 68, 69, 70

Re: [PR] docs: changes in documentation [datafusion-comet]

2024-06-04 Thread via GitHub
andygrove commented on PR #512: URL: https://github.com/apache/datafusion-comet/pull/512#issuecomment-2147764587 I'm testing out the Makefile change as part of another PR where I ran into the same issue, but am getting ``` make: Nothing to be done for `release-nogit'. ```

Re: [PR] chore: Create initial release process scripts for official ASF source release [datafusion-comet]

2024-06-04 Thread via GitHub
comphead commented on code in PR #429: URL: https://github.com/apache/datafusion-comet/pull/429#discussion_r1626197896 ## dev/release/check-rat-report.py: ## @@ -0,0 +1,59 @@ +#!/usr/bin/python +## +# Li

Re: [I] Error `NamedStructField should be rewritten in OperatorToFunction with subquery` if query is wrapped in view [datafusion]

2024-06-04 Thread via GitHub
sergiimk commented on issue #10764: URL: https://github.com/apache/datafusion/issues/10764#issuecomment-2147809643 Also hit a similar error in a different scenario and can confirm that `master` version fixed it for me.

Re: [I] Support `select .. from 'data.parquet'` files in SQL from any `SessionContext` (optionally) [datafusion]

2024-06-04 Thread via GitHub
alamb commented on issue #4850: URL: https://github.com/apache/datafusion/issues/4850#issuecomment-2147822689 @edmondop I think this would be great to add as an example / thing to implement as an extension.

Re: [PR] Move `Count` to `functions-aggregate` [datafusion]

2024-06-04 Thread via GitHub
alamb commented on code in PR #10484: URL: https://github.com/apache/datafusion/pull/10484#discussion_r1626224232 ## datafusion-cli/Cargo.toml: ## @@ -26,7 +26,7 @@ license = "Apache-2.0" homepage = "https://datafusion.apache.org" repository = "https://github.com/apache/dataf

Re: [PR] Move `Count` to `functions-aggregate` [datafusion]

2024-06-04 Thread via GitHub
alamb commented on code in PR #10484: URL: https://github.com/apache/datafusion/pull/10484#discussion_r1626228488 ## datafusion/physical-expr/src/aggregate/build_in.rs: ## @@ -61,14 +61,9 @@ pub fn create_aggregate_expr( .collect::<Result<Vec<_>>>()?; let input_phy_exprs = input

Re: [I] Extract parquet statistics from `Duration` columns [datafusion]

2024-06-04 Thread via GitHub
alamb commented on issue #10754: URL: https://github.com/apache/datafusion/issues/10754#issuecomment-2147853236 Thanks @marvinlanhenke -- I agree that since Duration can't be written to parquet we won't be able to extract statistics. Thank you for double checking

Re: [I] Extract parquet statistics from `Duration` columns [datafusion]

2024-06-04 Thread via GitHub
alamb closed issue #10754: Extract parquet statistics from `Duration` columns URL: https://github.com/apache/datafusion/issues/10754

Re: [I] Extract parquet statistics from `Interval` columns [datafusion]

2024-06-04 Thread via GitHub
alamb commented on issue #10752: URL: https://github.com/apache/datafusion/issues/10752#issuecomment-2147857707 > @alamb I guess we can't extract any statistics here? And writing any tests that check we have no statistics written, does not seem to be very helpful? I actually t

[PR] [Draft][DONNOT MERGE] Impl for HF_Store [datafusion]

2024-06-04 Thread via GitHub
xinlifoobar opened a new pull request, #10792: URL: https://github.com/apache/datafusion/pull/10792 ## Which issue does this PR close? Closes #10720 ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [I] Implement `hf://` / "hugging face" integration in datafusion-cli [datafusion]

2024-06-04 Thread via GitHub
xinlifoobar commented on issue #10720: URL: https://github.com/apache/datafusion/issues/10720#issuecomment-2147880819 PR to reference: https://github.com/duckdb/duckdb/pull/11831/files#diff-e0d4fb8749355dd063169c27338bd119b7814546a06720ee1cd18caf83ad5106

Re: [PR] minor: Refactor some unparser methods to improve readability [datafusion]

2024-06-04 Thread via GitHub
alamb merged PR #10788: URL: https://github.com/apache/datafusion/pull/10788

Re: [PR] Introduce expr builder for aggregate function [datafusion]

2024-06-04 Thread via GitHub
alamb commented on code in PR #10560: URL: https://github.com/apache/datafusion/pull/10560#discussion_r1626296887 ## datafusion-examples/examples/advanced_udaf.rs: ## @@ -412,7 +412,7 @@ async fn main() -> Result<()> { let df = ctx.table("t").await?; // perform the a

Re: [PR] Convert variance sample to udaf [datafusion]

2024-06-04 Thread via GitHub
alamb commented on code in PR #10713: URL: https://github.com/apache/datafusion/pull/10713#discussion_r1626301925 ## datafusion/proto/tests/cases/roundtrip_logical_plan.rs: ## @@ -31,8 +31,10 @@ use datafusion::datasource::TableProvider; use datafusion::execution::context::Sess

Re: [PR] chore: Create initial release process scripts for official ASF source release [datafusion-comet]

2024-06-04 Thread via GitHub
andygrove commented on code in PR #429: URL: https://github.com/apache/datafusion-comet/pull/429#discussion_r1626307313 ## dev/release/check-rat-report.py: ## @@ -0,0 +1,59 @@ +#!/usr/bin/python +## +# L

Re: [PR] feat: Add specific fuzz tests for cast and try_cast and fix NPE found during fuzz testing [datafusion-comet]

2024-06-04 Thread via GitHub
parthchandra commented on code in PR #514: URL: https://github.com/apache/datafusion-comet/pull/514#discussion_r1626312200 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2169,10 +2169,10 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSer

Re: [I] Row groups are read out of order or with completely different values [datafusion]

2024-06-04 Thread via GitHub
twitu commented on issue #10572: URL: https://github.com/apache/datafusion/issues/10572#issuecomment-2147977834 So I tried out some experiments with different combinations of using `ORDER BY` and setting the `repartition_file_scans` configuration value. And the results are a bit un-intuitiv

Re: [PR] feat: Add specific fuzz tests for cast and try_cast and fix NPE found during fuzz testing [datafusion-comet]

2024-06-04 Thread via GitHub
andygrove commented on PR #514: URL: https://github.com/apache/datafusion-comet/pull/514#issuecomment-2147984447 @kazuyukitanimura @huaxingao Could I get a committer review please

Re: [I] Add documentation on building with Spark 4 [datafusion-comet]

2024-06-04 Thread via GitHub
kazuyukitanimura commented on issue #515: URL: https://github.com/apache/datafusion-comet/issues/515#issuecomment-2148001206 Yes, I am on it. Thank you @andygrove

Re: [PR] feat: Add specific fuzz tests for cast and try_cast and fix NPE found during fuzz testing [datafusion-comet]

2024-06-04 Thread via GitHub
andygrove merged PR #514: URL: https://github.com/apache/datafusion-comet/pull/514

Re: [I] Row groups are read out of order or with completely different values [datafusion]

2024-06-04 Thread via GitHub
twitu commented on issue #10572: URL: https://github.com/apache/datafusion/issues/10572#issuecomment-2148021428 Here's the result for the script that reads the whole file and counts the total number of rows. | order | repartition | wall time (s) | memory (mb) | read sorted order |

Re: [PR] feat: Implement ANSI support for UnaryMinus [datafusion-comet]

2024-06-04 Thread via GitHub
kazuyukitanimura commented on PR #471: URL: https://github.com/apache/datafusion-comet/pull/471#issuecomment-2148054852 Oops, looks like this is a merge conflict, I will fix it. Sorry for the inconvenience.

Re: [PR] feat: Implement ANSI support for UnaryMinus [datafusion-comet]

2024-06-04 Thread via GitHub
andygrove commented on PR #471: URL: https://github.com/apache/datafusion-comet/pull/471#issuecomment-2148086351 > Oops, looks like this is a merge conflict, I will fix it. Sorry for the inconvenience. @kazuyukitanimura @vaibhawvipul This is already fixed in main. I fixed it as part o

Re: [I] Feedback request for providing configurable UDF functions [datafusion]

2024-06-04 Thread via GitHub
andygrove commented on issue #10744: URL: https://github.com/apache/datafusion/issues/10744#issuecomment-2148095866 @Omega359 so far we have been implementing custom `PhysicalExpr` directly in the `datafusion-comet` project as needed for Spark-specific behavior, with support for Spark's dif

Re: [I] Circular relationship when determining state fields for AggregateUDF [datafusion]

2024-06-04 Thread via GitHub
viirya commented on issue #10785: URL: https://github.com/apache/datafusion/issues/10785#issuecomment-2148099529 I'm trying to fix this in Scala code. If it works, we don't need to change DataFusion.

Re: [I] Cast string to timestamp remaining work [datafusion-comet]

2024-06-04 Thread via GitHub
andygrove commented on issue #376: URL: https://github.com/apache/datafusion-comet/issues/376#issuecomment-2148101533 I see that multiple people are interested in working on this. I didn't see this until I was mentioned. @sinhs we can't assign issues to people unless they are committ

Re: [PR] feat: Implement ANSI support for UnaryMinus [datafusion-comet]

2024-06-04 Thread via GitHub
kazuyukitanimura commented on PR #471: URL: https://github.com/apache/datafusion-comet/pull/471#issuecomment-2148105329 Ah thank you @andygrove cc @planga82

[I] Support OneRowRelation to Support Scalar Functions? [datafusion-comet]

2024-06-04 Thread via GitHub
tshauck opened a new issue, #516: URL: https://github.com/apache/datafusion-comet/issues/516 ### What is the problem the feature request solves? It's come up a couple of times recently that expressions need scalar tests, but it's not clear it's possible to have Comet execute something

[I] `cli_quick_test` failing on windows (stack overflow) after sqlparser `0.47.0` upgrade [datafusion]

2024-06-04 Thread via GitHub
alamb opened a new issue, #10793: URL: https://github.com/apache/datafusion/issues/10793 ### Describe the bug After https://github.com/apache/datafusion/pull/10392 the windows CI check fails like this: https://github.com/apache/datafusion/actions/runs/9332719422/job/25688898390?pr=10

Re: [PR] build(deps): upgrade sqlparser to 0.47.0 [datafusion]

2024-06-04 Thread via GitHub
alamb commented on code in PR #10392: URL: https://github.com/apache/datafusion/pull/10392#discussion_r1626434201 ## .github/workflows/rust.yml: ## @@ -334,6 +334,8 @@ jobs: run: | export PATH=$PATH:$HOME/d/protoc/bin cargo test --lib --tests --bin

Re: [PR] build(deps): upgrade sqlparser to 0.47.0 [datafusion]

2024-06-04 Thread via GitHub
alamb commented on code in PR #10392: URL: https://github.com/apache/datafusion/pull/10392#discussion_r1626434949 ## datafusion/expr/src/logical_plan/ddl.rs: ## @@ -341,29 +341,8 @@ pub struct CreateFunctionBody { pub language: Option, /// IMMUTABLE | STABLE | VOLATILE

Re: [PR] chore: Create initial release process scripts for official ASF source release [datafusion-comet]

2024-06-04 Thread via GitHub
andygrove commented on code in PR #429: URL: https://github.com/apache/datafusion-comet/pull/429#discussion_r1626436179 ## dev/release/check-rat-report.py: ## @@ -0,0 +1,59 @@ +#!/usr/bin/python +## +# L

Re: [PR] Experiment: Coalesce batches after scan [datafusion-comet]

2024-06-04 Thread via GitHub
viirya commented on PR #496: URL: https://github.com/apache/datafusion-comet/pull/496#issuecomment-2148138531 I've checked the implementation of `CoalesceBatchesExec`. It actually buffers produced batches from its upstream. Comet scan reuses vectors when producing batches, so when you poll

Re: [PR] chore: Create initial release process scripts for official ASF source release [datafusion-comet]

2024-06-04 Thread via GitHub
andygrove commented on code in PR #429: URL: https://github.com/apache/datafusion-comet/pull/429#discussion_r1626439747 ## dev/release/check-rat-report.py: ## @@ -0,0 +1,59 @@ +#!/usr/bin/python +## +# L

Re: [I] Feedback request for providing configurable UDF functions [datafusion]

2024-06-04 Thread via GitHub
Omega359 commented on issue #10744: URL: https://github.com/apache/datafusion/issues/10744#issuecomment-2148144619 > I think we need to have the discussion of whether it makes sense to upstream these into the core datafusion project or not, or whether we publish a `datafusion-spark-compat`

Re: [PR] chore: Create initial release process scripts for official ASF source release [datafusion-comet]

2024-06-04 Thread via GitHub
comphead commented on code in PR #429: URL: https://github.com/apache/datafusion-comet/pull/429#discussion_r1626443389 ## dev/release/check-rat-report.py: ## @@ -0,0 +1,59 @@ +#!/usr/bin/python +## +# Li

Re: [I] Circular relationship when determining state fields for AggregateUDF [datafusion]

2024-06-04 Thread via GitHub
alamb commented on issue #10785: URL: https://github.com/apache/datafusion/issues/10785#issuecomment-2148200482 FYI @jayzhan211 who I think was working on similar issues / untangling

Re: [I] Extract parquet statistics from `Interval` columns [datafusion]

2024-06-04 Thread via GitHub
marvinlanhenke commented on issue #10752: URL: https://github.com/apache/datafusion/issues/10752#issuecomment-2148202795 sure I can do that; from the top of my mind - the `fn run()` from the `struct Test` panics if we can't extract any statistics, which is the case here. So I'd prepare as m

[PR] Split `SessionState` into its own module [datafusion]

2024-06-04 Thread via GitHub
alamb opened a new pull request, #10794: URL: https://github.com/apache/datafusion/pull/10794 ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/10782 ## Rationale for this change the `datafusion/core/src/execution/context/mod.rs` file

Re: [I] Add documentation on building with Spark 4 [datafusion-comet]

2024-06-04 Thread via GitHub
kazuyukitanimura commented on issue #515: URL: https://github.com/apache/datafusion-comet/issues/515#issuecomment-2148303823 JDK 17 requirement of Spark could be one of the reasons

Re: [PR] chore: Create initial release process scripts for official ASF source release [datafusion-comet]

2024-06-04 Thread via GitHub
kazuyukitanimura commented on code in PR #429: URL: https://github.com/apache/datafusion-comet/pull/429#discussion_r1626553709 ## dev/release/verify-release-candidate.sh: ## @@ -0,0 +1,132 @@ +#!/bin/bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or mor

Re: [I] Add documentation on building with Spark 4 [datafusion-comet]

2024-06-04 Thread via GitHub
andygrove commented on issue #515: URL: https://github.com/apache/datafusion-comet/issues/515#issuecomment-2148345201 Switching to JDK resolved my issue. I will leave this issue open until we have documentation for building with Spark 4
