Re: [PR] fix: preserve qualifiers when rewriting expressions [datafusion]

2024-09-06 Thread via GitHub
Dandandan merged PR #12341: URL: https://github.com/apache/datafusion/pull/12341 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

Re: [PR] Support Utf8View and BinaryView in substrait serialization. [datafusion]

2024-09-06 Thread via GitHub
alamb merged PR #12199: URL: https://github.com/apache/datafusion/pull/12199

Re: [I] Support substrait serialization for `ScalarValue::Utf8View` and `ScalarValue::BinaryView` [datafusion]

2024-09-06 Thread via GitHub
alamb closed issue #12118: Support substrait serialization for `ScalarValue::Utf8View` and `ScalarValue::BinaryView` URL: https://github.com/apache/datafusion/issues/12118

Re: [PR] Support Utf8View and BinaryView in substrait serialization. [datafusion]

2024-09-06 Thread via GitHub
alamb commented on PR #12199: URL: https://github.com/apache/datafusion/pull/12199#issuecomment-2333824256 Thanks again @wiedld and @Blizzara

Re: [PR] Support Utf8View and BinaryView in substrait serialization. [datafusion]

2024-09-06 Thread via GitHub
alamb commented on code in PR #12199: URL: https://github.com/apache/datafusion/pull/12199#discussion_r1746945928 ## datafusion/substrait/src/variation_const.rs: ## @@ -52,6 +52,7 @@ pub const DATE_32_TYPE_VARIATION_REF: u32 = 0; pub const DATE_64_TYPE_VARIATION_REF: u32 = 1;

Re: [PR] Prototype implementing DataFusion functions / operators using arrow-udf liibrary [datafusion]

2024-09-06 Thread via GitHub
Xuanwo commented on PR #11488: URL: https://github.com/apache/datafusion/pull/11488#issuecomment-2333861235 Hello, everyone. Exciting news: arrow-udf now belongs to a new organization called [`arrow-udf`](https://github.com/arrow-udf). Please let me know if there is anything preventing us f

[PR] Update Aggregate functions to take builder parameters [datafusion-python]

2024-09-06 Thread via GitHub
timsaucer opened a new pull request, #859: URL: https://github.com/apache/datafusion-python/pull/859 # Which issue does this PR close? Closes #780 # Rationale for this change This PR follows the same pattern as the recently closed #808 but does the same for aggregate f

Re: [PR] feat: Added DataFrameWriteOptions option when writing as csv, json, p… [datafusion-python]

2024-09-06 Thread via GitHub
timsaucer commented on code in PR #857: URL: https://github.com/apache/datafusion-python/pull/857#discussion_r1747044279 ## python/datafusion/dataframe.py: ## @@ -409,37 +409,62 @@ def except_all(self, other: DataFrame) -> DataFrame: """ return DataFrame(self.d

Re: [PR] feat: Add projection to FilterExec [datafusion]

2024-09-06 Thread via GitHub
eejbyfeldt commented on PR #12281: URL: https://github.com/apache/datafusion/pull/12281#issuecomment-2334003208 > It seems like the `with_projection()` API could end up being a method of the `ExecutionPlan` in the long run, wdyt? (It would improve the performance of all operators which inte

Re: [PR] feat: Add projection to FilterExec [datafusion]

2024-09-06 Thread via GitHub
eejbyfeldt commented on code in PR #12281: URL: https://github.com/apache/datafusion/pull/12281#discussion_r1747092400 ## datafusion/physical-expr/src/equivalence/projection.rs: ## @@ -82,6 +82,11 @@ impl ProjectionMapping { .map(|map| Self { map }) } +pu

[I] [DISCUSS] Document criteria for adding new features / what belongs in core DataFusion (e.g. sql syntax, functions, etc) [datafusion]

2024-09-06 Thread via GitHub
alamb opened a new issue, #12357: URL: https://github.com/apache/datafusion/issues/12357 ### Is your feature request related to a problem or challenge? DataFusion is growing by almost all measures: community, features, and codebase size which is good. However, this growth is ca

Re: [PR] fix: preserve qualifiers when rewriting expressions [datafusion]

2024-09-06 Thread via GitHub
jonahgao commented on PR #12341: URL: https://github.com/apache/datafusion/pull/12341#issuecomment-2334131028 Thanks @JasonLi-cn @alamb @Dandandan

Re: [I] Bug triggered by special aliases in nested queries [datafusion]

2024-09-06 Thread via GitHub
jonahgao closed issue #12183: Bug triggered by special aliases in nested queries URL: https://github.com/apache/datafusion/issues/12183

Re: [I] List available functions (`SHOW FUNCTIONS`) [datafusion]

2024-09-06 Thread via GitHub
alamb commented on issue #12144: URL: https://github.com/apache/datafusion/issues/12144#issuecomment-2334139425 We could add a `datafusion_functions` table function in datafusion-cli or other downstream implementation perhaps

Re: [PR] feat: date_add function [datafusion-comet]

2024-09-06 Thread via GitHub
andygrove commented on code in PR #910: URL: https://github.com/apache/datafusion-comet/pull/910#discussion_r1747185395 ## native/spark-expr/src/scalar_funcs.rs: ## @@ -547,3 +551,32 @@ pub fn spark_isnan(args: &[ColumnarValue]) -> Result Result { +let start = &args[0]; +

Re: [I] Congestion Scenario in SPM [datafusion]

2024-09-06 Thread via GitHub
alamb closed issue #12300: Congestion Scenario in SPM URL: https://github.com/apache/datafusion/issues/12300

Re: [PR] Fix Possible Congestion Scenario in `SortPreservingMergeExec` [datafusion]

2024-09-06 Thread via GitHub
alamb merged PR #12302: URL: https://github.com/apache/datafusion/pull/12302

Re: [PR] Support protobuf encoding and decoding of `UnnestExec` [datafusion]

2024-09-06 Thread via GitHub
alamb merged PR #12344: URL: https://github.com/apache/datafusion/pull/12344

Re: [I] Missing serde for UnnestExec physical plan node [datafusion]

2024-09-06 Thread via GitHub
alamb closed issue #12343: Missing serde for UnnestExec physical plan node URL: https://github.com/apache/datafusion/issues/12343

Re: [PR] feat: date_add function [datafusion-comet]

2024-09-06 Thread via GitHub
andygrove commented on code in PR #910: URL: https://github.com/apache/datafusion-comet/pull/910#discussion_r1747208959 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -146,6 +146,32 @@ class CometExpressionSuite extends CometTestBase with AdaptiveSpa

Re: [PR] feat: Add projection to FilterExec [datafusion]

2024-09-06 Thread via GitHub
alamb commented on PR #12281: URL: https://github.com/apache/datafusion/pull/12281#issuecomment-2334184948 🚀

Re: [I] Add projection to `FilterExec` to avoid unecessary output creation [datafusion]

2024-09-06 Thread via GitHub
alamb closed issue #5436: Add projection to `FilterExec` to avoid unecessary output creation URL: https://github.com/apache/datafusion/issues/5436

Re: [I] [Epic] A collection of issues for extending the Aggregation function [datafusion]

2024-09-06 Thread via GitHub
Weijun-H commented on issue #12254: URL: https://github.com/apache/datafusion/issues/12254#issuecomment-2334206044 > I wonder if we should consider where to draw the line on what aggregate functions to include in the core (i.e. should we include all these new functions?) > > Now that

Re: [PR] Added array_any_value function [datafusion]

2024-09-06 Thread via GitHub
Weijun-H commented on PR #12329: URL: https://github.com/apache/datafusion/pull/12329#issuecomment-2334214291 @jayzhan211 could you have time to review this pr?

Re: [I] Remove special casting of `Min` / `Max` built in `AggregateFunctions` [datafusion]

2024-09-06 Thread via GitHub
alamb commented on issue #11151: URL: https://github.com/apache/datafusion/issues/11151#issuecomment-2334240589 Here is one specific suggestion: https://github.com/apache/datafusion/pull/12296#discussion_r1747254563

Re: [PR] Support for SIMILAR TO for physical plan [datafusion]

2024-09-06 Thread via GitHub
alamb commented on PR #12350: URL: https://github.com/apache/datafusion/pull/12350#issuecomment-2334244259 > There is an existing like module for the LIKE operator. Probably, this functionality can be extracted to a similar similar_to module. Perhaps you could do this as a fol

Re: [I] Enhance Expr.cast() to accept python types [datafusion-python]

2024-09-06 Thread via GitHub
andygrove closed issue #753: Enhance Expr.cast() to accept python types URL: https://github.com/apache/datafusion-python/issues/753

Re: [PR] feat: make cast accept built-in Python types [datafusion-python]

2024-09-06 Thread via GitHub
andygrove merged PR #858: URL: https://github.com/apache/datafusion-python/pull/858

Re: [PR] chore: fix docstrings, typos [datafusion-python]

2024-09-06 Thread via GitHub
andygrove merged PR #852: URL: https://github.com/apache/datafusion-python/pull/852

Re: [PR] Faster `character_length()` string function for ASCII-only case [datafusion]

2024-09-06 Thread via GitHub
comphead commented on code in PR #12356: URL: https://github.com/apache/datafusion/pull/12356#discussion_r1747371020 ## datafusion/functions/src/unicode/character_length.rs: ## @@ -99,18 +99,30 @@ fn character_length(args: &[ArrayRef]) -> Result { } } -fn character_lengt

Re: [PR] Faster `character_length()` string function for ASCII-only case [datafusion]

2024-09-06 Thread via GitHub
comphead commented on code in PR #12356: URL: https://github.com/apache/datafusion/pull/12356#discussion_r1747380475 ## datafusion/functions/src/unicode/character_length.rs: ## @@ -99,18 +99,30 @@ fn character_length(args: &[ArrayRef]) -> Result { } } -fn character_lengt

Re: [I] [DISCUSS] Document criteria for adding new features / what belongs in core DataFusion (e.g. sql syntax, functions, etc) [datafusion]

2024-09-06 Thread via GitHub
cisaacson commented on issue #12357: URL: https://github.com/apache/datafusion/issues/12357#issuecomment-2334395011 @alamb I fully agree with your recommendation. It maintains the power of DataFusion while avoiding too much complexity. In my mind (and I think the project), DataFusion is fir

Re: [PR] validate and adjust Substrait NamedTable schemas (#12223) [datafusion]

2024-09-06 Thread via GitHub
Blizzara commented on code in PR #12245: URL: https://github.com/apache/datafusion/pull/12245#discussion_r1747397923 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -657,7 +662,13 @@ pub async fn from_substrait_rel( table: nt.names[2].clone()

Re: [PR] feat: date_add function [datafusion-comet]

2024-09-06 Thread via GitHub
mbutrovich commented on PR #910: URL: https://github.com/apache/datafusion-comet/pull/910#issuecomment-2334414715 q72 comparison (only SF 1 locally), last two rows of each (using Comet Exec) are relevant. main branch: ``` TPCDS Snappy: Best Time(ms) Avg Ti

Re: [PR] Feat: Implement hf:// / "hugging face" integration in datafusion-cli [datafusion]

2024-09-06 Thread via GitHub
Xuanwo commented on PR #10792: URL: https://github.com/apache/datafusion/pull/10792#issuecomment-2334458674 Apologies for missing this PR. I wanted to share that [OpenDAL](https://github.com/apache/opendal) has native support for [Huggingface](https://docs.rs/opendal/latest/opendal/services

[PR] perf: Add native metric for time spent casting in native scan [datafusion-comet]

2024-09-06 Thread via GitHub
andygrove opened a new pull request, #919: URL: https://github.com/apache/datafusion-comet/pull/919 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes t

[I] Flaky fuzz tests for filtered outer SortMergeJoin [datafusion]

2024-09-06 Thread via GitHub
korowa opened a new issue, #12359: URL: https://github.com/apache/datafusion/issues/12359 ### Describe the bug After replacing the filter for join fuzz tests with the selective one (that doesn't return 100% of input rows), it turned out that following tests may periodically fail for

Re: [I] Incorrect behavior of arithmetic operations between time values [datafusion]

2024-09-06 Thread via GitHub
Abdullahsab3 commented on issue #12190: URL: https://github.com/apache/datafusion/issues/12190#issuecomment-2334563835 I think https://github.com/sqlparser-rs/sqlparser-rs/pull/1398 fixed this issue.

[PR] tests: enable fuzz for filtered anti-semi NLJoin [datafusion]

2024-09-06 Thread via GitHub
korowa opened a new pull request, #12360: URL: https://github.com/apache/datafusion/pull/12360 ## Which issue does this PR close? Closes #11537. ## Rationale for this change It turned out to be not a NLJoin issue, but fuzz tests -- during NLjoin construct

Re: [I] Casting existing timestamp to timestamp again strips timezone information [datafusion]

2024-09-06 Thread via GitHub
findepi commented on issue #12218: URL: https://github.com/apache/datafusion/issues/12218#issuecomment-2334610904 @devanbenz i think so, yes

Re: [PR] Fix issue with "to_date" failing to process dates later than year 2262 [datafusion]

2024-09-06 Thread via GitHub
alamb commented on code in PR #12227: URL: https://github.com/apache/datafusion/pull/12227#discussion_r1747557215 ## datafusion/functions/src/datetime/to_date.rs: ## @@ -118,3 +118,212 @@ impl ScalarUDFImpl for ToDateFunc { } } } + +#[cfg(test)] +mod tests { +

Re: [PR] Fix issue with "to_date" failing to process dates later than year 2262 [datafusion]

2024-09-06 Thread via GitHub
alamb commented on code in PR #12227: URL: https://github.com/apache/datafusion/pull/12227#discussion_r1747559413 ## datafusion/functions/src/datetime/to_date.rs: ## @@ -118,3 +118,212 @@ impl ScalarUDFImpl for ToDateFunc { } } } + +#[cfg(test)] +mod tests { +

Re: [PR] feat: Add projection to FilterExec [datafusion]

2024-09-06 Thread via GitHub
alamb commented on PR #12281: URL: https://github.com/apache/datafusion/pull/12281#issuecomment-2334643904 FWIW I ran clickbench and also saw improvement on Q30 and Q31 ``` Benchmark clickbench_1.json

Re: [PR] Fix issue with "to_date" failing to process dates later than year 2262 [datafusion]

2024-09-06 Thread via GitHub
alamb commented on PR #12227: URL: https://github.com/apache/datafusion/pull/12227#issuecomment-2334662890 I believe the CI failure https://github.com/apache/datafusion/actions/runs/10725200169/job/29798407173?pr=12227 was fixed on main by @mbrobbel in https://github.com/apache/datafusio

Re: [I] Following the memory management semantics stated in the Arrow C Data Interface Specification [datafusion-comet]

2024-09-06 Thread via GitHub
viirya closed issue #885: Following the memory management semantics stated in the Arrow C Data Interface Specification URL: https://github.com/apache/datafusion-comet/issues/885

Re: [PR] chore: Revise array import to more follow C Data Interface semantics [datafusion-comet]

2024-09-06 Thread via GitHub
viirya commented on PR #905: URL: https://github.com/apache/datafusion-comet/pull/905#issuecomment-2334667342 Merged. Thanks @andygrove @Kontinuation @kazuyukitanimura

[PR] Minor: Add tests for using FilterExec when parquet was pushed down [datafusion]

2024-09-06 Thread via GitHub
alamb opened a new pull request, #12362: URL: https://github.com/apache/datafusion/pull/12362 ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/4028 ## Rationale for this change While reviewing https://github.com/apache/datafusion/pull

Re: [I] Use `StringViewArray` as output of `substr` when input was `StringArray` [datafusion]

2024-09-06 Thread via GitHub
Omega359 commented on issue #12338: URL: https://github.com/apache/datafusion/issues/12338#issuecomment-2334694805 I would propose that this change when made happens only after https://github.com/apache/datafusion/issues/12119 lands.

Re: [PR] Return TableProviderFilterPushDown::Exact when Parquet Pushdown Enabled [datafusion]

2024-09-06 Thread via GitHub
alamb commented on code in PR #12135: URL: https://github.com/apache/datafusion/pull/12135#discussion_r1747606086 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -826,27 +829,37 @@ impl TableProvider for ListingTable { &self, filters: &[&Expr], )

Re: [PR] Return TableProviderFilterPushDown::Exact when Parquet Pushdown Enabled [datafusion]

2024-09-06 Thread via GitHub
alamb commented on code in PR #12135: URL: https://github.com/apache/datafusion/pull/12135#discussion_r1747635751 ## datafusion/core/src/datasource/file_format/mod.rs: ## @@ -138,6 +139,33 @@ pub trait FileFormat: Send + Sync + fmt::Debug { ) -> Result> { not_impl_

Re: [PR] validate and adjust Substrait NamedTable schemas (#12223) [datafusion]

2024-09-06 Thread via GitHub
Blizzara commented on PR #12245: URL: https://github.com/apache/datafusion/pull/12245#issuecomment-2334710029 I think this is good by me - @alamb would you (or someone else) be able to do the official review, please? :) Only note I have is that I think this change makes Substrait cons

Re: [PR] include input fields as output for Substrait consumer [datafusion]

2024-09-06 Thread via GitHub
Blizzara commented on code in PR #12225: URL: https://github.com/apache/datafusion/pull/12225#discussion_r1747658283 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -412,6 +428,10 @@ pub async fn from_substrait_rel( ); let mut names:

Re: [PR] chore: Revise array import to more follow C Data Interface semantics [datafusion-comet]

2024-09-06 Thread via GitHub
viirya commented on PR #905: URL: https://github.com/apache/datafusion-comet/pull/905#issuecomment-2334740509 Ah, I found I forgot to commit the patch addressing latest reviews https://github.com/apache/datafusion-comet/pull/905#discussion_r1746350944 and https://github.com/apache/datafusio

Re: [PR] Return TableProviderFilterPushDown::Exact when Parquet Pushdown Enabled [datafusion]

2024-09-06 Thread via GitHub
alamb commented on PR #12135: URL: https://github.com/apache/datafusion/pull/12135#issuecomment-2334746185 🤔 sorry to flip flop on this, but I wrote some more tests in https://github.com/apache/datafusion/pull/12362 and when I run them on this branch I see an internal error To repr

Re: [PR] feat: date_add and date_sub functions [datafusion-comet]

2024-09-06 Thread via GitHub
viirya commented on code in PR #910: URL: https://github.com/apache/datafusion-comet/pull/910#discussion_r1747670792 ## native/spark-expr/src/scalar_funcs.rs: ## @@ -547,3 +551,40 @@ pub fn spark_isnan(args: &[ColumnarValue]) -> Result Result, +) -> Result { +let start = &a

Re: [PR] chore: Address reviews [datafusion-comet]

2024-09-06 Thread via GitHub
viirya commented on PR #920: URL: https://github.com/apache/datafusion-comet/pull/920#issuecomment-2334765039 cc @Kontinuation

Re: [PR] Remove deprecated ScalarValue::get_datatype [datafusion]

2024-09-06 Thread via GitHub
alamb merged PR #12361: URL: https://github.com/apache/datafusion/pull/12361

Re: [PR] Improve StringView support for SUBSTR [datafusion]

2024-09-06 Thread via GitHub
alamb merged PR #12044: URL: https://github.com/apache/datafusion/pull/12044

Re: [I] Improve performance of SUBSTR for StringViewArray [datafusion]

2024-09-06 Thread via GitHub
alamb closed issue #12031: Improve performance of SUBSTR for StringViewArray URL: https://github.com/apache/datafusion/issues/12031

Re: [PR] Fix issue with "to_date" failing to process dates later than year 2262 [datafusion]

2024-09-06 Thread via GitHub
alamb commented on PR #12227: URL: https://github.com/apache/datafusion/pull/12227#issuecomment-2334778822 We can consider potential changes to make the test more "readable" as part of a follow on PR. Thanks again @MartinKolbAtWork and @findepi

Re: [I] "to_date" fails to process dates later than year 2262 [datafusion]

2024-09-06 Thread via GitHub
alamb closed issue #12226: "to_date" fails to process dates later than year 2262 URL: https://github.com/apache/datafusion/issues/12226

Re: [PR] Access a `Map` with Primitive type keys [datafusion]

2024-09-06 Thread via GitHub
alamb commented on PR #12259: URL: https://github.com/apache/datafusion/pull/12259#issuecomment-2334783943 I am trying to clean up the review queue, so marking this PR as draft as the CI tests are failing. Let me know if that wasn't right

Re: [PR] include input fields as output for Substrait consumer [datafusion]

2024-09-06 Thread via GitHub
alamb commented on PR #12225: URL: https://github.com/apache/datafusion/pull/12225#issuecomment-2334784305 I am trying to clean up the review queue, so marking this PR as draft as the CI tests are failing. Let me know if that wasn't right

Re: [PR] include input fields as output for Substrait consumer [datafusion]

2024-09-06 Thread via GitHub
vbarua commented on PR #12225: URL: https://github.com/apache/datafusion/pull/12225#issuecomment-2334812727 I have some work in progress around remaps for https://github.com/apache/datafusion/issues/12347 which I suspect will overlap with this. @Lordworms I'd be happy to pull your changes i

[PR] doc: Update native code path in development [datafusion-comet]

2024-09-06 Thread via GitHub
viirya opened a new pull request, #921: URL: https://github.com/apache/datafusion-comet/pull/921 ## Which issue does this PR close? Closes #. ## Rationale for this change We moved native code but haven't updated the path in the doc. ## What changes

Re: [PR] Minor: improve performance of `ScalarValue::Binary*` debug [datafusion]

2024-09-06 Thread via GitHub
alamb merged PR #12323: URL: https://github.com/apache/datafusion/pull/12323

Re: [PR] feat: date_add and date_sub functions [datafusion-comet]

2024-09-06 Thread via GitHub
mbutrovich commented on code in PR #910: URL: https://github.com/apache/datafusion-comet/pull/910#discussion_r1747736596 ## native/spark-expr/src/scalar_funcs.rs: ## @@ -547,3 +551,40 @@ pub fn spark_isnan(args: &[ColumnarValue]) -> Result Result, +) -> Result { +let start

Re: [PR] Implement native support StringView for `CONTAINS` function [datafusion]

2024-09-06 Thread via GitHub
alamb commented on code in PR #12168: URL: https://github.com/apache/datafusion/pull/12168#discussion_r1747741481 ## datafusion/functions/Cargo.toml: ## @@ -52,9 +52,9 @@ encoding_expressions = ["base64", "hex"] # enable math functions math_expressions = [] # enable regular e

Re: [PR] Minor: Support protobuf serialization for Utf8View and BinaryView [datafusion]

2024-09-06 Thread via GitHub
Lordworms commented on PR #12165: URL: https://github.com/apache/datafusion/pull/12165#issuecomment-2334845189 > Thank you @Lordworms -- I am sorry for the delay in reviewing this PR. I found one bug (see below) but clearly there was a gap in coverage. > > Thus I took the liberty of f

Re: [PR] Add support for Utf8View, Boolean, Date32/64, int32/64 for writing hive style partitions [datafusion]

2024-09-06 Thread via GitHub
alamb commented on code in PR #12283: URL: https://github.com/apache/datafusion/pull/12283#discussion_r1747767088 ## datafusion/core/src/datasource/file_format/write/demux.rs: ## @@ -320,9 +324,11 @@ async fn hive_style_partitions_demuxer( fn compute_partition_keys_by_row<'a>(

Re: [PR] fix: support Substrait VirtualTables with no columns [datafusion]

2024-09-06 Thread via GitHub
alamb merged PR #12339: URL: https://github.com/apache/datafusion/pull/12339

Re: [PR] Add support for Utf8View, Boolean, Date32/64, int32/64 for writing hive style partitions [datafusion]

2024-09-06 Thread via GitHub
alamb commented on code in PR #12283: URL: https://github.com/apache/datafusion/pull/12283#discussion_r1747789029 ## datafusion/core/src/datasource/file_format/write/demux.rs: ## @@ -320,9 +324,11 @@ async fn hive_style_partitions_demuxer( fn compute_partition_keys_by_row<'a>(

Re: [PR] Remove unnecessary `Result` from return type in `NamePreserver` [datafusion]

2024-09-06 Thread via GitHub
alamb commented on PR #12358: URL: https://github.com/apache/datafusion/pull/12358#issuecomment-2334909305 > Makes sense to me. > > Should this be tagged with api change since technically NamePreserver and SavedName are technically exposed in the public API? I think you are rig

[PR] Upstream merge [datafusion]

2024-09-06 Thread via GitHub
ameyc opened a new pull request, #12364: URL: https://github.com/apache/datafusion/pull/12364 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

[PR] add guidelines on separating python and rust code [datafusion-python]

2024-09-06 Thread via GitHub
Michael-J-Ward opened a new pull request, #860: URL: https://github.com/apache/datafusion-python/pull/860 # Which issue does this PR close? Closes #779. # Rationale for this change The introduction of the `python` wrappers necessitates some guideline for separating code.

Re: [PR] Update Aggregate functions to take builder parameters [datafusion-python]

2024-09-06 Thread via GitHub
Michael-J-Ward commented on PR #859: URL: https://github.com/apache/datafusion-python/pull/859#issuecomment-2334915113 @timsaucer - If there's a way for me to help with this lift without stepping on your toes, please let me know.

Re: [PR] Update Aggregate functions to take builder parameters [datafusion-python]

2024-09-06 Thread via GitHub
timsaucer commented on PR #859: URL: https://github.com/apache/datafusion-python/pull/859#issuecomment-2334945189 If you wanted to divide and conquer we can, but actually I think another thing that would be very helpful would be to have a more ergonomic way to use aggregates as window func

Re: [PR] doc: Update native code path in development [datafusion-comet]

2024-09-06 Thread via GitHub
viirya commented on PR #921: URL: https://github.com/apache/datafusion-comet/pull/921#issuecomment-2334953869 Thanks @andygrove

Re: [PR] chore: Address reviews [datafusion-comet]

2024-09-06 Thread via GitHub
viirya merged PR #920: URL: https://github.com/apache/datafusion-comet/pull/920

Re: [PR] Implement native support StringView for `CONTAINS` function [datafusion]

2024-09-06 Thread via GitHub
tlm365 commented on code in PR #12168: URL: https://github.com/apache/datafusion/pull/12168#discussion_r1747837674 ## datafusion/functions/Cargo.toml: ## @@ -52,9 +52,9 @@ encoding_expressions = ["base64", "hex"] # enable math functions math_expressions = [] # enable regular

Re: [PR] chore: Use datafusion re-exported dependencies [datafusion-python]

2024-09-06 Thread via GitHub
timsaucer merged PR #856: URL: https://github.com/apache/datafusion-python/pull/856

Re: [PR] Faster `character_length()` string function for ASCII-only case [datafusion]

2024-09-06 Thread via GitHub
2010YOUY01 commented on code in PR #12356: URL: https://github.com/apache/datafusion/pull/12356#discussion_r1747881020 ## datafusion/functions/src/unicode/character_length.rs: ## @@ -99,18 +99,30 @@ fn character_length(args: &[ArrayRef]) -> Result { } } -fn character_len

[I] Optimize `substr()` string function with ASCII fast path [datafusion]

2024-09-06 Thread via GitHub
2010YOUY01 opened a new issue, #12367: URL: https://github.com/apache/datafusion/issues/12367 ### Is your feature request related to a problem or challenge? Part of https://github.com/apache/datafusion/issues/12306 https://github.com/apache/datafusion/issues/12306 has introduced

Re: [I] [DISCUSS] Document criteria for adding new features / what belongs in core DataFusion (e.g. sql syntax, functions, etc) [datafusion]

2024-09-06 Thread via GitHub
jayzhan211 commented on issue #12357: URL: https://github.com/apache/datafusion/issues/12357#issuecomment-2335044470 I think the reason why DuckDB is also taken into consideration is that when we start the array function, we found that OLAP style db is a much more suitable choice to follow