Re: [I] Feature: support `to_char(date, timstamp format)` [datafusion]

2025-02-15 Thread via GitHub
Omega359 commented on issue #14536: URL: https://github.com/apache/datafusion/issues/14536#issuecomment-2661094974 I took a look into this. DataFusion delegates the formatting off to arrow-cast which doesn't currently have support for trying multiple times with different types. I se

Re: [PR] feat: add experimental remote HDFS support for native DataFusion reader [datafusion-comet]

2025-02-15 Thread via GitHub
comphead commented on PR #1359: URL: https://github.com/apache/datafusion-comet/pull/1359#issuecomment-2661096890 > oops, @comphead would you mind merging the latest main into this PR branch in order to resolve the conflict? Resolved, please have another look -- This is an automat

Re: [I] Feature: support Timestamp with TZ for function `to_unixtime` [datafusion]

2025-02-15 Thread via GitHub
Omega359 commented on issue #14659: URL: https://github.com/apache/datafusion/issues/14659#issuecomment-2661097340 This works on main: ``` query I select to_unixtime(arrow_cast('2020-01-01 00:10:20.123'::timestamp, 'Timestamp(Second, Some("America/New_York"))')); 157

Re: [I] Feature: support Timestamp with TZ for function `to_unixtime` [datafusion]

2025-02-15 Thread via GitHub
Omega359 commented on issue #14659: URL: https://github.com/apache/datafusion/issues/14659#issuecomment-2661097723 I think I fixed this earlier this month in #14490

[I] Failure when fetching shuffle partitions due to exceeding default gRPC message size [datafusion-ballista]

2025-02-15 Thread via GitHub
andygrove opened a new issue, #1182: URL: https://github.com/apache/datafusion-ballista/issues/1182 **Describe the bug** Benchmark client: ``` thread 'main' panicked at benchmarks/src/bin/tpch.rs:403:18: called `Result::unwrap()` on an `Err` value: Plan("ArrowError(Extern

Re: [PR] Simple Functions Preview [datafusion]

2025-02-15 Thread via GitHub
jayzhan211 commented on PR #14668: URL: https://github.com/apache/datafusion/pull/14668#issuecomment-2661187751 ```rust #[excalibur_function] fn add(a: i32, b: u32) -> i64 { a as i64 + b as i64 } ``` Is it possible to define a function using generics? Otherwise, will

Re: [PR] chore Remove hard-coded Comet version from CI [datafusion-comet]

2025-02-15 Thread via GitHub
codecov-commenter commented on PR #1408: URL: https://github.com/apache/datafusion-comet/pull/1408#issuecomment-2661007098 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1408?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-15 Thread via GitHub
alamb commented on code in PR #14644: URL: https://github.com/apache/datafusion/pull/14644#discussion_r1957155614 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -302,31 +299,16 @@ impl ExternalSorter { } self.reserve_memory_for_merge()?; -let si

Re: [I] Memory account not adding up in SortExec [datafusion]

2025-02-15 Thread via GitHub
alamb commented on issue #10073: URL: https://github.com/apache/datafusion/issues/10073#issuecomment-2661008411 @Kontinuation has a PR with a proposed fix: - https://github.com/apache/datafusion/pull/14644/

Re: [I] Support SQL pipe operator [datafusion]

2025-02-15 Thread via GitHub
alamb commented on issue #14660: URL: https://github.com/apache/datafusion/issues/14660#issuecomment-2661009274 Interesting idea -- the first thing needed is support in sqlparser if we don't already have it. Thanks for the suggestion

Re: [PR] Consolidate and expand ident normalization tests [datafusion]

2025-02-15 Thread via GitHub
alamb commented on PR #14374: URL: https://github.com/apache/datafusion/pull/14374#issuecomment-2661018664 Thanks @xudong963 -- no worries! thank you for reviewing this PR 🙏

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-02-15 Thread via GitHub
djanderson commented on PR #14286: URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2661181898 Alright so I was able to strip down a reproducer for the JoinError panic but it's a little unsatisfying. I absolutely tried to start very simple, but - I was unable to trigger t

Re: [PR] [POC] Try to plan ast::Expr::CompoundFieldAccess syntax [datafusion]

2025-02-15 Thread via GitHub
github-actions[bot] commented on PR #13734: URL: https://github.com/apache/datafusion/pull/13734#issuecomment-2661189606 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [I] [EPIC] Decouple logical from physical types [datafusion]

2025-02-15 Thread via GitHub
jayzhan211 commented on issue #12622: URL: https://github.com/apache/datafusion/issues/12622#issuecomment-2661197895 @tobixdev I found LogicalScalar to be a largely duplicated concept with only minor differences from ScalarValue, mainly aimed at reducing complexity. However, the red

Re: [PR] chore: Disable k8s CI checks [datafusion-ray]

2025-02-15 Thread via GitHub
andygrove merged PR #61: URL: https://github.com/apache/datafusion-ray/pull/61

Re: [I] Feature: support cast `date` to `timestamp` with tz [datafusion]

2025-02-15 Thread via GitHub
friendlymatthew commented on issue #14638: URL: https://github.com/apache/datafusion/issues/14638#issuecomment-2661135828 take

[PR] Speedup `to_hex` (~2x faster) [datafusion]

2025-02-15 Thread via GitHub
simonvandel opened a new pull request, #14686: URL: https://github.com/apache/datafusion/pull/14686 ## Which issue does this PR close? N/A ## Rationale for this change We can speedup `to_hex` by writing string values directly to the string array, instead of makin

Re: [PR] minor: simplify `union_extract` code [datafusion]

2025-02-15 Thread via GitHub
gstvg commented on code in PR #14640: URL: https://github.com/apache/datafusion/pull/14640#discussion_r1957208302 ## datafusion/functions/src/core/union_extract.rs: ## @@ -113,22 +114,15 @@ impl ScalarUDFImpl for UnionExtractFun { } fn invoke_with_args(&self, args: S

Re: [I] Test DataFusion 45.0.0 with Comet [datafusion]

2025-02-15 Thread via GitHub
andygrove commented on issue #14274: URL: https://github.com/apache/datafusion/issues/14274#issuecomment-2661146980 Yes, thanks for the reminder

Re: [PR] minor: simplify `union_extract` code [datafusion]

2025-02-15 Thread via GitHub
gstvg commented on code in PR #14640: URL: https://github.com/apache/datafusion/pull/14640#discussion_r1957208327 ## datafusion/functions/src/core/union_extract.rs: ## @@ -140,19 +134,16 @@ impl ScalarUDFImpl for UnionExtractFun { Ok(ColumnarValue::Array(

Re: [PR] Add union_extract scalar function [datafusion]

2025-02-15 Thread via GitHub
gstvg commented on PR #12116: URL: https://github.com/apache/datafusion/pull/12116#issuecomment-2661146353 Many thanks @jayzhan211 @samuelcolvin @Omega359 @alamb :pray:

Re: [I] Test DataFusion 45.0.0 with Comet [datafusion]

2025-02-15 Thread via GitHub
andygrove closed issue #14274: Test DataFusion 45.0.0 with Comet URL: https://github.com/apache/datafusion/issues/14274

Re: [PR] refactor: collect dataframe as stream in `__repr__` [datafusion-python]

2025-02-15 Thread via GitHub
timsaucer commented on code in PR #1015: URL: https://github.com/apache/datafusion-python/pull/1015#discussion_r1957173874 ## src/dataframe.rs: ## @@ -90,8 +91,16 @@ impl PyDataFrame { } fn __repr__(&self, py: Python) -> PyDataFusionResult { -let df = self.d

Re: [I] multiply overflow in stats.rs [datafusion]

2025-02-15 Thread via GitHub
Omega359 commented on issue #13775: URL: https://github.com/apache/datafusion/issues/13775#issuecomment-2661050198 I don't think there was a commit where I saw it - it happened during my local dev of the sqlite testing.

[PR] Add union_tag scalar function [datafusion]

2025-02-15 Thread via GitHub
gstvg opened a new pull request, #14687: URL: https://github.com/apache/datafusion/pull/14687 ## Which issue does this PR close? - Closes #11080 ## Rationale for this change Retrieve the name of the currently selected field on a union, as there's no way to do it today

Re: [PR] Consolidate feature flags into configuration guide [datafusion]

2025-02-15 Thread via GitHub
dhegberg commented on code in PR #14657: URL: https://github.com/apache/datafusion/pull/14657#discussion_r1957220429 ## docs/source/user-guide/crate-configuration.md: ## @@ -25,7 +25,47 @@ control DataFusion's behavior. [configuration settings]: configs.md -## Add latest no

Re: [I] Feature request: hermetic build [datafusion]

2025-02-15 Thread via GitHub
dentiny commented on issue #14678: URL: https://github.com/apache/datafusion/issues/14678#issuecomment-2661298450 Hi @alamb thanks for the reply! I made this issue for two reasons: - hermetic build itself provides a better way to build from a stable environment - it reduces confusion f

[PR] docs: Add instruction to build [datafusion]

2025-02-15 Thread via GitHub
dentiny opened a new pull request, #14694: URL: https://github.com/apache/datafusion/pull/14694 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/14681 ## Rationale for this change Add better instructions for newcomers to build the pr

Re: [PR] docs: Add instruction to build [datafusion]

2025-02-15 Thread via GitHub
dentiny commented on code in PR #14694: URL: https://github.com/apache/datafusion/pull/14694#discussion_r1957260006 ## docs/source/contributor-guide/index.md: ## @@ -216,3 +216,23 @@ The good thing about open code and open development is that any issues in one ch Pull reques

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-15 Thread via GitHub
2010YOUY01 commented on code in PR #14644: URL: https://github.com/apache/datafusion/pull/14644#discussion_r1957240155 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -408,50 +395,100 @@ impl ExternalSorter { debug!("Spilling sort data of ExternalSorter to disk w

[PR] `AggregateUDFImpl::schema_name` for customizable schema name [datafusion]

2025-02-15 Thread via GitHub
jayzhan211 opened a new pull request, #14695: URL: https://github.com/apache/datafusion/pull/14695 ## Which issue does this PR close? - Closes #. ## Rationale for this change Similar to ScalarFunction, we can modify the schema name to the one we want. We need t

Re: [PR] Update GitHub CI run image for license check job [datafusion]

2025-02-15 Thread via GitHub
alamb merged PR #14674: URL: https://github.com/apache/datafusion/pull/14674

Re: [I] Implement shuffle using Ray object store [datafusion-ray]

2025-02-15 Thread via GitHub
andygrove closed issue #55: Implement shuffle using Ray object store URL: https://github.com/apache/datafusion-ray/issues/55

[PR] chore: Disable k8s CI checks [datafusion-ray]

2025-02-15 Thread via GitHub
andygrove opened a new pull request, #61: URL: https://github.com/apache/datafusion-ray/pull/61 (no comment)

Re: [I] Test DataFusion 45.0.0 with Comet [datafusion]

2025-02-15 Thread via GitHub
simonvandel commented on issue #14274: URL: https://github.com/apache/datafusion/issues/14274#issuecomment-2661142985 @andygrove can this be closed given DF 45 is already released?

Re: [PR] perf: Drop RowConverter from GroupOrderingPartial [datafusion]

2025-02-15 Thread via GitHub
ctsk commented on code in PR #14566: URL: https://github.com/apache/datafusion/pull/14566#discussion_r1957165631 ## datafusion/physical-plan/src/aggregates/order/partial.rs: ## @@ -103,47 +102,46 @@ enum State { Complete, } +impl State { +fn size(&self) -> usize { +

Re: [PR] WIP: move `DataSource` to `datafusion-datasource` [datafusion]

2025-02-15 Thread via GitHub
logan-keede commented on PR #14671: URL: https://github.com/apache/datafusion/pull/14671#issuecomment-2661024140 > BTW thank you for working on this -- I suggest we try the "make a mock ExecutionPlan" approach -- that would be the cleanest I think and would be a good way for you to learn mo

Re: [I] Implement shuffle using Ray object store [datafusion-ray]

2025-02-15 Thread via GitHub
andygrove commented on issue #55: URL: https://github.com/apache/datafusion-ray/issues/55#issuecomment-2661030408 Based on some earlier experiments, it does not look like using Ray object store will make sense for shuffle

Re: [PR] chore Remove hard-coded Comet version from CI [datafusion-comet]

2025-02-15 Thread via GitHub
andygrove closed pull request #1408: chore Remove hard-coded Comet version from CI URL: https://github.com/apache/datafusion-comet/pull/1408

Re: [PR] chore: Disable k8s CI checks [datafusion-ray]

2025-02-15 Thread via GitHub
andygrove commented on code in PR #61: URL: https://github.com/apache/datafusion-ray/pull/61#discussion_r1957180169 ## datafusion_ray/core.py: ## @@ -100,7 +101,7 @@ def __init__( ray_internal_df: RayDataFrameInternal, query_id: str, batch_size=8192, -

Re: [PR] chore: Disable k8s CI checks [datafusion-ray]

2025-02-15 Thread via GitHub
andygrove commented on code in PR #61: URL: https://github.com/apache/datafusion-ray/pull/61#discussion_r1957180223 ## requirements-in.txt: ## @@ -7,6 +7,6 @@ numpy pyarrow>=18.0.0 pytest ray==2.40.0 -datafusion==43.0.0 +datafusion==43.1.0 Review Comment: DataFusion 43.0.

[PR] Set projection before configuring the source [datafusion]

2025-02-15 Thread via GitHub
blaginin opened a new pull request, #14685: URL: https://github.com/apache/datafusion/pull/14685 ## Which issue does this PR close? Closes https://github.com/apache/datafusion/issues/14679 ## Rationale for this change `FileScanConfig::new` now also configures source,

[I] Add Win-amd64 profile [datafusion-comet]

2025-02-15 Thread via GitHub
wForget opened a new issue, #1409: URL: https://github.com/apache/datafusion-comet/issues/1409 ### What is the problem the feature request solves? Failed to load comet library on win-amd64: ``` Failed to load comet library: Unsupported OS/arch, cannot find /org/apache/comet/

[PR] feat: add Win-amd64 profile [datafusion-comet]

2025-02-15 Thread via GitHub
wForget opened a new pull request, #1410: URL: https://github.com/apache/datafusion-comet/pull/1410 ## Which issue does this PR close? Closes #1409. ## Rationale for this change Support comet on Win-amd64 ## What changes are included in this PR? add

Re: [I] Feature: support cast `date` to `timestamp` with tz [datafusion]

2025-02-15 Thread via GitHub
friendlymatthew commented on issue #14638: URL: https://github.com/apache/datafusion/issues/14638#issuecomment-2661249660 > It seems that `Date32`/`Date64` doesn't support TZ, so if we want to support the cast, we should add TZ to `Date32/Date64` firstly? I'm not sure if `Date` types

[PR] Remove CountWildcardRule in Analyzer and move the functionality in ExprPlanner [datafusion]

2025-02-15 Thread via GitHub
jayzhan211 opened a new pull request, #14689: URL: https://github.com/apache/datafusion/pull/14689 ## Which issue does this PR close? Part of #14618 ## Rationale for this change We can convert count(*) to count(1) in ExprPlanner. ## What changes ar

Re: [I] multiply overflow in stats.rs [datafusion]

2025-02-15 Thread via GitHub
LindaSummer commented on issue #13775: URL: https://github.com/apache/datafusion/issues/13775#issuecomment-2661253172 > I don't think there was a commit where I saw it - it happened during my local dev of the sqlite testing. Got it! Thanks very much for your warm help! I will try to s

Re: [I] Emit warning with attached `Diagnostic` when doing `= NULL` [datafusion]

2025-02-15 Thread via GitHub
ugoa commented on issue #14434: URL: https://github.com/apache/datafusion/issues/14434#issuecomment-2660809032 Yeah I am working on it this weekend

Re: [PR] perf: Drop RowConverter from GroupOrderingPartial [datafusion]

2025-02-15 Thread via GitHub
ctsk commented on code in PR #14566: URL: https://github.com/apache/datafusion/pull/14566#discussion_r1957098299 ## datafusion/physical-plan/src/aggregates/order/partial.rs: ## @@ -103,47 +102,46 @@ enum State { Complete, } +impl State { +fn size(&self) -> usize { +

Re: [I] Physical plan round trip fails in some cases [datafusion]

2025-02-15 Thread via GitHub
milenkovicm commented on issue #14679: URL: https://github.com/apache/datafusion/issues/14679#issuecomment-2660885388 update, commit `5e1e693` #14224 breaks physical plan serde fyi @mertak-synnada

Re: [I] multiply overflow in stats.rs [datafusion]

2025-02-15 Thread via GitHub
LindaSummer commented on issue #13775: URL: https://github.com/apache/datafusion/issues/13775#issuecomment-2660885763 Hi @Omega359, Sorry to interrupt you. 😊 I used the commit in https://github.com/Omega359/arrow-datafusion/tree/sqllogictest_with_sqlite in Nov 2024. But it

Re: [I] Integrate `Analyzer` within LogicalPlan building stage [datafusion]

2025-02-15 Thread via GitHub
findepi commented on issue #14618: URL: https://github.com/apache/datafusion/issues/14618#issuecomment-2660888195 Good point @alamb. This would mean that the LogicalPlan, being the public API for _syntactic_ query building, is inherently not "fully resolved". Which suggests creating new I

Re: [PR] Fix pyarrow test [datafusion]

2025-02-15 Thread via GitHub
findepi commented on code in PR #4450: URL: https://github.com/apache/datafusion/pull/4450#discussion_r1957100588 ## .github/workflows/rust.yml: ## @@ -296,7 +296,7 @@ jobs: test-datafusion-pyarrow: name: cargo test pyarrow (amd64) needs: [linux-build-lib] -runs

Re: [PR] Update GitHub CI run image [datafusion]

2025-02-15 Thread via GitHub
findepi commented on PR #14674: URL: https://github.com/apache/datafusion/pull/14674#issuecomment-2660888768 @comphead thanks for the link. Did we forget to follow up on that temporary workaround? Commented there; will drop pyarrow change.

Re: [PR] Allow extensions_options to accept Option field [datafusion]

2025-02-15 Thread via GitHub
goldmedal commented on PR #14664: URL: https://github.com/apache/datafusion/pull/14664#issuecomment-2660890526 > I double checked and I am pretty sure this PR is backwards compatable / has no API changes. Specifically, no existing code would need to be changed, right? Yes. no existin

Re: [PR] bug: improve schema checking for `insert into` cases [datafusion]

2025-02-15 Thread via GitHub
zhuqi-lucas commented on code in PR #14572: URL: https://github.com/apache/datafusion/pull/14572#discussion_r1957116717 ## datafusion/common/src/dfschema.rs: ## @@ -1028,21 +1028,32 @@ impl SchemaExt for Schema { }) } -fn logically_equivalent_names_and_ty

Re: [I] Integrate `Analyzer` within LogicalPlan building stage [datafusion]

2025-02-15 Thread via GitHub
jayzhan211 commented on issue #14618: URL: https://github.com/apache/datafusion/issues/14618#issuecomment-2660916504 I agree. In SQL, the transformation follows this path: SQL → Expr → LogicalPlan For DataFrame operations, the transformation is: Expr → LogicalPlan

Re: [PR] bug: improve schema checking for `insert into` cases [datafusion]

2025-02-15 Thread via GitHub
zhuqi-lucas commented on code in PR #14572: URL: https://github.com/apache/datafusion/pull/14572#discussion_r1957116873 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -5274,3 +5274,61 @@ async fn register_non_parquet_file() { "1.json' does not match the expected exten

Re: [PR] bug: improve schema checking for `insert into` cases [datafusion]

2025-02-15 Thread via GitHub
zhuqi-lucas commented on code in PR #14572: URL: https://github.com/apache/datafusion/pull/14572#discussion_r1957117043 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -5274,3 +5274,61 @@ async fn register_non_parquet_file() { "1.json' does not match the expected exten

Re: [PR] script to export benchmark information as Line Protocol format [datafusion]

2025-02-15 Thread via GitHub
logan-keede commented on code in PR #14662: URL: https://github.com/apache/datafusion/pull/14662#discussion_r1957117220 ## benchmarks/lineformat.py: ## @@ -0,0 +1,130 @@ +#!/usr/bin/env python +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor l

Re: [PR] bug: improve schema checking for `insert into` cases [datafusion]

2025-02-15 Thread via GitHub
zhuqi-lucas commented on PR #14572: URL: https://github.com/apache/datafusion/pull/14572#issuecomment-2660925053 > This looks like an improvment to me -- thank you @zhuqi-lucas and @jonahgao Thanks @alamb for review, addressed in latest PR. For union schema potential schema checking i

Re: [PR] script to export benchmark information as Line Protocol format [datafusion]

2025-02-15 Thread via GitHub
logan-keede commented on code in PR #14662: URL: https://github.com/apache/datafusion/pull/14662#discussion_r1957117257 ## benchmarks/lineformat.py: ## @@ -0,0 +1,130 @@ +#!/usr/bin/env python +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor l

Re: [PR] Consolidate and expand ident normalization tests [datafusion]

2025-02-15 Thread via GitHub
xudong963 merged PR #14374: URL: https://github.com/apache/datafusion/pull/14374

Re: [PR] perf: Drop RowConverter from GroupOrderingPartial [datafusion]

2025-02-15 Thread via GitHub
zhuqi-lucas commented on code in PR #14566: URL: https://github.com/apache/datafusion/pull/14566#discussion_r1957146770 ## datafusion/physical-plan/src/aggregates/order/partial.rs: ## @@ -207,46 +222,151 @@ impl GroupOrderingPartial { let max_group_index = total_num_g

Re: [PR] feat: Add ScalarUDF support in FFI crate [datafusion]

2025-02-15 Thread via GitHub
alamb commented on code in PR #14579: URL: https://github.com/apache/datafusion/pull/14579#discussion_r1957153878 ## datafusion/ffi/src/signature.rs: ## @@ -0,0 +1,295 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [I] Decorrelate scalar subqueries with more complex filter expressions [datafusion]

2025-02-15 Thread via GitHub
alamb commented on issue #14554: URL: https://github.com/apache/datafusion/issues/14554#issuecomment-2660870088 > This follow the first approach in [TUM paper](https://btw-2015.informatik.uni-hamburg.de/res/proceedings/Hauptband/Wiss/Neumann-Unnesting_Arbitrary_Querie.pdf) (simple unnesting

[I] extended_test (with memory limit tracking) are commented out [datafusion]

2025-02-15 Thread via GitHub
alamb opened a new issue, #14680: URL: https://github.com/apache/datafusion/issues/14680 ### Describe the bug - CI tests were failing on main: https://github.com/apache/datafusion/issues/14576 I commented out the extended_test in - https://github.com/apache/datafusion/pull/

Re: [I] Extended tests are (still) failing on main [datafusion]

2025-02-15 Thread via GitHub
alamb commented on issue #14576: URL: https://github.com/apache/datafusion/issues/14576#issuecomment-2660869774 The tests are passing now so closing this ticket Filed a ticket to track fixing the issues - https://github.com/apache/datafusion/issues/14680

Re: [I] Extended tests are (still) failing on main [datafusion]

2025-02-15 Thread via GitHub
alamb closed issue #14576: Extended tests are (still) failing on main URL: https://github.com/apache/datafusion/issues/14576

Re: [PR] bug: improve schema checking for `insert into` cases [datafusion]

2025-02-15 Thread via GitHub
alamb commented on code in PR #14572: URL: https://github.com/apache/datafusion/pull/14572#discussion_r1957090671 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -5274,3 +5274,61 @@ async fn register_non_parquet_file() { "1.json' does not match the expected extension '

Re: [I] Feature request: hermetic build [datafusion]

2025-02-15 Thread via GitHub
alamb commented on issue #14678: URL: https://github.com/apache/datafusion/issues/14678#issuecomment-2660867752 I agree figuring out how not to require installing protoc would be really nice DataFusion itself doesn't rely on the environment (only rustc) However some of our depe

Re: [I] Integrate `Analyzer` within LogicalPlan building stage [datafusion]

2025-02-15 Thread via GitHub
alamb commented on issue #14618: URL: https://github.com/apache/datafusion/issues/14618#issuecomment-2660868799 > I propose removing the concept of the Analyzer and integrating it into the SQL → LogicalPlan stage. Specifically, TypeCoercion should be applied before the plan is finalized (ht

Re: [PR] Allow extensions_options to accept Option field [datafusion]

2025-02-15 Thread via GitHub
alamb commented on code in PR #14664: URL: https://github.com/apache/datafusion/pull/14664#discussion_r1957090964 ## datafusion/execution/src/task.rs: ## @@ -249,6 +251,39 @@ mod tests { assert!(test.is_some()); assert_eq!(test.unwrap().value, 24); +a

[I] Feature request: documentation on project build instructions [datafusion]

2025-02-15 Thread via GitHub
dentiny opened a new issue, #14681: URL: https://github.com/apache/datafusion/issues/14681 ### Is your feature request related to a problem or challenge? When I started to play around with DataFusion, I took some time to figure out how to build the whole project and run test cases. The

Re: [PR] Speed up `uuid` UDF (40x faster) [datafusion]

2025-02-15 Thread via GitHub
alamb commented on code in PR #14675: URL: https://github.com/apache/datafusion/pull/14675#discussion_r1957094423 ## datafusion/sqllogictest/test_files/functions.slt: ## @@ -720,6 +720,14 @@ select count(distinct u) from uuid_table; 2 +# must be valid uuidv4 format +que

Re: [PR] WIP: move `DataSource` to `datafusion-datasource` [datafusion]

2025-02-15 Thread via GitHub
alamb commented on PR #14671: URL: https://github.com/apache/datafusion/pull/14671#issuecomment-2660876614 So exciting...

Re: [PR] perf: Drop RowConverter from GroupOrderingPartial [datafusion]

2025-02-15 Thread via GitHub
alamb commented on code in PR #14566: URL: https://github.com/apache/datafusion/pull/14566#discussion_r1957095189 ## datafusion/physical-plan/src/aggregates/order/partial.rs: ## @@ -103,47 +102,46 @@ enum State { Complete, } +impl State { +fn size(&self) -> usize { +

Re: [PR] bug: fix offset type mismatch when prepending lists [datafusion]

2025-02-15 Thread via GitHub
alamb merged PR #14672: URL: https://github.com/apache/datafusion/pull/14672

Re: [I] function: `array_prepend` sometimes doesn't work with nested lists [datafusion]

2025-02-15 Thread via GitHub
alamb closed issue #14613: function: `array_prepend` sometimes doesn't work with nested lists URL: https://github.com/apache/datafusion/issues/14613

Re: [PR] Speed up `uuid` UDF (20x faster) [datafusion]

2025-02-15 Thread via GitHub
simonvandel commented on code in PR #14675: URL: https://github.com/apache/datafusion/pull/14675#discussion_r1957080918 ## datafusion/functions/src/string/uuid.rs: ## @@ -87,7 +88,13 @@ impl ScalarUDFImpl for UuidFunc { if !args.is_empty() { return internal

Re: [PR] Signature::Coercible with user defined implicit casting [datafusion]

2025-02-15 Thread via GitHub
alamb commented on PR #14440: URL: https://github.com/apache/datafusion/pull/14440#issuecomment-2660851126 I ran out of time yesterday and I am out next week so I am not likely to be able to make much progress here. Sorry about that -- maybe one of you will have a chance to do someth

Re: [PR] Speed up `uuid` UDF (20x faster) [datafusion]

2025-02-15 Thread via GitHub
simonvandel commented on code in PR #14675: URL: https://github.com/apache/datafusion/pull/14675#discussion_r1957081225 ## datafusion/functions/src/string/uuid.rs: ## @@ -87,7 +88,13 @@ impl ScalarUDFImpl for UuidFunc { if !args.is_empty() { return internal

Re: [PR] Improve EnforceSorting docs. [datafusion]

2025-02-15 Thread via GitHub
alamb commented on code in PR #14673: URL: https://github.com/apache/datafusion/pull/14673#discussion_r1957081251 ## datafusion/physical-optimizer/src/enforce_sorting/mod.rs: ## @@ -151,10 +163,15 @@ fn update_coalesce_ctx_children( }; } -/// The boolean flag `repartitio

Re: [PR] Speed up `uuid` UDF (40x faster) [datafusion]

2025-02-15 Thread via GitHub
simonvandel commented on code in PR #14675: URL: https://github.com/apache/datafusion/pull/14675#discussion_r1957081385 ## datafusion/functions/src/string/uuid.rs: ## @@ -87,7 +88,13 @@ impl ScalarUDFImpl for UuidFunc { if !args.is_empty() { return internal

Re: [I] Support predicate pruning for NOT LIKE expressions [datafusion]

2025-02-15 Thread via GitHub
alamb closed issue #14053: Support predicate pruning for NOT LIKE expressions URL: https://github.com/apache/datafusion/issues/14053

Re: [PR] Implement predicate pruning for not like expressions [datafusion]

2025-02-15 Thread via GitHub
alamb commented on PR #14567: URL: https://github.com/apache/datafusion/pull/14567#issuecomment-2660854312 Thanks again @UBarney and @adriangb and @findepi

Re: [PR] feat: pretty explain [datafusion]

2025-02-15 Thread via GitHub
alamb commented on PR #14677: URL: https://github.com/apache/datafusion/pull/14677#issuecomment-2660854503 I have been dreaming about this @irenjj -- can't wait to check it out

Re: [PR] Migrate math functions to implement invoke_with_args [datafusion]

2025-02-15 Thread via GitHub
alamb merged PR #14658: URL: https://github.com/apache/datafusion/pull/14658

Re: [PR] Migrate math functions to implement invoke_with_args [datafusion]

2025-02-15 Thread via GitHub
alamb commented on PR #14658: URL: https://github.com/apache/datafusion/pull/14658#issuecomment-2660854345 🚀

Re: [PR] Implement predicate pruning for not like expressions [datafusion]

2025-02-15 Thread via GitHub
alamb merged PR #14567: URL: https://github.com/apache/datafusion/pull/14567

Re: [PR] script to export benchmark information as Line Protocol format [datafusion]

2025-02-15 Thread via GitHub
alamb commented on code in PR #14662: URL: https://github.com/apache/datafusion/pull/14662#discussion_r1957084216 ## benchmarks/lineformat.py: ## @@ -0,0 +1,130 @@ +#!/usr/bin/env python +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

Re: [I] Integrate `Analyzer` within LogicalPlan building stage [datafusion]

2025-02-15 Thread via GitHub
alamb commented on issue #14618: URL: https://github.com/apache/datafusion/issues/14618#issuecomment-2660943935 > Good point [@alamb](https://github.com/alamb). This would mean that the LogicalPlan, being the public API for _syntactic_ query building, is inherently not "fully resolved". W

Re: [PR] Improve SQL Planner docs [datafusion]

2025-02-15 Thread via GitHub
jonahgao commented on code in PR #14669: URL: https://github.com/apache/datafusion/pull/14669#discussion_r1957125726 ## datafusion/expr/src/planner.rs: ## @@ -62,39 +75,44 @@ pub trait ContextProvider { not_impl_err!("Recursive CTE is not implemented") } -///

Re: [PR] Signature::Coercible with user defined implicit casting [datafusion]

2025-02-15 Thread via GitHub
goldmedal commented on code in PR #14440: URL: https://github.com/apache/datafusion/pull/14440#discussion_r1957108526 ## datafusion/expr-common/src/signature.rs: ## @@ -466,6 +551,133 @@ fn get_data_types(native_type: &NativeType) -> Vec { } } +#[derive(Debug, Clone, Eq

Re: [PR] Minor: remove confusing `update_plan_from_children` call from `EnforceSorting` [datafusion]

2025-02-15 Thread via GitHub
xudong963 commented on code in PR #14650: URL: https://github.com/apache/datafusion/pull/14650#discussion_r1957131431 ## datafusion/physical-plan/src/tree_node.rs: ## @@ -42,6 +42,8 @@ impl DynTreeNode for dyn ExecutionPlan { /// A node object beneficial for writing optimizer r

Re: [PR] Minor: remove confusing `update_plan_from_children` call from `EnforceSorting` [datafusion]

2025-02-15 Thread via GitHub
xudong963 commented on code in PR #14650: URL: https://github.com/apache/datafusion/pull/14650#discussion_r1957132700 ## datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs: ## @@ -45,7 +45,7 @@ use itertools::izip; pub type OrderPreserva

[PR] chore Remove hard-coded Comet version from CI [datafusion-comet]

2025-02-15 Thread via GitHub
andygrove opened a new pull request, #1408: URL: https://github.com/apache/datafusion-comet/pull/1408 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] Add missing STRING_AGG functionality [datafusion]

2025-02-15 Thread via GitHub
gabotechs closed pull request #14682: Add missing STRING_AGG functionality URL: https://github.com/apache/datafusion/pull/14682

[PR] Add missing STRING_AGG functionality [datafusion]

2025-02-15 Thread via GitHub
gabotechs opened a new pull request, #14682: URL: https://github.com/apache/datafusion/pull/14682 https://github.com/apache/datafusion/pull/14412
