Re: [PR] Extract catalog API to separate crate, change `TableProvider::scan` to take a trait rather than `SessionState` [datafusion]

2024-08-14 Thread via GitHub
phillipleblanc commented on PR #11516: URL: https://github.com/apache/datafusion/pull/11516#issuecomment-2288027658 Looks like this PR didn't get tagged with `api-change`, so it got put under the "Documentation Updates" section in the v41 Changelog: https://github.com/apache/datafusion/blob

Re: [PR] Register get_field by default [datafusion]

2024-08-14 Thread via GitHub
Dandandan merged PR #11959: URL: https://github.com/apache/datafusion/pull/11959 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-14 Thread via GitHub
Rachelint commented on PR #11943: URL: https://github.com/apache/datafusion/pull/11943#issuecomment-2288031573 @2010YOUY01 After checking the code about memory control, I think I got it. - `emit_early_if_necessary` is used in `Partial` - and `spill_previous_if_necessary` is used in the

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-14 Thread via GitHub
2010YOUY01 commented on PR #11943: URL: https://github.com/apache/datafusion/pull/11943#issuecomment-2288221652 > @2010YOUY01 After checking the code about memory control, I think I got it. > > * `emit_early_if_necessary` is used in `Partial` > * and `spill_previous_if_necessary`

Re: [I] Standardize the separator in name [datafusion]

2024-08-14 Thread via GitHub
joroKr21 commented on issue #10364: URL: https://github.com/apache/datafusion/issues/10364#issuecomment-2288257503 Also why do we have all of these which are slightly different? It's really unmanageable: * `Expr::schema_name` * `Expr::canonical_name` * `physical_name`

Re: [PR] Generic FlightTableFactory with a default FlightSqlDriver [datafusion]

2024-08-14 Thread via GitHub
ccciudatu commented on code in PR #11938: URL: https://github.com/apache/datafusion/pull/11938#discussion_r1716648030 ## datafusion/proto/proto/datafusion.proto: ## @@ -1199,3 +1200,28 @@ message PartitionStats { int64 num_bytes = 3; repeated datafusion_common.ColumnStats

[PR] Use `schema_name` to create the `physical_name` [datafusion]

2024-08-14 Thread via GitHub
joroKr21 opened a new pull request, #11977: URL: https://github.com/apache/datafusion/pull/11977 More consistency and less opportunity for column name mismatch. ## Which issue does this PR close? Closes #. ## Rationale for this change ## What change

[PR] Remove LargeUtf8|Binary, Utf8|BinaryView, and Dictionary from ScalarValue [datafusion]

2024-08-14 Thread via GitHub
notfilippo opened a new pull request, #11978: URL: https://github.com/apache/datafusion/pull/11978 This PR removes LargeUtf8|Binary, Utf8|BinaryView, and Dictionary from ScalarValue, following the discussion on #11513 --- ## Open questions ### Top level ScalarValue cast

Re: [PR] Generic FlightTableFactory with a default FlightSqlDriver [datafusion]

2024-08-14 Thread via GitHub
ccciudatu commented on code in PR #11938: URL: https://github.com/apache/datafusion/pull/11938#discussion_r1716651233 ## datafusion/proto/src/physical_plan/mod.rs: ## @@ -1081,6 +1091,67 @@ impl AsExecutionPlan for protobuf::PhysicalPlanNode { sort_order,

Re: [I] [Proposal] Decouple logical from physical types [datafusion]

2024-08-14 Thread via GitHub
notfilippo commented on issue #11513: URL: https://github.com/apache/datafusion/issues/11513#issuecomment-2288348507 👋 I opened a [draft PR ^](https://github.com/apache/datafusion/pull/11978) to make ScalarValue logical. I have a bunch of open questions that I would be very happy to get fee

Re: [PR] Generic FlightTableFactory with a default FlightSqlDriver [datafusion]

2024-08-14 Thread via GitHub
ccciudatu commented on code in PR #11938: URL: https://github.com/apache/datafusion/pull/11938#discussion_r1716654822 ## datafusion/proto/src/physical_plan/mod.rs: ## @@ -1081,6 +1091,67 @@ impl AsExecutionPlan for protobuf::PhysicalPlanNode { sort_order,

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-14 Thread via GitHub
Rachelint commented on code in PR #11943: URL: https://github.com/apache/datafusion/pull/11943#discussion_r1716675863 ## datafusion/expr-common/src/groups_accumulator.rs: ## @@ -31,6 +31,13 @@ pub enum EmitTo { /// For example, if `n=10`, group_index `0, 1, ... 9` are emitt

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-14 Thread via GitHub
Rachelint commented on code in PR #11943: URL: https://github.com/apache/datafusion/pull/11943#discussion_r1716358223 ## datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/bool_op.rs: ## @@ -68,11 +70,21 @@ where fn update_batch( Review Comment: Yes,

Re: [PR] Refactor `CoalesceBatches` to use an explicit state machine [datafusion]

2024-08-14 Thread via GitHub
ozankabak commented on PR #11966: URL: https://github.com/apache/datafusion/pull/11966#issuecomment-2288441468 I'm going to go ahead and merge this, we will move the GC call above in a follow-on PR if necessary

Re: [PR] Refactor `CoalesceBatches` to use an explicit state machine [datafusion]

2024-08-14 Thread via GitHub
ozankabak merged PR #11966: URL: https://github.com/apache/datafusion/pull/11966

Re: [I] Test CometDriverPlugin in Kube/Yarn [datafusion-comet]

2024-08-14 Thread via GitHub
orthoxerox commented on issue #826: URL: https://github.com/apache/datafusion-comet/issues/826#issuecomment-2288444395 I tested this on YARN and I can confirm that `--conf spark.plugins=org.apache.spark.CometPlugin` increases Spark memory overhead, but this is not visible in the settings:

Re: [PR] feat: `CreateArray` support [datafusion-comet]

2024-08-14 Thread via GitHub
Kimahriman commented on PR #793: URL: https://github.com/apache/datafusion-comet/pull/793#issuecomment-2288452262 > LGTM @Kimahriman Do you mind resolving the conflict? Fixed

Re: [PR] Refactor `CoalesceBatches` to use an explicit state machine [datafusion]

2024-08-14 Thread via GitHub
alamb commented on code in PR #11966: URL: https://github.com/apache/datafusion/pull/11966#discussion_r1716727402 ## datafusion/physical-plan/src/coalesce_batches.rs: ## @@ -364,90 +393,60 @@ impl BatchCoalescer { Arc::clone(&self.schema) } -/// Add a batch,

Re: [PR] Remove LargeUtf8|Binary, Utf8|BinaryView, and Dictionary from ScalarValue [datafusion]

2024-08-14 Thread via GitHub
notfilippo commented on code in PR #11978: URL: https://github.com/apache/datafusion/pull/11978#discussion_r1716741382 ## datafusion/core/src/datasource/physical_plan/file_scan_config.rs: ## @@ -480,155 +422,17 @@ impl PartitionColumnProjector { } } -#[derive(Debug, Defa

Re: [I] Update `FIND_IN_SET` scalar function to support Utf8View [datafusion]

2024-08-14 Thread via GitHub
alamb closed issue #11954: Update `FIND_IN_SET` scalar function to support Utf8View URL: https://github.com/apache/datafusion/issues/11954

Re: [PR] Implement native support StringView for find in set [datafusion]

2024-08-14 Thread via GitHub
alamb merged PR #11970: URL: https://github.com/apache/datafusion/pull/11970

Re: [PR] fix: move coercion of union from builder to `TypeCoercion` [datafusion]

2024-08-14 Thread via GitHub
alamb commented on PR #11961: URL: https://github.com/apache/datafusion/pull/11961#issuecomment-2288480481 Looks great @jonahgao -- thank you

Re: [PR] fix: move coercion of union from builder to `TypeCoercion` [datafusion]

2024-08-14 Thread via GitHub
alamb merged PR #11961: URL: https://github.com/apache/datafusion/pull/11961

Re: [PR] fix: move coercion of union from builder to `TypeCoercion` [datafusion]

2024-08-14 Thread via GitHub
alamb commented on code in PR #11961: URL: https://github.com/apache/datafusion/pull/11961#discussion_r1716745998 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -809,7 +809,7 @@ fn coerce_case_expression(case: Case, schema: &DFSchema) -> Result { } /// Get a c

Re: [I] Incorrect result returned by `UNION ALL` (SQLancer-TLP) [datafusion]

2024-08-14 Thread via GitHub
alamb closed issue #11742: Incorrect result returned by `UNION ALL` (SQLancer-TLP) URL: https://github.com/apache/datafusion/issues/11742

Re: [PR] test: re-enable window function over parquet with forced collisions [datafusion]

2024-08-14 Thread via GitHub
alamb commented on code in PR #11939: URL: https://github.com/apache/datafusion/pull/11939#discussion_r1716747384 ## datafusion/sqllogictest/test_files/parquet.slt: ## @@ -251,27 +251,25 @@ SELECT COUNT(*) FROM timestamp_with_tz; 131072 -# FIXME(#TODO) fails with featur

Re: [PR] test: re-enable window function over parquet with forced collisions [datafusion]

2024-08-14 Thread via GitHub
alamb merged PR #11939: URL: https://github.com/apache/datafusion/pull/11939

Re: [PR] Support partial aggregation skip for boolean functions [datafusion]

2024-08-14 Thread via GitHub
alamb commented on PR #11847: URL: https://github.com/apache/datafusion/pull/11847#issuecomment-2288483753 FWIW https://github.com/apache/datafusion/pull/11825 is merged

Re: [I] Window function test fails when `force_hash_collisions` is enabled [datafusion]

2024-08-14 Thread via GitHub
alamb closed issue #11660: Window function test fails when `force_hash_collisions` is enabled URL: https://github.com/apache/datafusion/issues/11660

Re: [PR] Retry to use `Arc` in `PartitionedFile` again [datafusion]

2024-08-14 Thread via GitHub
alamb commented on code in PR #11894: URL: https://github.com/apache/datafusion/pull/11894#discussion_r1710281038 ## datafusion/core/src/datasource/physical_plan/parquet/mod.rs: ## @@ -391,6 +391,12 @@ impl ParquetExecBuilder { &projected_output_ordering,

Re: [PR] Implement native support StringView for `REPEAT` [datafusion]

2024-08-14 Thread via GitHub
alamb commented on PR #11962: URL: https://github.com/apache/datafusion/pull/11962#issuecomment-2288488049 I took the liberty of merging up from main and running `cargo fmt` on this PR

Re: [PR] Implement native support StringView for `REPEAT` [datafusion]

2024-08-14 Thread via GitHub
tlm365 commented on code in PR #11962: URL: https://github.com/apache/datafusion/pull/11962#discussion_r1716788840 ## datafusion/functions/src/string/repeat.rs: ## @@ -87,18 +95,35 @@ fn repeat(args: &[ArrayRef]) -> Result { let result = string_array .iter()

[I] Proposal: Create `dfdb`, a new CLI different than `datafusion-cli` with pre-built integrations [datafusion]

2024-08-14 Thread via GitHub
alamb opened a new issue, #11979: URL: https://github.com/apache/datafusion/issues/11979 # TLDR * Keep `datafusion-cli` in the apache/datafusion repo * Make a new repo with a new CLI called `dfdb` (or `datafusion-cli++` or `dfcli`) which is purposely designed for running queries against

Re: [I] Proposal: Create `dfdb`, a new CLI different than `datafusion-cli` with pre-built integrations [datafusion]

2024-08-14 Thread via GitHub
edmondop commented on issue #11979: URL: https://github.com/apache/datafusion/issues/11979#issuecomment-2288550211 I had a related problem a couple of weeks ago: I wanted to distribute a CLI that had some additional UDFs bundled with it. I wonder if we need to set up a plugin system

Re: [PR] Generic FlightTableFactory with a default FlightSqlDriver [datafusion]

2024-08-14 Thread via GitHub
alamb commented on PR #11938: URL: https://github.com/apache/datafusion/pull/11938#issuecomment-2288557502 > @alamb The `[datafusion-cli]` commit was just a follow-up to make the new `FLIGHT_SQL` table "storage" type available in `datafusion-cli` (i.e. register the new config namespace). I'

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-14 Thread via GitHub
Rachelint commented on PR #11943: URL: https://github.com/apache/datafusion/pull/11943#issuecomment-2288564387 > > @2010YOUY01 After checking the code about memory control, I think I got it. > > > > * `emit_early_if_necessary` is used in `Partial` > > * and `spill_previous_if_nece

Re: [I] Proposal: Create `dfdb`, a new CLI different than `datafusion-cli` with pre-built integrations [datafusion]

2024-08-14 Thread via GitHub
alamb commented on issue #11979: URL: https://github.com/apache/datafusion/issues/11979#issuecomment-2288564561 > I wonder if we need to set up a plugin system for the CLI. I am happy to make a design proposal about it @alamb That would be awesome Another thing that comes to mi

[I] The batch_size selection for `CoalesceBatches` doesn't account for cases with a limit [datafusion]

2024-08-14 Thread via GitHub
acking-you opened a new issue, #11980: URL: https://github.com/apache/datafusion/issues/11980 ### Is your feature request related to a problem or challenge? The current [CoalesceBatches](https://docs.rs/datafusion/latest/src/datafusion/physical_optimizer/coalesce_batches.rs.html#38)

Re: [I] Proposal: Create `dfdb`, a new CLI different than `datafusion-cli` with pre-built integrations [datafusion]

2024-08-14 Thread via GitHub
matthewmturner commented on issue #11979: URL: https://github.com/apache/datafusion/issues/11979#issuecomment-2288599071 Love this idea. I had been working on something [similar](https://github.com/datafusion-contrib/datafusion-tui) in the past but unfortunately life got in the way so wasn

[I] order_by not respected for window functions using udaf [datafusion]

2024-08-14 Thread via GitHub
timsaucer opened a new issue, #11981: URL: https://github.com/apache/datafusion/issues/11981 ### Describe the bug When using an aggregate function as a window function, the `order_by` is not respected. I've tried a few permutations of setting this value on the `WindowFunction` and se

[I] Window functions create unwanted projection [datafusion]

2024-08-14 Thread via GitHub
timsaucer opened a new issue, #11982: URL: https://github.com/apache/datafusion/issues/11982 ### Describe the bug When using a window function, you get an additional projection that is not expected. ### To Reproduce ``` let schema = Schema::new(vec![ Fi

[PR] feat: optimize CoalesceBatches in limit [datafusion]

2024-08-14 Thread via GitHub
acking-you opened a new pull request, #11983: URL: https://github.com/apache/datafusion/pull/11983 ## Which issue does this PR close? Closes #11980. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [I] The batch_size selection for `CoalesceBatches` doesn't account for cases with a limit [datafusion]

2024-08-14 Thread via GitHub
acking-you commented on issue #11980: URL: https://github.com/apache/datafusion/issues/11980#issuecomment-2288663420 You can see the code implementation directly; this part is the key change: https://github.com/acking-you/arrow-datafusion/blob/feat/optimize_coalesce_batches/datafusion/core/s

Re: [PR] feat: Add map_extract module and function [datafusion]

2024-08-14 Thread via GitHub
dharanad commented on code in PR #11969: URL: https://github.com/apache/datafusion/pull/11969#discussion_r1716915345 ## datafusion/functions-nested/src/map_extract.rs: ## @@ -0,0 +1,170 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

[I] Improve lpad udf by using a GenericStringBuilder [datafusion]

2024-08-14 Thread via GitHub
Omega359 opened a new issue, #11984: URL: https://github.com/apache/datafusion/issues/11984 ### Is your feature request related to a problem or challenge? As noted in the discussion in https://github.com/apache/datafusion/pull/11941#discussion_r1713604384 it looks like the performanc

Re: [I] Improve lpad udf by using a GenericStringBuilder [datafusion]

2024-08-14 Thread via GitHub
Omega359 commented on issue #11984: URL: https://github.com/apache/datafusion/issues/11984#issuecomment-2288753057 take

[PR] Fix: support NULL input for regular expression comparison operations [datafusion]

2024-08-14 Thread via GitHub
HuSen8891 opened a new pull request, #11985: URL: https://github.com/apache/datafusion/pull/11985 ## Which issue does this PR close? Related to #11623 ## Rationale for this change Regular expression comparison operations accept NULL input, like select 'abc' ~ nul

Re: [PR] fix: move coercion of union from builder to `TypeCoercion` [datafusion]

2024-08-14 Thread via GitHub
jonahgao commented on PR #11961: URL: https://github.com/apache/datafusion/pull/11961#issuecomment-2288755850 Thanks for the review @alamb

Re: [PR] feat: optimize CoalesceBatches in limit [datafusion]

2024-08-14 Thread via GitHub
2010YOUY01 commented on code in PR #11983: URL: https://github.com/apache/datafusion/pull/11983#discussion_r1716940344 ## datafusion/core/src/physical_optimizer/coalesce_batches.rs: ## @@ -43,45 +50,126 @@ impl CoalesceBatches { Self::default() } } -impl PhysicalO

[I] DataFusion weekly project plan (Andrew Lamb) - Aug 12, 2024 [datafusion]

2024-08-14 Thread via GitHub
alamb opened a new issue, #11986: URL: https://github.com/apache/datafusion/issues/11986 Follow on to https://github.com/apache/datafusion/issues/11826 My (personal) North ⭐ : 1000 projects are built using DataFusion 📈 ( I think we are getting close ) **It would be great for

Re: [I] DataFusion weekly project plan (Andrew Lamb) - Aug 5, 2024 [datafusion]

2024-08-14 Thread via GitHub
alamb closed issue #11826: DataFusion weekly project plan (Andrew Lamb) - Aug 5, 2024 URL: https://github.com/apache/datafusion/issues/11826

Re: [I] DataFusion weekly project plan (Andrew Lamb) - Aug 5, 2024 [datafusion]

2024-08-14 Thread via GitHub
alamb commented on issue #11826: URL: https://github.com/apache/datafusion/issues/11826#issuecomment-2288810083 Next week: https://github.com/apache/datafusion/issues/11986

Re: [I] Proposal: Create `dfdb`, a new CLI different than `datafusion-cli` with pre-built integrations [datafusion]

2024-08-14 Thread via GitHub
milenkovicm commented on issue #11979: URL: https://github.com/apache/datafusion/issues/11979#issuecomment-2288812347 Making a generic plugin system would be great, but it may be a bit of a challenge given the lack of a stable Rust ABI. Making some kind of `CliContextBuilder` where all udfs, tabl

[PR] feat: Add native thread configs [datafusion-comet]

2024-08-14 Thread via GitHub
viirya opened a new pull request, #828: URL: https://github.com/apache/datafusion-comet/pull/828 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes test

[I] Add native thread configs [datafusion-comet]

2024-08-14 Thread via GitHub
viirya opened a new issue, #829: URL: https://github.com/apache/datafusion-comet/issues/829 ### What is the problem the feature request solves? I saw the following error locally and in CI recently while working on #651 ``` OS can't spawn worker thread: Resource temporarily u

Re: [I] Standardize the separator in name [datafusion]

2024-08-14 Thread via GitHub
jayzhan211 commented on issue #10364: URL: https://github.com/apache/datafusion/issues/10364#issuecomment-2288890948 > Also why do we have all of these which are slightly different? It's really unmanageable: > > * `Expr::schema_name` > * `Expr::canonical_name` > * `physical_name

Re: [I] Window functions create unwanted projection [datafusion]

2024-08-14 Thread via GitHub
devanbenz commented on issue #11982: URL: https://github.com/apache/datafusion/issues/11982#issuecomment-2288920287 What's the priority of this bug? I would love to take it on but it may take a bit since I'm newer to the df codebase.

Re: [I] Proposal: Create `dfdb`, a new CLI different than `datafusion-cli` with pre-built integrations [datafusion]

2024-08-14 Thread via GitHub
andygrove commented on issue #11979: URL: https://github.com/apache/datafusion/issues/11979#issuecomment-2288920605 I think this is a great idea.

[PR] Improve lpad udf by using a GenericStringBuilder [datafusion]

2024-08-14 Thread via GitHub
Omega359 opened a new pull request, #11987: URL: https://github.com/apache/datafusion/pull/11987 ## Which issue does this PR close? Closes #11984 ## Rationale for this change Code cleanup, performance improvement. ## What changes are included in this PR? Co

Re: [PR] feat: Add map_extract module and function [datafusion]

2024-08-14 Thread via GitHub
Weijun-H commented on code in PR #11969: URL: https://github.com/apache/datafusion/pull/11969#discussion_r1717036796 ## datafusion/functions-nested/src/map_extract.rs: ## @@ -0,0 +1,170 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [I] Standardize the separator in name [datafusion]

2024-08-14 Thread via GitHub
joroKr21 commented on issue #10364: URL: https://github.com/apache/datafusion/issues/10364#issuecomment-2288965983 Maybe we can just use the `schema_name` for `physical_name`? It sounds like a bug if they are different: https://github.com/apache/datafusion/pull/11977

Re: [I] Disable `create_default_catalog` when the exist session state has default catalog for `SessionStateBuilder` [datafusion]

2024-08-14 Thread via GitHub
goldmedal commented on issue #11988: URL: https://github.com/apache/datafusion/issues/11988#issuecomment-2288975422 take

[I] Disable `create_default_catalog` when the exist session state has default catalog for `SessionStateBuilder` [datafusion]

2024-08-14 Thread via GitHub
goldmedal opened a new issue, #11988: URL: https://github.com/apache/datafusion/issues/11988 ### Is your feature request related to a problem or challenge? I found that `SessionStateBuilder::new_from_existing` will unset the default catalog of the existing state. Consider the followin

Re: [I] Window functions create unwanted projection [datafusion]

2024-08-14 Thread via GitHub
timsaucer commented on issue #11982: URL: https://github.com/apache/datafusion/issues/11982#issuecomment-2288985974 For me this is very annoying, but not blocking any of my work.

Re: [I] Window functions create unwanted projection [datafusion]

2024-08-14 Thread via GitHub
timsaucer commented on issue #11982: URL: https://github.com/apache/datafusion/issues/11982#issuecomment-2289002893 If you're looking for a good issue to start on, https://github.com/apache/datafusion/labels/good%20first%20issue

Re: [I] Proposal: Create `dfdb`, a new CLI different than `datafusion-cli` with pre-built integrations [datafusion]

2024-08-14 Thread via GitHub
westonpace commented on issue #11979: URL: https://github.com/apache/datafusion/issues/11979#issuecomment-2289027996 I don't know if others are interested / if this matches the need but it would be cool to make a "CLI-driven query engine frontend" that isn't coupled to any particular query

Re: [I] Window functions create unwanted projection [datafusion]

2024-08-14 Thread via GitHub
devanbenz commented on issue #11982: URL: https://github.com/apache/datafusion/issues/11982#issuecomment-2289060389 Sounds good! Appears most of the recent `good first issue`'s are currently assigned and I don't really see too many `bug`'s as good first issues. I wonder if: https://github.c

Re: [PR] Update REVERSE scalar function to support Utf8View [datafusion]

2024-08-14 Thread via GitHub
comphead commented on code in PR #11973: URL: https://github.com/apache/datafusion/pull/11973#discussion_r1717114804 ## datafusion/functions/src/unicode/reverse.rs: ## @@ -83,10 +85,17 @@ impl ScalarUDFImpl for ReverseFunc { /// reverse('abcde') = 'edcba' /// The implementatio

Re: [PR] feat: Add map_extract module and function [datafusion]

2024-08-14 Thread via GitHub
Weijun-H commented on code in PR #11969: URL: https://github.com/apache/datafusion/pull/11969#discussion_r1717115610 ## datafusion/functions-nested/src/map_extract.rs: ## @@ -0,0 +1,170 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] Update REVERSE scalar function to support Utf8View [datafusion]

2024-08-14 Thread via GitHub
comphead commented on code in PR #11973: URL: https://github.com/apache/datafusion/pull/11973#discussion_r1717116647 ## datafusion/functions/src/unicode/reverse.rs: ## @@ -95,59 +104,58 @@ pub fn reverse(args: &[ArrayRef]) -> Result { #[cfg(test)] mod tests { -use arrow

Re: [PR] Generic FlightTableFactory with a default FlightSqlDriver [datafusion]

2024-08-14 Thread via GitHub
ccciudatu commented on PR #11938: URL: https://github.com/apache/datafusion/pull/11938#issuecomment-2289078249 @alamb Got it, makes sense. My initial thought was that having this in the main repo would enable a better overall integration (i.e. datafusion-cli support, transparent proto se

[PR] Fix the schema mismatch between logical and physical for aggregate function [datafusion]

2024-08-14 Thread via GitHub
jayzhan211 opened a new pull request, #11989: URL: https://github.com/apache/datafusion/pull/11989 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested

Re: [PR] Fix the schema mismatch between logical and physical for aggregate function [datafusion]

2024-08-14 Thread via GitHub
jayzhan211 commented on code in PR #11989: URL: https://github.com/apache/datafusion/pull/11989#discussion_r1717145616 ## datafusion/core/src/physical_planner.rs: ## @@ -670,6 +670,10 @@ impl DefaultPhysicalPlanner { let input_exec = children.one()?;

Re: [PR] feat: Add native thread configs [datafusion-comet]

2024-08-14 Thread via GitHub
andygrove commented on code in PR #828: URL: https://github.com/apache/datafusion-comet/pull/828#discussion_r1717146193 ## native/core/src/execution/jni_api.rs: ## @@ -133,8 +133,21 @@ pub unsafe extern "system" fn Java_org_apache_comet_Native_createPlan( let debug_nat

Re: [I] Window functions create unwanted projection [datafusion]

2024-08-14 Thread via GitHub
devanbenz commented on issue #11982: URL: https://github.com/apache/datafusion/issues/11982#issuecomment-2289120043 That being said since this is a newer bug I'll take it on for now and if I have issues I may unassign. 👍

Re: [I] Window functions create unwanted projection [datafusion]

2024-08-14 Thread via GitHub
devanbenz commented on issue #11982: URL: https://github.com/apache/datafusion/issues/11982#issuecomment-2289120177 take

Re: [PR] Fix the schema mismatch between logical and physical for aggregate function [datafusion]

2024-08-14 Thread via GitHub
jayzhan211 commented on code in PR #11989: URL: https://github.com/apache/datafusion/pull/11989#discussion_r1717148071 ## datafusion/core/src/physical_planner.rs: ## @@ -1599,11 +1603,10 @@ pub fn create_aggregate_expr_with_name_and_maybe_filter( let ordering_re

Re: [PR] Fix the schema mismatch between logical and physical for aggregate function [datafusion]

2024-08-14 Thread via GitHub
jayzhan211 commented on code in PR #11989: URL: https://github.com/apache/datafusion/pull/11989#discussion_r1717148941 ## datafusion/expr/src/expr_schema.rs: ## @@ -328,10 +328,45 @@ impl ExprSchemable for Expr { Ok(true) } } +

Re: [PR] Update REVERSE scalar function to support Utf8View [datafusion]

2024-08-14 Thread via GitHub
Omega359 commented on code in PR #11973: URL: https://github.com/apache/datafusion/pull/11973#discussion_r1717151105 ## datafusion/functions/src/unicode/reverse.rs: ## @@ -83,10 +85,17 @@ impl ScalarUDFImpl for ReverseFunc { /// reverse('abcde') = 'edcba' /// The implementatio

Re: [PR] Update REVERSE scalar function to support Utf8View [datafusion]

2024-08-14 Thread via GitHub
Omega359 commented on code in PR #11973: URL: https://github.com/apache/datafusion/pull/11973#discussion_r1717153211 ## datafusion/functions/src/unicode/reverse.rs: ## @@ -95,59 +104,58 @@ pub fn reverse(args: &[ArrayRef]) -> Result { #[cfg(test)] mod tests { -use arrow

Re: [PR] Fix the schema mismatch between logical and physical for aggregate function [datafusion]

2024-08-14 Thread via GitHub
jayzhan211 commented on code in PR #11989: URL: https://github.com/apache/datafusion/pull/11989#discussion_r1717152737 ## datafusion/physical-plan/src/windows/utils.rs: ## @@ -0,0 +1,35 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [I] Suggest get_expr_planners() return Vec<> rather than &[] [datafusion]

2024-08-14 Thread via GitHub
jayzhan211 commented on issue #11960: URL: https://github.com/apache/datafusion/issues/11960#issuecomment-2289137290 Couldn't we use `to_vec()` to get `Vec<Arc<dyn ExprPlanner>>`? > To not share a temporary value in my ContextProvider implementation of get_expr_planners I have to hold onto the Vec in my

Re: [PR] Implement native support StringView for `REPEAT` [datafusion]

2024-08-14 Thread via GitHub
alamb merged PR #11962: URL: https://github.com/apache/datafusion/pull/11962 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] Update `REPEAT` scalar function to support Utf8View [datafusion]

2024-08-14 Thread via GitHub
alamb closed issue #11914: Update `REPEAT` scalar function to support Utf8View URL: https://github.com/apache/datafusion/issues/11914 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

Re: [PR] Implement native support StringView for `REPEAT` [datafusion]

2024-08-14 Thread via GitHub
alamb commented on code in PR #11962: URL: https://github.com/apache/datafusion/pull/11962#discussion_r1717211609 ## datafusion/functions/src/string/repeat.rs: ## @@ -87,18 +95,35 @@ fn repeat(args: &[ArrayRef]) -> Result { let result = string_array .iter()

[PR] Keep the existing default catalog for `SessionStateBuilder::new_from_existing` [datafusion]

2024-08-14 Thread via GitHub
goldmedal opened a new pull request, #11991: URL: https://github.com/apache/datafusion/pull/11991 ## Which issue does this PR close? Closes #11988. ## Rationale for this change ## What changes are included in this PR? ## Are these changes t

Re: [I] Improve performance of `REPEAT` functions [datafusion]

2024-08-14 Thread via GitHub
tlm365 commented on issue #11990: URL: https://github.com/apache/datafusion/issues/11990#issuecomment-2289232469 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To uns

[I] Internal error: Empty iterator passed to ScalarValue::iter_to_array [datafusion]

2024-08-14 Thread via GitHub
jmickey opened a new issue, #11992: URL: https://github.com/apache/datafusion/issues/11992 ### Describe the bug We're running InfluxDB Clustered v3 (Build `20240605-1035562`) and we see the following warning in the logs: ``` level=warn msg="Failed to create pruning stats" e

Re: [I] Window functions create unwanted projection [datafusion]

2024-08-14 Thread via GitHub
timsaucer commented on issue #11982: URL: https://github.com/apache/datafusion/issues/11982#issuecomment-2289255869 Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] Remove LargeUtf8|Binary, Utf8|BinaryView, and Dictionary from ScalarValue [datafusion]

2024-08-14 Thread via GitHub
notfilippo commented on code in PR #11978: URL: https://github.com/apache/datafusion/pull/11978#discussion_r1717253809 ## datafusion/common/src/scalar/mod.rs: ## @@ -1714,6 +1631,15 @@ impl ScalarValue { Some(sv) => sv.data_type(), }; +Self::iter_

Re: [PR] Remove LargeUtf8|Binary, Utf8|BinaryView, and Dictionary from ScalarValue [datafusion]

2024-08-14 Thread via GitHub
notfilippo commented on code in PR #11978: URL: https://github.com/apache/datafusion/pull/11978#discussion_r1717282407 ## datafusion/physical-plan/src/projection.rs: ## @@ -306,9 +306,10 @@ impl ProjectionStream { let arrays = self .expr .iter(

[PR] perf: Add criterion microbenchmark for TPC-DS join [datafusion-comet]

2024-08-14 Thread via GitHub
andygrove opened a new pull request, #830: URL: https://github.com/apache/datafusion-comet/pull/830 ## Which issue does this PR close? N/A ## Rationale for this change Our current TPC-DS microbenchmarks are not really microbenchmarks because they run Spark and

Re: [PR] Generic FlightTableFactory with a default FlightSqlDriver [datafusion]

2024-08-14 Thread via GitHub
ccciudatu commented on PR #11938: URL: https://github.com/apache/datafusion/pull/11938#issuecomment-2289461637 I found my embarrassing mistake and added a fixup commit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

[I] `optimize_projections` fails with ambiguous columns resulting from a join [datafusion]

2024-08-14 Thread via GitHub
dataders opened a new issue, #11993: URL: https://github.com/apache/datafusion/issues/11993 ### Describe the bug When selecting all (`*`) columns of the result of joining two tables, an unfriendly error message shows up. I believe the message could be better. ``` Failed due

Re: [I] `optimize_projections` fails when eagerly selecting from a join with ambiguous column names [datafusion]

2024-08-14 Thread via GitHub
dataders commented on issue #11993: URL: https://github.com/apache/datafusion/issues/11993#issuecomment-2289508882 Closing in favor of https://github.com/sdf-labs/sdf-cli/issues/9, as it seems the issue is with logical plan resolution within `sdf` -- This is an automated message from the Apac
