Re: [PR] Introduce `Scalar` type for ColumnarValue [datafusion]

2024-09-26 Thread via GitHub
jayzhan211 commented on code in PR #12536: URL: https://github.com/apache/datafusion/pull/12536#discussion_r1776642321 ## datafusion/expr-common/src/columnar_value.rs: ## @@ -89,7 +91,7 @@ pub enum ColumnarValue { /// Array of values Array(ArrayRef), /// A single

Re: [PR] Compare schema as logically equivalent to workaround disappearing metadata [datafusion]

2024-09-26 Thread via GitHub
jayzhan211 commented on PR #12631: URL: https://github.com/apache/datafusion/pull/12631#issuecomment-2376350199 I have some suggestions: 1. It would be nice to have a reproducible example so we could find out why and where the metadata is lost. It could also avoid breaking your case again if w

Re: [I] Implement `Debug` for `SessionStateBuilder` [datafusion]

2024-09-26 Thread via GitHub
AnthonyZhOon commented on issue #12555: URL: https://github.com/apache/datafusion/issues/12555#issuecomment-2376331113 Finished being able to derive Debug on SessionStateBuilder #12632 but I believe it still needs to implement formatting to be more useful -- This is an automated message f

Re: [I] [EPIC] Decouple logical from physical types [datafusion]

2024-09-26 Thread via GitHub
notfilippo commented on issue #12622: URL: https://github.com/apache/datafusion/issues/12622#issuecomment-2376210779 I would like to organise a call so we can discuss the plan of action of this epic. cc @alamb , @findepi , @jayzhan211 , @ozankabak and anyone else interested in this e

[PR] Implement `Debug` for `SessionStateBuilder`, adding `Debug` requirements to fields [datafusion]

2024-09-26 Thread via GitHub
AnthonyZhOon opened a new pull request, #12632: URL: https://github.com/apache/datafusion/pull/12632 ## Which issue does this PR close? Progress on #12555 ## Rationale for this change To make configuration easier to use by providing debug output. ## What changes are i

Re: [PR] Implement `Debug` for `SessionStateBuilder`, adding `Debug` requirements to fields [datafusion]

2024-09-26 Thread via GitHub
AnthonyZhOon commented on PR #12632: URL: https://github.com/apache/datafusion/pull/12632#issuecomment-2376298842 Needs the `api-change` tag. I'm also fine with splitting this PR into smaller ones

Re: [PR] fix: coalesce schema issues [datafusion]

2024-09-26 Thread via GitHub
mesejo commented on code in PR #12308: URL: https://github.com/apache/datafusion/pull/12308#discussion_r1776672713 ## datafusion/expr/src/expr_schema.rs: ## @@ -150,21 +150,22 @@ impl ExprSchemable for Expr { .collect::<Result<Vec<_>>>()?; // verify tha

Re: [PR] Introduce `Scalar` type for ColumnarValue [datafusion]

2024-09-26 Thread via GitHub
jayzhan211 commented on code in PR #12536: URL: https://github.com/apache/datafusion/pull/12536#discussion_r1776729857 ## datafusion/expr-common/src/columnar_value.rs: ## @@ -89,7 +91,7 @@ pub enum ColumnarValue { /// Array of values Array(ArrayRef), /// A single

Re: [I] DataFrame parse_sql_expr does not handle aliases [datafusion]

2024-09-26 Thread via GitHub
milenkovicm commented on issue #12518: URL: https://github.com/apache/datafusion/issues/12518#issuecomment-2376226337 It does look good, @Eason0729. Maybe @alamb and @comphead can give you better advice than me when they get some time

Re: [I] Panic when inserting array literal with NULL element into a table where the matching column has a non-nullable element [datafusion]

2024-09-26 Thread via GitHub
jonahgao commented on issue #12598: URL: https://github.com/apache/datafusion/issues/12598#issuecomment-2376225312 > Is the right approach to close this issue after opening a new one against `arrow-rs`? I think we can keep it open and wait for an upstream fix.

Re: [I] [EPIC] Decouple logical from physical types [datafusion]

2024-09-26 Thread via GitHub
jayzhan211 commented on issue #12622: URL: https://github.com/apache/datafusion/issues/12622#issuecomment-2376262420 I suggest proposing the plan directly. Since responding requires some thought, working synchronously in a call may not be that efficient. And it would be more open to ra

Re: [PR] fix: coalesce schema issues [datafusion]

2024-09-26 Thread via GitHub
jayzhan211 commented on code in PR #12308: URL: https://github.com/apache/datafusion/pull/12308#discussion_r1776633248 ## datafusion/expr/src/expr_schema.rs: ## @@ -150,21 +150,22 @@ impl ExprSchemable for Expr { .collect::<Result<Vec<_>>>()?; // verify

Re: [PR] Derive `Debug` for `SessionStateBuilder`, adding `Debug` requirements to fields [datafusion]

2024-09-26 Thread via GitHub
AnthonyZhOon commented on PR #12632: URL: https://github.com/apache/datafusion/pull/12632#issuecomment-2376394124 Feel like not adding the derive Debug for `SessionStateBuilder` for now, this is the current output of a `SessionStateBuilder::new().with_default_features()` [debug_session

Re: [PR] fix: coalesce schema issues [datafusion]

2024-09-26 Thread via GitHub
jayzhan211 commented on code in PR #12308: URL: https://github.com/apache/datafusion/pull/12308#discussion_r1776693550 ## datafusion/expr/src/expr_schema.rs: ## @@ -150,21 +150,22 @@ impl ExprSchemable for Expr { .collect::<Result<Vec<_>>>()?; // verify

Re: [I] DataFrame parse_sql_expr does not handle aliases [datafusion]

2024-09-26 Thread via GitHub
Eason0729 commented on issue #12518: URL: https://github.com/apache/datafusion/issues/12518#issuecomment-2376409360 Okay, then I will remove the things that don't belong in the pull request and submit it.

Re: [PR] Introduce `Scalar` type for ColumnarValue [datafusion]

2024-09-26 Thread via GitHub
jayzhan211 commented on code in PR #12536: URL: https://github.com/apache/datafusion/pull/12536#discussion_r1776701019 ## datafusion/expr-common/src/columnar_value.rs: ## @@ -89,7 +91,7 @@ pub enum ColumnarValue { /// Array of values Array(ArrayRef), /// A single

Re: [PR] Introduce `Scalar` type for ColumnarValue [datafusion]

2024-09-26 Thread via GitHub
jayzhan211 commented on code in PR #12536: URL: https://github.com/apache/datafusion/pull/12536#discussion_r1776705698 ## datafusion/expr-common/src/columnar_value.rs: ## @@ -89,7 +91,7 @@ pub enum ColumnarValue { /// Array of values Array(ArrayRef), /// A single

[I] NestedLoopJoinExec can create excessively large record batches [datafusion]

2024-09-26 Thread via GitHub
mhilton opened a new issue, #12633: URL: https://github.com/apache/datafusion/issues/12633 ### Describe the bug `NestedLoopJoinExec` (really `NestedLoopJoinStream`) produces one output batch for each probe side input batch. However it is possible for each row of build-side input to p

Re: [PR] Refactor PrimitiveGroupValueBuilder to use BooleanBuilder [datafusion]

2024-09-26 Thread via GitHub
jayzhan211 commented on code in PR #12623: URL: https://github.com/apache/datafusion/pull/12623#discussion_r1776517883 ## datafusion/physical-plan/src/aggregates/group_values/group_value_row.rs: ## @@ -121,37 +145,64 @@ impl ArrayRowEq for PrimitiveGroupValueBuilder { }

Re: [PR] fix: coalesce schema issues [datafusion]

2024-09-26 Thread via GitHub
mesejo commented on code in PR #12308: URL: https://github.com/apache/datafusion/pull/12308#discussion_r1776523912 ## datafusion/functions/src/encoding/inner.rs: ## @@ -49,17 +48,8 @@ impl Default for EncodeFunc { impl EncodeFunc { pub fn new() -> Self { -use Dat

Re: [PR] implement nested identifier access [datafusion]

2024-09-26 Thread via GitHub
Lordworms commented on PR #12614: URL: https://github.com/apache/datafusion/pull/12614#issuecomment-2376172097 I didn't realize that there is already a function called get_field which has similar usage to struct_extract 😅

[I] The LIKE and ILIKE behavior for NULL handling in StringView differs from other string types [datafusion]

2024-09-26 Thread via GitHub
goldmedal opened a new issue, #12637: URL: https://github.com/apache/datafusion/issues/12637 ### Describe the bug While working on #12415, I found the `LIKE` and `ILIKE` behavior differs between `StringView` and other string types. Given the following data and SQL: ```sql DataFu

[PR] Upgrade datafusion to 41 [datafusion-ballista]

2024-09-26 Thread via GitHub
palaska opened a new pull request, #1062: URL: https://github.com/apache/datafusion-ballista/pull/1062 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing chang

Re: [PR] Upgrade dependencies [datafusion-ballista]

2024-09-26 Thread via GitHub
palaska commented on PR #1059: URL: https://github.com/apache/datafusion-ballista/pull/1059#issuecomment-2377374102 > Would it make sense just to upgrade to DF 41 in this PR? I will try > Would it make sense just to upgrade to DF 41 in this PR? Raised a separate PR that

Re: [PR] Add user defined window function support [datafusion-python]

2024-09-26 Thread via GitHub
mesejo commented on code in PR #880: URL: https://github.com/apache/datafusion-python/pull/880#discussion_r1777392689 ## examples/python-udwf.py: ## @@ -0,0 +1,270 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See th

[PR] Limit nested loop join record batch size [datafusion]

2024-09-26 Thread via GitHub
mhilton opened a new pull request, #12634: URL: https://github.com/apache/datafusion/pull/12634 ## Which issue does this PR close? Closes #12633. ## Rationale for this change Some joins use an excessive amount of memory due to creating very large record b

Re: [PR] Derive `Debug` for `SessionStateBuilder`, adding `Debug` requirements to fields [datafusion]

2024-09-26 Thread via GitHub
AnthonyZhOon commented on PR #12632: URL: https://github.com/apache/datafusion/pull/12632#issuecomment-2376741572 Now the impl Debug for `SessionStateBuilder` resembles that of `SessionState` but includes all fields. Also grouped fields by relevance

Re: [I] Simple Functions [datafusion]

2024-09-26 Thread via GitHub
findepi commented on issue #12635: URL: https://github.com/apache/datafusion/issues/12635#issuecomment-2376910852 FYI: I am doing some experiments on what this could look like

Re: [I] Proposal to donate Ray SQL to the DataFusion Project (not into the Python subproject) [datafusion-python]

2024-09-26 Thread via GitHub
andygrove commented on issue #872: URL: https://github.com/apache/datafusion-python/issues/872#issuecomment-2376891574 As a formality, as part of the ASF IP clearance process, I must remind active committers that they are responsible for ensuring that a Corporate CLA is recorded if such is

Re: [PR] Minor: improve documentation to StringView trim [datafusion]

2024-09-26 Thread via GitHub
comphead commented on code in PR #12629: URL: https://github.com/apache/datafusion/pull/12629#discussion_r1777275215 ## datafusion/functions/src/string/common.rs: ## @@ -35,19 +35,26 @@ use datafusion_expr::ColumnarValue; /// Append a new view to the views buffer with the giv

Re: [PR] parquet: Add support for user-provided metadata loaders [datafusion]

2024-09-26 Thread via GitHub
progval commented on PR #12593: URL: https://github.com/apache/datafusion/pull/12593#issuecomment-2376956688 > I wonder if we could change the "automatically load page index if needed" to "error if page index is needed but it is not loaded" 🤔 That might be a less surprising behavior

Re: [I] [EPIC] Decouple logical from physical types [datafusion]

2024-09-26 Thread via GitHub
notfilippo commented on issue #12622: URL: https://github.com/apache/datafusion/issues/12622#issuecomment-2377126283 > We should make it a goal that physical planning also abstracts over physical representation of individual batches. > We should also make it a goal that function are expre

[I] Simple Functions [datafusion]

2024-09-26 Thread via GitHub
findepi opened a new issue, #12635: URL: https://github.com/apache/datafusion/issues/12635 ### Is your feature request related to a problem or challenge? ### Verbosity Currently implementing a scalar function is a pretty involved process. For example a simple function calculati

Re: [I] Simple Functions [datafusion]

2024-09-26 Thread via GitHub
findepi commented on issue #12635: URL: https://github.com/apache/datafusion/issues/12635#issuecomment-2376798706 cc @alamb, @andygrove, @jayzhan211, @ozankabak, @notfilippo, @comphead

Re: [PR] implement kurtosis udaf [datafusion]

2024-09-26 Thread via GitHub
alamb closed pull request #12613: implement kurtosis udaf URL: https://github.com/apache/datafusion/pull/12613

Re: [PR] implement kurtosis udaf [datafusion]

2024-09-26 Thread via GitHub
alamb commented on PR #12613: URL: https://github.com/apache/datafusion/pull/12613#issuecomment-2377026297 Closing as this PR is going to be retargeted

Re: [PR] fix: coalesce schema issues [datafusion]

2024-09-26 Thread via GitHub
alamb commented on code in PR #12308: URL: https://github.com/apache/datafusion/pull/12308#discussion_r1777112205 ## datafusion/expr/src/expr_schema.rs: ## @@ -150,21 +150,22 @@ impl ExprSchemable for Expr { .collect::<Result<Vec<_>>>()?; // verify that

Re: [PR] fix: Use logical row count from RecordBatch [datafusion-comet]

2024-09-26 Thread via GitHub
viirya commented on PR #972: URL: https://github.com/apache/datafusion-comet/pull/972#issuecomment-2377760621 cc @huaxingao

[PR] fix: Use logical row count from RecordBatch [datafusion-comet]

2024-09-26 Thread via GitHub
viirya opened a new pull request, #972: URL: https://github.com/apache/datafusion-comet/pull/972 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes test

[I] RecordBatch might have logical row mapping on physical arrays [datafusion-comet]

2024-09-26 Thread via GitHub
viirya opened a new issue, #974: URL: https://github.com/apache/datafusion-comet/issues/974 ### Describe the bug This is related to #973. After applying the fix, the test we run locally with Iceberg table with deleted rows fails on incorrect query result. It is because Iceberg

[I] Java Arrow RecordBatch might have logical row count which is not same as physical row count in the arrays [datafusion-comet]

2024-09-26 Thread via GitHub
viirya opened a new issue, #973: URL: https://github.com/apache/datafusion-comet/issues/973 ### Describe the bug Integrating Comet with Iceberg internally gets the following error if there are deleted rows in the Iceberg table: ``` org.apache.comet.CometNativeException: Inva

Re: [PR] doc: add documentation interlinks [datafusion-comet]

2024-09-26 Thread via GitHub
comphead merged PR #975: URL: https://github.com/apache/datafusion-comet/pull/975

Re: [PR] Add user defined window function support [datafusion-python]

2024-09-26 Thread via GitHub
Michael-J-Ward commented on code in PR #880: URL: https://github.com/apache/datafusion-python/pull/880#discussion_r1777685386 ## python/datafusion/udf.py: ## @@ -246,3 +246,229 @@ def udaf( state_type=state_type, volatility=volatility, ) + + +c

Re: [PR] Add user defined window function support [datafusion-python]

2024-09-26 Thread via GitHub
Michael-J-Ward commented on code in PR #880: URL: https://github.com/apache/datafusion-python/pull/880#discussion_r1773701467 ## docs/source/user-guide/common-operations/udf-and-udfa.rst: ## @@ -18,8 +18,21 @@ User Defined Functions == -DataFusion provide

Re: [PR] Add user defined window function support [datafusion-python]

2024-09-26 Thread via GitHub
Michael-J-Ward commented on code in PR #880: URL: https://github.com/apache/datafusion-python/pull/880#discussion_r1777657146 ## python/datafusion/udf.py: ## @@ -246,3 +246,229 @@ def udaf( state_type=state_type, volatility=volatility, ) + + +c

Re: [PR] Limit nested loop join record batch size [datafusion]

2024-09-26 Thread via GitHub
comphead commented on PR #12634: URL: https://github.com/apache/datafusion/pull/12634#issuecomment-2377950999 @mhilton would it be possible to create a unit test reproducing the problem? This will also be important to prevent regression. The repro can be on small batch size up to 5

Re: [PR] fix: coalesce schema issues [datafusion]

2024-09-26 Thread via GitHub
mesejo commented on code in PR #12308: URL: https://github.com/apache/datafusion/pull/12308#discussion_r127942 ## datafusion/expr/src/expr_schema.rs: ## @@ -150,21 +150,22 @@ impl ExprSchemable for Expr { .collect::<Result<Vec<_>>>()?; // verify tha

Re: [I] Simple Functions [datafusion]

2024-09-26 Thread via GitHub
alamb commented on issue #12635: URL: https://github.com/apache/datafusion/issues/12635#issuecomment-2377975313 > Currently DataFusion functions are singletons plugged into the execution engine. They have no way to store and reuse buffers or compiled regular expressions, etc. here i

Re: [PR] Refactor PrimitiveGroupValueBuilder to use BooleanBuilder [datafusion]

2024-09-26 Thread via GitHub
alamb commented on PR #12623: URL: https://github.com/apache/datafusion/pull/12623#issuecomment-2377959248 > Marking draft until I have benchmark numbers The benchmark numbers look good -- now I just need to debug one test and I will put this up for review

Re: [I] Simple Functions [datafusion]

2024-09-26 Thread via GitHub
alamb commented on issue #12635: URL: https://github.com/apache/datafusion/issues/12635#issuecomment-2377975178 I think the idea of making it easier to write functions that include specialized implementations for different types is a great idea. This would likely both make our code faster (

Re: [I] Update supported Spark and Java versions in installation guide [datafusion-comet]

2024-09-26 Thread via GitHub
justahuman1 commented on issue #742: URL: https://github.com/apache/datafusion-comet/issues/742#issuecomment-2377988737 Hi @adi-kmt are you still working on this? I can take if not. I already have the changes ready, let me know, thanks https://github.com/justahuman1/datafusion-comet

Re: [PR] Add user defined window function support [datafusion-python]

2024-09-26 Thread via GitHub
Michael-J-Ward commented on code in PR #880: URL: https://github.com/apache/datafusion-python/pull/880#discussion_r160825 ## docs/source/user-guide/common-operations/udf-and-udfa.rst: ## @@ -57,30 +126,122 @@ Additionally the :py:func:`~datafusion.udf.AggregateUDF.udaf` fun

Re: [I] Extension Types [datafusion]

2024-09-26 Thread via GitHub
findepi commented on issue #12644: URL: https://github.com/apache/datafusion/issues/12644#issuecomment-2378520646 cc @alamb, @andygrove, @jayzhan211, @ozankabak, @notfilippo, @comphead, @kylebarron, @yukkit, @sunchao, @Folyd, @wjones127, @Xuanwo, @sadboy, @milevin

Re: [PR] Minor: improve documentation to StringView trim [datafusion]

2024-09-26 Thread via GitHub
alamb commented on code in PR #12629: URL: https://github.com/apache/datafusion/pull/12629#discussion_r100951 ## datafusion/functions/src/string/common.rs: ## @@ -35,19 +35,26 @@ use datafusion_expr::ColumnarValue; /// Append a new view to the views buffer with the given

Re: [PR] Derive `Debug` for `SessionStateBuilder`, adding `Debug` requirements to fields [datafusion]

2024-09-26 Thread via GitHub
alamb commented on code in PR #12632: URL: https://github.com/apache/datafusion/pull/12632#discussion_r1777693960 ## datafusion/core/src/catalog_common/information_schema.rs: ## @@ -70,20 +71,11 @@ impl InformationSchemaProvider { } } -#[derive(Clone)] +#[derive(Clone, D

Re: [I] The LIKE and ILIKE behavior for NULL handling in StringView differs from other string types [datafusion]

2024-09-26 Thread via GitHub
alamb commented on issue #12637: URL: https://github.com/apache/datafusion/issues/12637#issuecomment-2377893100 > I'm not really sure if the behavior of StringView is expected 🤔 but I think their behavior should be consistent. > When the input value is NULL, string type will return NU

Re: [I] Proposal: introduced typed expressions, separate AST and IR [datafusion]

2024-09-26 Thread via GitHub
alamb commented on issue #12604: URL: https://github.com/apache/datafusion/issues/12604#issuecomment-2377859743 > let Expr carry the type explicitly (that would be "logical type" I think this is the > correct, unless get_type is called more than once. optimizer rules ma

Re: [PR] implement nested identifier access [datafusion]

2024-09-26 Thread via GitHub
alamb commented on code in PR #12614: URL: https://github.com/apache/datafusion/pull/12614#discussion_r112094 ## datafusion/sql/src/expr/identifier.rs: ## @@ -142,9 +135,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { }

Re: [I] The LIKE and ILIKE behavior for NULL handling in StringView differs from other string types [datafusion]

2024-09-26 Thread via GitHub
goldmedal commented on issue #12637: URL: https://github.com/apache/datafusion/issues/12637#issuecomment-2378241398 > I think if the input is null, the string view should also return null Thanks! I updated the expected behavior 👍

Re: [PR] Simplify `update_skip_aggregation_probe` method [datafusion]

2024-09-26 Thread via GitHub
Rachelint commented on code in PR #12332: URL: https://github.com/apache/datafusion/pull/12332#discussion_r1777916707 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1024,11 +1017,7 @@ impl GroupedHashAggregateStream { /// Note: currently spilling is not supp

Re: [PR] build(deps): bump prost from 0.13.2 to 0.13.3 [datafusion-python]

2024-09-26 Thread via GitHub
Michael-J-Ward merged PR #882: URL: https://github.com/apache/datafusion-python/pull/882

Re: [PR] build(deps): bump prost-types from 0.13.2 to 0.13.3 [datafusion-python]

2024-09-26 Thread via GitHub
Michael-J-Ward merged PR #881: URL: https://github.com/apache/datafusion-python/pull/881

Re: [PR] Refactor PrimitiveGroupValueBuilder to use BooleanBuilder [datafusion]

2024-09-26 Thread via GitHub
alamb commented on code in PR #12623: URL: https://github.com/apache/datafusion/pull/12623#discussion_r145457 ## datafusion/physical-plan/src/aggregates/group_values/group_value_row.rs: ## @@ -121,37 +145,64 @@ impl ArrayRowEq for PrimitiveGroupValueBuilder { }

Re: [I] DataFrame parse_sql_expr does not handle aliases [datafusion]

2024-09-26 Thread via GitHub
alamb commented on issue #12518: URL: https://github.com/apache/datafusion/issues/12518#issuecomment-2377960672 PR in sqlparser: https://github.com/sqlparser-rs/sqlparser-rs/pull/1444

Re: [PR] Fix sort node deserialization from proto [datafusion]

2024-09-26 Thread via GitHub
andygrove commented on code in PR #12626: URL: https://github.com/apache/datafusion/pull/12626#discussion_r1777822954 ## datafusion/proto/tests/cases/roundtrip_logical_plan.rs: ## @@ -704,6 +730,7 @@ async fn roundtrip_logical_plan_distinct_on() -> Result<()> { let plan =

Re: [PR] Fix sort node deserialization from proto [datafusion]

2024-09-26 Thread via GitHub
palaska commented on code in PR #12626: URL: https://github.com/apache/datafusion/pull/12626#discussion_r1777824338 ## datafusion/proto/tests/cases/roundtrip_logical_plan.rs: ## @@ -704,6 +730,7 @@ async fn roundtrip_logical_plan_distinct_on() -> Result<()> { let plan = ct

[PR] [MINOR]: Simplifications Sort Operator [datafusion]

2024-09-26 Thread via GitHub
akurmustafa opened a new pull request, #12639: URL: https://github.com/apache/datafusion/pull/12639 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? Minor changes in the Sort code to impr

Re: [PR] Minor: add partial assertion for skip aggregation probe [datafusion]

2024-09-26 Thread via GitHub
jayzhan211 commented on code in PR #12640: URL: https://github.com/apache/datafusion/pull/12640#discussion_r1778036865 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1004,9 +1004,13 @@ impl GroupedHashAggregateStream { /// Updates skip aggregation probe stat

[PR] doc: add documentation interlinks [datafusion-comet]

2024-09-26 Thread via GitHub
comphead opened a new pull request, #975: URL: https://github.com/apache/datafusion-comet/pull/975 ## Which issue does this PR close? Closes #. ## Rationale for this change Adding more documentation interlinks and Comet docs link from the main page

Re: [PR] Replace `OnceLock` with `LazyLock`, update MSRV to 1.80 [datafusion]

2024-09-26 Thread via GitHub
alamb commented on PR #12601: URL: https://github.com/apache/datafusion/pull/12601#issuecomment-2377888909 Sorry for misleading you @OussamaSaoudi -- it will be great when 1.80 is released

Re: [I] Simple Functions [datafusion]

2024-09-26 Thread via GitHub
comphead commented on issue #12635: URL: https://github.com/apache/datafusion/issues/12635#issuecomment-2378116102 Thanks @findepi. I think this process goes through iterations; it is easier than it was before but still far from perfect. The ScalarUDFImpl common trait is already a huge help,

Re: [PR] Upgrade to Datafusion 42 [datafusion-ballista]

2024-09-26 Thread via GitHub
andygrove commented on PR #1059: URL: https://github.com/apache/datafusion-ballista/pull/1059#issuecomment-2378345468 I'm fine with pinning to a revision of DataFusion once your PR is merged over there.

Re: [PR] Add binary to string_view coercion [datafusion]

2024-09-26 Thread via GitHub
doupache commented on code in PR #12643: URL: https://github.com/apache/datafusion/pull/12643#discussion_r1777970052 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -1052,12 +1052,16 @@ fn binary_to_string_coercion( match (lhs_type, rhs_type) { (Binary

Re: [PR] Add binary to string_view coercion [datafusion]

2024-09-26 Thread via GitHub
doupache commented on PR #12643: URL: https://github.com/apache/datafusion/pull/12643#issuecomment-2378348112 Without these changes, hit_partitioned will fail at query 20 ``` Q20: SELECT COUNT(*) FROM hits WHERE "URL" LIKE '%google%'; Error: Context("type_coercion", Plan("Ther

[PR] Propagating the error generated by the input stream and continue polling [datafusion]

2024-09-26 Thread via GitHub
YjyJeff opened a new pull request, #12642: URL: https://github.com/apache/datafusion/pull/12642 ## Which issue does this PR close? Closes https://github.com/apache/datafusion/issues/12641. ## Rationale for this change ## What changes are included in this P

Re: [I] target_partitions execution option is ignored when the input has 1 partition [datafusion]

2024-09-26 Thread via GitHub
akurmustafa commented on issue #12611: URL: https://github.com/apache/datafusion/issues/12611#issuecomment-2378127718 In datafusion, target_partition argument doesn't necessarily increase partition count each time. If DataFusion thinks that executing the query in single partition is better

Re: [I] `CREATE EXTERNAL TABLE` does not support schema (fully qualified names) [datafusion]

2024-09-26 Thread via GitHub
OussamaSaoudi commented on issue #12607: URL: https://github.com/apache/datafusion/issues/12607#issuecomment-2378423926 take

Re: [PR] Fix sort node deserialization from proto [datafusion]

2024-09-26 Thread via GitHub
andygrove merged PR #12626: URL: https://github.com/apache/datafusion/pull/12626

Re: [PR] Minor: add partial assertion for skip aggregation probe [datafusion]

2024-09-26 Thread via GitHub
jayzhan211 commented on code in PR #12640: URL: https://github.com/apache/datafusion/pull/12640#discussion_r1778034315 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1004,9 +1004,13 @@ impl GroupedHashAggregateStream { /// Updates skip aggregation probe stat

Re: [PR] Minor: add partial assertion for skip aggregation probe [datafusion]

2024-09-26 Thread via GitHub
jayzhan211 commented on code in PR #12640: URL: https://github.com/apache/datafusion/pull/12640#discussion_r1778034315 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1004,9 +1004,13 @@ impl GroupedHashAggregateStream { /// Updates skip aggregation probe stat

Re: [PR] Minor: add partial assertion for skip aggregation probe [datafusion]

2024-09-26 Thread via GitHub
jayzhan211 commented on code in PR #12640: URL: https://github.com/apache/datafusion/pull/12640#discussion_r1778036865 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1004,9 +1004,13 @@ impl GroupedHashAggregateStream { /// Updates skip aggregation probe stat

Re: [PR] Minor: Encapsulate type check in GroupValuesColumn, avoid panic [datafusion]

2024-09-26 Thread via GitHub
alamb commented on PR #12620: URL: https://github.com/apache/datafusion/pull/12620#issuecomment-2377963503 Thanks @comphead and @jayzhan211 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] Minor: Encapsulate type check in GroupValuesColumn, avoid panic [datafusion]

2024-09-26 Thread via GitHub
alamb merged PR #12620: URL: https://github.com/apache/datafusion/pull/12620 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] Implement fast min/max accumulator for binary / strings (now it uses the slower path) [datafusion]

2024-09-26 Thread via GitHub
devanbenz commented on issue #6906: URL: https://github.com/apache/datafusion/issues/6906#issuecomment-2378010992 Finally got around to trying out a little POC locally. I'm getting much better results by just storing the state as `Vec`; it's very much not polished at this moment in time but

Re: [PR] Simplify `update_skip_aggregation_probe` method [datafusion]

2024-09-26 Thread via GitHub
alamb commented on code in PR #12332: URL: https://github.com/apache/datafusion/pull/12332#discussion_r147273 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1024,11 +1017,7 @@ impl GroupedHashAggregateStream { /// Note: currently spilling is not supporte

[PR] Minor: add partial assertion for skip aggregation probe [datafusion]

2024-09-26 Thread via GitHub
Rachelint opened a new pull request, #12640: URL: https://github.com/apache/datafusion/pull/12640 ## Which issue does this PR close? Closes #. ## Rationale for this change I found that a partial assertion check may be needed in the skip aggregation probe; this PR adds it.

Re: [PR] Replace `OnceLock` with `LazyLock`, update MSRV to 1.80 [datafusion]

2024-09-26 Thread via GitHub
OussamaSaoudi commented on PR #12601: URL: https://github.com/apache/datafusion/pull/12601#issuecomment-2378282642 np @alamb :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

Re: [PR] Simplify `update_skip_aggregation_probe` method [datafusion]

2024-09-26 Thread via GitHub
Rachelint commented on code in PR #12332: URL: https://github.com/apache/datafusion/pull/12332#discussion_r1777926331 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1024,11 +1017,7 @@ impl GroupedHashAggregateStream { /// Note: currently spilling is not supp

Re: [I] Implement `Debug` for `SessionStateBuilder [datafusion]

2024-09-26 Thread via GitHub
alamb commented on issue #12555: URL: https://github.com/apache/datafusion/issues/12555#issuecomment-2377868821 > Finished being able to derive Debug on SessionStateBuilder https://github.com/apache/datafusion/pull/12632 but I believe it still needs to implement formatting to be more useful

Re: [PR] Minor: Encapsulate type check in GroupValuesColumn, avoid panic [datafusion]

2024-09-26 Thread via GitHub
alamb commented on PR #12620: URL: https://github.com/apache/datafusion/pull/12620#issuecomment-2377883917 Thank you for the review @jayzhan211 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

Re: [PR] Fix sort node deserialization from proto [datafusion]

2024-09-26 Thread via GitHub
alamb commented on PR #12626: URL: https://github.com/apache/datafusion/pull/12626#issuecomment-2377910598 I took the liberty of fixing the `clippy` error and pushing the fix to this branch to get CI to pass (example of [failing CI](https://github.com/apache/datafusion/actions/runs/11039247

Re: [PR] implement nested identifier access [datafusion]

2024-09-26 Thread via GitHub
Lordworms commented on code in PR #12614: URL: https://github.com/apache/datafusion/pull/12614#discussion_r168711 ## datafusion/sql/src/expr/identifier.rs: ## @@ -142,9 +135,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { }

[PR] Xpass tests [datafusion-python]

2024-09-26 Thread via GitHub
Michael-J-Ward opened a new pull request, #884: URL: https://github.com/apache/datafusion-python/pull/884 # Rationale for this change These tests are now passing, so I've removed the `xfail` from them. # Are there any user-facing changes? No. -- This is an automated message f

Re: [PR] Minor: add partial assertion for skip aggregation probe [datafusion]

2024-09-26 Thread via GitHub
Rachelint commented on code in PR #12640: URL: https://github.com/apache/datafusion/pull/12640#discussion_r1777945500 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -1004,9 +1004,13 @@ impl GroupedHashAggregateStream { /// Updates skip aggregation probe state
