Re: [PR] Refactor SortPushdown using the standard top-down visitor. [datafusion]

2025-02-21 Thread via GitHub
wiedld commented on code in PR #14821: URL: https://github.com/apache/datafusion/pull/14821#discussion_r1966451622 ## datafusion/physical-optimizer/src/enforce_sorting/sort_pushdown.rs: ## @@ -72,16 +70,18 @@ pub fn assign_initial_requirements(node: &mut SortPushDown) { } p

Re: [PR] Parse casting to array using double colon operator in Redshift [datafusion-sqlparser-rs]

2025-02-21 Thread via GitHub
iffyio merged PR #1737: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1737 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-21 Thread via GitHub
ozankabak commented on code in PR #14699: URL: https://github.com/apache/datafusion/pull/14699#discussion_r1966457481 ## datafusion/physical-expr-common/src/physical_expr.rs: ## @@ -144,6 +153,111 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + DynEq + DynHash {

Re: [PR] feat: adjust create and drop trigger for mysql dialect [datafusion-sqlparser-rs]

2025-02-21 Thread via GitHub
invm commented on code in PR #1734: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1734#discussion_r1966463262 ## src/parser/mod.rs: ## @@ -5061,20 +5066,19 @@ impl<'a> Parser<'a> { } pub fn parse_trigger_period(&mut self) -> Result { -Ok( -

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-21 Thread via GitHub
ozankabak commented on code in PR #14699: URL: https://github.com/apache/datafusion/pull/14699#discussion_r1966457673 ## datafusion/expr-common/src/statistics.rs: ## @@ -0,0 +1,1610 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-21 Thread via GitHub
ozankabak commented on code in PR #14699: URL: https://github.com/apache/datafusion/pull/14699#discussion_r1966455676 ## datafusion/expr-common/src/interval_arithmetic.rs: ## @@ -645,34 +651,78 @@ impl Interval { let upper = min_of_bounds(&self.upper, &rhs.upper);

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-21 Thread via GitHub
ozankabak commented on code in PR #14699: URL: https://github.com/apache/datafusion/pull/14699#discussion_r1966455316 ## datafusion/physical-expr-common/src/physical_expr.rs: ## @@ -144,6 +153,111 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + DynEq + DynHash {

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-21 Thread via GitHub
ozankabak commented on code in PR #14699: URL: https://github.com/apache/datafusion/pull/14699#discussion_r1966454968 ## datafusion/expr-common/src/statistics.rs: ## @@ -0,0 +1,1610 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] Replace parallel condition/result vectors with single CaseWhen vector in Expr::Case [datafusion-sqlparser-rs]

2025-02-21 Thread via GitHub
iffyio merged PR #1733: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1733

Re: [PR] Ignore escaped LIKE wildcards in MySQL [datafusion-sqlparser-rs]

2025-02-21 Thread via GitHub
iffyio commented on code in PR #1735: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1735#discussion_r1966454048 ## tests/sqlparser_mysql.rs: ## @@ -2530,6 +2530,16 @@ fn parse_rlike_and_regexp() { } } +#[test] +fn parse_like_with_escape() { +mysql().ve

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-21 Thread via GitHub
ozankabak commented on code in PR #14699: URL: https://github.com/apache/datafusion/pull/14699#discussion_r1966453296 ## datafusion/expr-common/src/statistics.rs: ## @@ -0,0 +1,1610 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] Remove CountWildcardRule in Analyzer and move the functionality in ExprPlanner, add `plan_aggregate` and `plan_window` to planner [datafusion]

2025-02-21 Thread via GitHub
ozankabak commented on PR #14689: URL: https://github.com/apache/datafusion/pull/14689#issuecomment-2676043465 CI broke after this PR. We have this extended-tests-failing-after-merge happening very frequently recently. Maybe the decision to defer extended tests to post-merge was a wro

Re: [PR] feat: adjust create and drop trigger for mysql dialect [datafusion-sqlparser-rs]

2025-02-21 Thread via GitHub
iffyio commented on code in PR #1734: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1734#discussion_r1966451757 ## src/parser/mod.rs: ## @@ -5061,20 +5066,19 @@ impl<'a> Parser<'a> { } pub fn parse_trigger_period(&mut self) -> Result { -Ok( -

[PR] Refactor SortPushdown using the standard top-down visitor. [datafusion]

2025-02-21 Thread via GitHub
wiedld opened a new pull request, #14821: URL: https://github.com/apache/datafusion/pull/14821 ## Which issue does this PR close? No issue. ## Rationale for this change It's a minor refactor on the `EnforceSorting` subrule `sort_pushdown`. I was having a hard time reaso

Re: [PR] Add support for `ORDER BY ALL` [datafusion-sqlparser-rs]

2025-02-21 Thread via GitHub
iffyio merged PR #1724: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1724

Re: [PR] Add support column prefix index for MySQL [datafusion-sqlparser-rs]

2025-02-21 Thread via GitHub
iffyio commented on code in PR #1732: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1732#discussion_r1966449581 ## src/ast/mod.rs: ## @@ -8591,6 +8591,61 @@ pub enum CopyIntoSnowflakeKind { Location, } +/// Index Field +/// +/// This structure used here [`

Re: [PR] Extend Visitor trait for Value type [datafusion-sqlparser-rs]

2025-02-21 Thread via GitHub
iffyio merged PR #1725: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1725

Re: [PR] fix: EnforceSorting should not remove a needed coalesces [datafusion]

2025-02-21 Thread via GitHub
wiedld commented on code in PR #14637: URL: https://github.com/apache/datafusion/pull/14637#discussion_r1966344294 ## datafusion/sqllogictest/test_files/union_by_name.slt: ## @@ -244,13 +244,14 @@ SELECT x, y FROM t1 UNION BY NAME (SELECT y, z FROM t2 INTERSECT SELECT 2, 2 as

Re: [I] [Discussion] Efficient Row Selection for Multi-Engine Support [datafusion]

2025-02-21 Thread via GitHub
Arpit-Bandejiya commented on issue #14816: URL: https://github.com/apache/datafusion/issues/14816#issuecomment-2676024092 @alamb @andygrove please provide your opinion on this use case!

Re: [PR] fix: we are missing the unlimited case for bounded streaming when usi… [datafusion]

2025-02-21 Thread via GitHub
zhuqi-lucas commented on code in PR #14815: URL: https://github.com/apache/datafusion/pull/14815#discussion_r1966423649 ## datafusion-cli/src/exec.rs: ## @@ -269,6 +269,11 @@ pub(super) async fn exec_and_print( reservation.try_grow(get_record_batch_memo

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-21 Thread via GitHub
clflushopt commented on PR #14699: URL: https://github.com/apache/datafusion/pull/14699#issuecomment-2675959187 I can review and make the necessary changes to #14735 once it gets merged if that is necessary @alamb. I skimmed this PR earlier and just made a small pass through; this is

Re: [PR] Feature scalar regexp match benchmark [datafusion]

2025-02-21 Thread via GitHub
github-actions[bot] commented on PR #13789: URL: https://github.com/apache/datafusion/pull/13789#issuecomment-2675934708 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] fix: type checking [datafusion-python]

2025-02-21 Thread via GitHub
chenkovsky commented on code in PR #993: URL: https://github.com/apache/datafusion-python/pull/993#discussion_r1966359261 ## python/datafusion/context.py: ## @@ -783,7 +783,9 @@ def register_parquet( file_extension, skip_metadata, schema, -

Re: [PR] fix: we are missing the unlimited case for bounded streaming when usi… [datafusion]

2025-02-21 Thread via GitHub
comphead commented on code in PR #14815: URL: https://github.com/apache/datafusion/pull/14815#discussion_r1966358685 ## datafusion-cli/src/exec.rs: ## @@ -269,6 +269,11 @@ pub(super) async fn exec_and_print( reservation.try_grow(get_record_batch_memory_

Re: [PR] fix: type checking [datafusion-python]

2025-02-21 Thread via GitHub
chenkovsky commented on code in PR #993: URL: https://github.com/apache/datafusion-python/pull/993#discussion_r1966356247 ## python/datafusion/udf.py: ## @@ -182,7 +182,7 @@ class AggregateUDF: def __init__( self, -name: Optional[str], +name: str,

Re: [PR] fix: type checking [datafusion-python]

2025-02-21 Thread via GitHub
chenkovsky commented on code in PR #993: URL: https://github.com/apache/datafusion-python/pull/993#discussion_r1966358112 ## python/datafusion/udf.py: ## @@ -158,7 +158,7 @@ def state(self) -> List[pyarrow.Scalar]: pass @abstractmethod -def update(self, *valu

Re: [I] Columnar shuffle uses wrong memory allocator in unified memory mode [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove commented on issue #1438: URL: https://github.com/apache/datafusion-comet/issues/1438#issuecomment-2675884654 This is a regression introduced in https://github.com/apache/datafusion-comet/commit/e72beb1faaba39d45e05e537d9b84151db7e73ff#diff-dcc1523ab10f465921fcdea05347484ed7bb944a

Re: [I] Datafusion can't seem to cast evolving structs [datafusion]

2025-02-21 Thread via GitHub
TheBuilderJR commented on issue #14757: URL: https://github.com/apache/datafusion/issues/14757#issuecomment-2675883560 @alamb how do y'all handle this at influx? This one comes as quite a shocker to me. Does no one else using datafusion support struct evolution?

Re: [PR] fix: graceful NULL and type error handling in array functions [datafusion]

2025-02-21 Thread via GitHub
jayzhan211 commented on code in PR #14737: URL: https://github.com/apache/datafusion/pull/14737#discussion_r1966326123 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -2265,6 +2265,35 @@ select array_sort([]); [] +# test with null arguments +# expected error: +#

Re: [PR] Remove CountWildcardRule in Analyzer and move the functionality in ExprPlanner, add `plan_aggregate` and `plan_window` to planner [datafusion]

2025-02-21 Thread via GitHub
jayzhan211 commented on PR #14689: URL: https://github.com/apache/datafusion/pull/14689#issuecomment-2675830556 Thanks ALL

Re: [PR] Remove CountWildcardRule in Analyzer and move the functionality in ExprPlanner, add `plan_aggregate` and `plan_window` to planner [datafusion]

2025-02-21 Thread via GitHub
jayzhan211 merged PR #14689: URL: https://github.com/apache/datafusion/pull/14689

Re: [I] Reduce spilling overhead in Comet shuffle [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove commented on issue #1436: URL: https://github.com/apache/datafusion-comet/issues/1436#issuecomment-2675809473 One issue is that Comet native shuffle creates a new spill file for each call to spill, often creating > 10,000 files. Each file contains a partial batch per output parti

Re: [I] Create more user friendly aliases from `col` [datafusion-python]

2025-02-21 Thread via GitHub
timsaucer commented on issue #754: URL: https://github.com/apache/datafusion-python/issues/754#issuecomment-2675805174 I think @Spaarsh is on the right track for this one, that it more likely needs to be resolved in the upstream repo. Or we would have to do some form of workaround.

[PR] Parse signed/unsigned integer data types correctly in MySQL CAST [datafusion-sqlparser-rs]

2025-02-21 Thread via GitHub
mvzink opened a new pull request, #1739: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1739 MySQL doesn't have the same set of possible CAST types as for e.g. column definitions. For example, it raises a syntax error for `CAST(1 AS INTEGER SIGNED)` and instead expects `CAST(1

[I] Columnar shuffle uses wrong memory allocator in unified memory mode [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove opened a new issue, #1438: URL: https://github.com/apache/datafusion-comet/issues/1438 ### Describe the bug I am using unified memory management: ``` --conf spark.memory.offHeap.enabled=true \ --conf spark.memory.offHeap.size=2g \ ``` I delibera

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-21 Thread via GitHub
comphead commented on code in PR #14699: URL: https://github.com/apache/datafusion/pull/14699#discussion_r1965733520 ## datafusion/expr-common/src/statistics.rs: ## @@ -0,0 +1,1610 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] Add user documentation for the FFI approach [datafusion-python]

2025-02-21 Thread via GitHub
timsaucer merged PR #1031: URL: https://github.com/apache/datafusion-python/pull/1031

Re: [PR] perf: Update RewriteJoin logic to choose optimal build side [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove commented on PR #1424: URL: https://github.com/apache/datafusion-comet/pull/1424#issuecomment-2675738118 Thanks for the reviews @comphead @parthchandra @kazuyukitanimura @mbutrovich @hayman42 I ran benchmarks with 1 executor w/ 8 cores vs 2 executors w/ 4 cores and saw no

Re: [PR] perf: Update RewriteJoin logic to choose optimal build side [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove merged PR #1424: URL: https://github.com/apache/datafusion-comet/pull/1424

Re: [I] CometHashJoin always selects BuildRight which causes potential performance regression [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove commented on issue #1382: URL: https://github.com/apache/datafusion-comet/issues/1382#issuecomment-2675735322 > It is weird that most of the metrics including spill size and execution time get 7-8x higher. I don't know why it happens but I am trying to figure out. ``` sh

Re: [PR] fix: type checking [datafusion-python]

2025-02-21 Thread via GitHub
timsaucer commented on code in PR #993: URL: https://github.com/apache/datafusion-python/pull/993#discussion_r1966217074 ## python/datafusion/context.py: ## @@ -783,7 +783,9 @@ def register_parquet( file_extension, skip_metadata, schema, -

Re: [I] Implement native version of ColumnarToRow [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove commented on issue #708: URL: https://github.com/apache/datafusion-comet/issues/708#issuecomment-2675431383 I am closing this issue for now because I believe that we determined that this is no longer a priority. We can reopen the issue if this changes.

[I] Shuffle spilled_bytes metric is incorrect [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove opened a new issue, #1437: URL: https://github.com/apache/datafusion-comet/issues/1437 ### Describe the bug In `ShuffleWriterExec`, we are writing incorrect data for `spilled_bytes`. We are adding the size of the current memory reservation rather than the number of bytes wr

Re: [PR] fix: graceful NULL and type error handling in array functions [datafusion]

2025-02-21 Thread via GitHub
alan910127 commented on code in PR #14737: URL: https://github.com/apache/datafusion/pull/14737#discussion_r1965962598 ## datafusion/functions-nested/src/sort.rs: ## @@ -143,6 +169,13 @@ pub fn array_sort_inner(args: &[ArrayRef]) -> Result { return exec_err!("array_sor

Re: [PR] fix: enable full decimal to decimal support [datafusion-comet]

2025-02-21 Thread via GitHub
himadripal commented on code in PR #1385: URL: https://github.com/apache/datafusion-comet/pull/1385#discussion_r1966132652 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -1126,27 +1129,33 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlan

Re: [I] Upgrade to Rust 1.85 [datafusion]

2025-02-21 Thread via GitHub
comphead closed issue #14808: Upgrade to Rust 1.85 URL: https://github.com/apache/datafusion/issues/14808

Re: [PR] Cancellation benchmark [datafusion]

2025-02-21 Thread via GitHub
carols10cents commented on PR #14818: URL: https://github.com/apache/datafusion/pull/14818#issuecomment-2675566618 Let's see if this works too /benchmark

[PR] Improve benchmark docs [datafusion]

2025-02-21 Thread via GitHub
carols10cents opened a new pull request, #14820: URL: https://github.com/apache/datafusion/pull/14820 ## Which issue does this PR close? - Closes #14819. ## Rationale for this change I added a new benchmark in #14818. There wasn't documentation on how to add a new benchm

[PR] Cancellation benchmark [datafusion]

2025-02-21 Thread via GitHub
carols10cents opened a new pull request, #14818: URL: https://github.com/apache/datafusion/pull/14818 ## Which issue does this PR close? Connects to #14036 (does not close it). ## Rationale for this change The behavior observed in #14036 was hard to reproduce and quantify

Re: [I] Add documentation for why FFI is needed [datafusion-python]

2025-02-21 Thread via GitHub
timsaucer closed issue #1027: Add documentation for why FFI is needed URL: https://github.com/apache/datafusion-python/issues/1027

Re: [PR] feat: Support IntegralDivide function [datafusion-comet]

2025-02-21 Thread via GitHub
kazuyukitanimura commented on code in PR #1428: URL: https://github.com/apache/datafusion-comet/pull/1428#discussion_r1966140038 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2641,4 +2641,20 @@ class CometExpressionSuite extends CometTestBase with

Re: [PR] test: Register Spark-compatible expressions with a DataFusion context [datafusion-comet]

2025-02-21 Thread via GitHub
kazuyukitanimura commented on PR #1432: URL: https://github.com/apache/datafusion-comet/pull/1432#issuecomment-2675473580 Thanks @viczsaurav looks like the format checks are failing

Re: [PR] fix: enable full decimal to decimal support [datafusion-comet]

2025-02-21 Thread via GitHub
himadripal commented on code in PR #1385: URL: https://github.com/apache/datafusion-comet/pull/1385#discussion_r1966133439 ## spark/src/main/scala/org/apache/comet/GenerateDocs.scala: ## @@ -69,7 +69,8 @@ object GenerateDocs { w.write("|-|-|-|\n".getBytes) for

Re: [PR] perf: Update RewriteJoin logic to choose optimal build side [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove commented on code in PR #1424: URL: https://github.com/apache/datafusion-comet/pull/1424#discussion_r1966124188 ## spark/src/main/scala/org/apache/comet/rules/RewriteJoin.scala: ## @@ -31,14 +32,29 @@ import org.apache.spark.sql.execution.joins.{ShuffledHashJoinExec,

Re: [PR] perf: Update RewriteJoin logic to choose optimal build side [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove commented on code in PR #1424: URL: https://github.com/apache/datafusion-comet/pull/1424#discussion_r1966123856 ## spark/src/main/scala/org/apache/comet/rules/RewriteJoin.scala: ## @@ -31,14 +32,29 @@ import org.apache.spark.sql.execution.joins.{ShuffledHashJoinExec,

Re: [PR] perf: Update RewriteJoin logic to choose optimal build side [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove commented on code in PR #1424: URL: https://github.com/apache/datafusion-comet/pull/1424#discussion_r1966123689 ## spark/src/main/scala/org/apache/comet/rules/RewriteJoin.scala: ## @@ -48,7 +64,7 @@ object RewriteJoin extends JoinSelectionHelper { def rewrite(plan

Re: [PR] fix: enable full decimal to decimal support [datafusion-comet]

2025-02-21 Thread via GitHub
kazuyukitanimura commented on code in PR #1385: URL: https://github.com/apache/datafusion-comet/pull/1385#discussion_r1966118342 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -1126,27 +1129,33 @@ class CometCastSuite extends CometTestBase with AdaptiveSpa

[PR] chore: fix clippy after rust 1.85 update [datafusion-ballista]

2025-02-21 Thread via GitHub
milenkovicm opened a new pull request, #1188: URL: https://github.com/apache/datafusion-ballista/pull/1188 # Which issue does this PR close? Closes #. # Rationale for this change Fix clippy after rust 1.85 update # What changes are included in this PR? # A

Re: [I] date_part is calculating results incorrectly for intervals [datafusion]

2025-02-21 Thread via GitHub
Omega359 commented on issue #14817: URL: https://github.com/apache/datafusion/issues/14817#issuecomment-2675449565 It seems that duckdb is also following the interval rules like pg for date_part - https://duckdb.org/docs/sql/data_types/interval.html ```sql D SELECT datepart('second

Re: [PR] Add user documentation for the FFI approach [datafusion-python]

2025-02-21 Thread via GitHub
kevinjqliu commented on code in PR #1031: URL: https://github.com/apache/datafusion-python/pull/1031#discussion_r1966110882 ## docs/source/contributor-guide/ffi.rst: ## @@ -0,0 +1,212 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor lice

Re: [PR] Add user documentation for the FFI approach [datafusion-python]

2025-02-21 Thread via GitHub
timsaucer commented on code in PR #1031: URL: https://github.com/apache/datafusion-python/pull/1031#discussion_r1966105015 ## docs/source/contributor-guide/ffi.rst: ## @@ -0,0 +1,212 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor licen

Re: [I] Overflow happened on: -2147483648 % -1 [datafusion]

2025-02-21 Thread via GitHub
Omega359 commented on issue #14771: URL: https://github.com/apache/datafusion/issues/14771#issuecomment-2675418811 I would posit that the behaviour should in general mirror postgresql unless there is a good reason not to.

[I] date_part is calculating results incorrectly for intervals [datafusion]

2025-02-21 Thread via GitHub
Omega359 opened a new issue, #14817: URL: https://github.com/apache/datafusion/issues/14817 ### Describe the bug Splitting out from https://github.com/apache/datafusion/issues/14738#issuecomment-2666570269: ```sql SELECT date_part('seconds', interval '1 hour'); -- re

Re: [I] Implement native version of ColumnarToRow [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove closed issue #708: Implement native version of ColumnarToRow URL: https://github.com/apache/datafusion-comet/issues/708

Re: [I] Comet native shuffle reader [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove commented on issue #1125: URL: https://github.com/apache/datafusion-comet/issues/1125#issuecomment-2675433947 I am closing this issue for now because I no longer believe it to be a priority. We can reopen if needed.

Re: [PR] Window Functions Order Conservation -- Follow-up On Set Monotonicity [datafusion]

2025-02-21 Thread via GitHub
berkaysynnada commented on code in PR #14813: URL: https://github.com/apache/datafusion/pull/14813#discussion_r1965740381 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -222,208 +227,6 @@ async fn test_remove_unnecessary_sort5() -> Result<()> { Ok(())

Re: [I] Comet native shuffle reader [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove closed issue #1125: Comet native shuffle reader URL: https://github.com/apache/datafusion-comet/issues/1125

Re: [I] Add some DataFrame method(s) to combine two inputs where the schema can be different [datafusion]

2025-02-21 Thread via GitHub
Omega359 commented on issue #12650: URL: https://github.com/apache/datafusion/issues/12650#issuecomment-2675405695 This should be much easier to implement now that https://github.com/apache/datafusion/issues/14508 has landed

Re: [PR] Feat/ffi scalar udf [datafusion-python]

2025-02-21 Thread via GitHub
timsaucer commented on PR #1033: URL: https://github.com/apache/datafusion-python/pull/1033#issuecomment-2675401682 Putting into draft because we need the upstream DataFusion repository to release version 46 before this can be enabled.

[PR] Feat/ffi scalar udf [datafusion-python]

2025-02-21 Thread via GitHub
timsaucer opened a new pull request, #1033: URL: https://github.com/apache/datafusion-python/pull/1033 # Which issue does this PR close? This addresses part of #1017 - the scalar UDFs # Rationale for this change This change enables users who have written DataFusion scala

Re: [PR] Add DataFrame fill_null [datafusion]

2025-02-21 Thread via GitHub
comphead commented on code in PR #14769: URL: https://github.com/apache/datafusion/pull/14769#discussion_r1965693790 ## datafusion/core/src/dataframe/mod.rs: ## @@ -1926,6 +1930,71 @@ impl DataFrame { plan, }) } + +/// Fill null values in specified

Re: [PR] fix: graceful NULL and type error handling in array functions [datafusion]

2025-02-21 Thread via GitHub
alan910127 commented on code in PR #14737: URL: https://github.com/apache/datafusion/pull/14737#discussion_r1966030422 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -2265,6 +2265,35 @@ select array_sort([]); [] +# test with null arguments +# expected error: +#

Re: [PR] Add user documentation for the FFI approach [datafusion-python]

2025-02-21 Thread via GitHub
kevinjqliu commented on code in PR #1031: URL: https://github.com/apache/datafusion-python/pull/1031#discussion_r1966029158 ## docs/source/contributor-guide/ffi.rst: ## @@ -0,0 +1,212 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor lice

Re: [PR] Add user documentation for the FFI approach [datafusion-python]

2025-02-21 Thread via GitHub
kevinjqliu commented on code in PR #1031: URL: https://github.com/apache/datafusion-python/pull/1031#discussion_r1966017216 ## docs/source/contributor-guide/ffi.rst: ## @@ -0,0 +1,212 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor lice

Re: [I] Google Summer of Code - Ideas and Coordination [datafusion-python]

2025-02-21 Thread via GitHub
timsaucer commented on issue #1032: URL: https://github.com/apache/datafusion-python/issues/1032#issuecomment-2675237730 For the high level abstractions, I believe these are already met. The DataFrame API is available and widely used (in fact, it's the only way I personally use it). The [co

Re: [PR] Chore: Release datafusion-python 45 [datafusion-python]

2025-02-21 Thread via GitHub
timsaucer commented on PR #1024: URL: https://github.com/apache/datafusion-python/pull/1024#issuecomment-2675282815 Per the vote we have 3 PMCs with +1: https://lists.apache.org/thread/1nvpzpdkxjz17kmlg4wlty7pt5y6jvh4 I am moving this to ready, but I will need a PMC to do the final s

Re: [PR] perf: Update RewriteJoin logic to choose optimal build side [datafusion-comet]

2025-02-21 Thread via GitHub
kazuyukitanimura commented on code in PR #1424: URL: https://github.com/apache/datafusion-comet/pull/1424#discussion_r1965987283 ## spark/src/main/scala/org/apache/comet/rules/RewriteJoin.scala: ## @@ -48,7 +64,7 @@ object RewriteJoin extends JoinSelectionHelper { def rewri

Re: [I] Decorrelate scalar subqueries with more complex filter expressions [datafusion]

2025-02-21 Thread via GitHub
duongcongtoai commented on issue #14554: URL: https://github.com/apache/datafusion/issues/14554#issuecomment-2675276468 From what I see in the current code, the struct `PullUpCorrelatedExpr` is applied to scalar subqueries as well as predicate subqueries. For that paper, I'll try my best,

Re: [I] Google Summer of Code - Ideas and Coordination [datafusion-python]

2025-02-21 Thread via GitHub
sidshehria commented on issue #1032: URL: https://github.com/apache/datafusion-python/issues/1032#issuecomment-2675258479 @timsaucer Thanks for the clarity! I understand the explanation on the DataFrame API, lazy mode of evaluation, and Pandas/Polars integration better. I will re

Re: [PR] chore: Update guava to 33.2.1-jre [datafusion-comet]

2025-02-21 Thread via GitHub
kazuyukitanimura commented on PR #1435: URL: https://github.com/apache/datafusion-comet/pull/1435#issuecomment-2675203415 Merged thanks @andygrove

Re: [PR] fix: graceful NULL and type error handling in array functions [datafusion]

2025-02-21 Thread via GitHub
alan910127 commented on code in PR #14737: URL: https://github.com/apache/datafusion/pull/14737#discussion_r1965956601 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -2265,6 +2265,35 @@ select array_sort([]); [] +# test with null arguments +# expected error: +#

Re: [PR] chore: Update guava to 33.2.1-jre [datafusion-comet]

2025-02-21 Thread via GitHub
kazuyukitanimura merged PR #1435: URL: https://github.com/apache/datafusion-comet/pull/1435

Re: [PR] chore: Update protobuf to 3.25.5 [datafusion-comet]

2025-02-21 Thread via GitHub
kazuyukitanimura commented on PR #1434: URL: https://github.com/apache/datafusion-comet/pull/1434#issuecomment-2675202150 Merged thanks @andygrove

Re: [PR] chore: Update protobuf to 3.25.5 [datafusion-comet]

2025-02-21 Thread via GitHub
kazuyukitanimura merged PR #1434: URL: https://github.com/apache/datafusion-comet/pull/1434

Re: [I] Support `UNNEST` as table function (UDTF) [datafusion]

2025-02-21 Thread via GitHub
waynexia commented on issue #14801: URL: https://github.com/apache/datafusion/issues/14801#issuecomment-2675201056 Thank you @jonahgao 🚀

Re: [PR] [WIP] Store spans for Value expressions [datafusion-sqlparser-rs]

2025-02-21 Thread via GitHub
lovasoa commented on PR #1738: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1738#issuecomment-2675136484 feel free to edit the code directly without asking me first :)

Re: [PR] Implemented `simplify` for the `starts_with` function to convert it into a LIKE expression. [datafusion]

2025-02-21 Thread via GitHub
jayzhan211 commented on PR #14119: URL: https://github.com/apache/datafusion/pull/14119#issuecomment-2674769799 I wonder if converting `starts_with` to `like` adds overhead. https://github.com/apache/arrow-rs/blob/a0c3186c55ac8ed3f6b8a15d1305548fd6305ebb/arrow-string/src/predicate.rs#L

[PR] [WIP] Store spans for Value expressions [datafusion-sqlparser-rs]

2025-02-21 Thread via GitHub
lovasoa opened a new pull request, #1738: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1738 (no comment)

[I] [Discussion] Efficient Row Selection for Multi-Engine Support [datafusion]

2025-02-21 Thread via GitHub
Arpit-Bandejiya opened a new issue, #14816: URL: https://github.com/apache/datafusion/issues/14816 Background: We have a use case where data is stored in multiple engines/formats and Parquet is the primary format containing all the data. While text queries are handled by inverted index format

Re: [I] Further improve datafusion-cli memory usage if we setting huge number for maxrow size. [datafusion]

2025-02-21 Thread via GitHub
zhuqi-lucas commented on issue #14810: URL: https://github.com/apache/datafusion/issues/14810#issuecomment-2674966485 Thank you @alamb for the great idea. Besides this improvement, I also found a bug in the unlimited case, which we are missing for the buffer. Filed a ticket now: h

Re: [I] Google Summer of Code - Ideas and Coordination [datafusion-python]

2025-02-21 Thread via GitHub
sidshehria commented on issue #1032: URL: https://github.com/apache/datafusion-python/issues/1032#issuecomment-2674996952 @timsaucer Yes, I have some solutions in mind. Kindly review them: **1. Higher-Level Abstractions:** - Introduce a DataFrame-like API that feels m

Re: [PR] fix: we are missing the unlimited case for bounded streaming when usi… [datafusion]

2025-02-21 Thread via GitHub
zhuqi-lucas commented on PR #14815: URL: https://github.com/apache/datafusion/pull/14815#issuecomment-2674993678 cc @alamb @2010YOUY01 Found one bug, please help review, thanks!

Re: [PR] Chore/Add additional FFI unit tests [datafusion]

2025-02-21 Thread via GitHub
timsaucer merged PR #14802: URL: https://github.com/apache/datafusion/pull/14802

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-02-21 Thread via GitHub
djanderson commented on code in PR #14286: URL: https://github.com/apache/datafusion/pull/14286#discussion_r1965798308 ## datafusion-examples/examples/thread_pools_lib/dedicated_executor.rs: ## @@ -0,0 +1,1778 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

[I] Datafusion-cli, when the max rows setting inf, we are missing the unlimited case for bounded streaming. [datafusion]

2025-02-21 Thread via GitHub
zhuqi-lucas opened a new issue, #14814: URL: https://github.com/apache/datafusion/issues/14814 ### Describe the bug https://github.com/apache/datafusion/pull/14766 With the above PR, we improved the datafusion-cli memory usage and memory reservation, but we forgot one cas
