kevinjqliu commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2816875067
> In order to try and make progress on this, I decided to go with having a
single function that builds all tables for a single scale factor similar to how
DuckDB does it. My
appletreeisyellow commented on code in PR #15764:
URL: https://github.com/apache/datafusion/pull/15764#discussion_r2051594000
##
datafusion/physical-optimizer/src/pruning.rs:
##
@@ -1215,8 +1215,33 @@ fn is_compare_op(op: Operator) -> bool {
// For example, casts from string to
adriangb opened a new issue, #15780:
URL: https://github.com/apache/datafusion/issues/15780
### Describe the bug
Consider the following test:
```sql
COPY (
SELECT arrow_cast(a, 'Int16') AS a
FROM ( VALUES (1), (2), (3) ) AS t(a)
) TO 'test_files/scratch/parquet
```
adriangb opened a new pull request, #15779:
URL: https://github.com/apache/datafusion/pull/15779
(no comment)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe
Weijun-H commented on code in PR #15773:
URL: https://github.com/apache/datafusion/pull/15773#discussion_r2051622969
##
benchmarks/queries/clickbench/README.md:
##
@@ -155,7 +155,7 @@ WHERE
THEN split_part(split_part("URL", 'resolution=', 2), '&', 1)::INT
ELSE
kumarlokesh commented on issue #14512:
URL: https://github.com/apache/datafusion/issues/14512#issuecomment-2816854617
take
adriangb commented on PR #15764:
URL: https://github.com/apache/datafusion/pull/15764#issuecomment-2816927115
@etseidl I just pushed a change that does what I think you suggested and
only denies a cast between a string and a non string type but otherwise assumes
that the general casting rul
adriangb commented on PR #15770:
URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2816927715
Marking as ready for review despite not having any numbers to substantiate
performance improvement (because we need #15769) given that algorithmically and
from experience in the pre
adriangb commented on code in PR #15769:
URL: https://github.com/apache/datafusion/pull/15769#discussion_r2051611694
##
datafusion/core/src/datasource/physical_plan/parquet.rs:
##
@@ -148,7 +149,7 @@ mod tests {
let mut source = ParquetSource::default();
etseidl commented on PR #15764:
URL: https://github.com/apache/datafusion/pull/15764#issuecomment-2816928597
Thanks @adriangb. I'm testing now...
adriangb commented on code in PR #15769:
URL: https://github.com/apache/datafusion/pull/15769#discussion_r2051614507
##
datafusion/sqllogictest/test_files/aggregate.slt:
##
@@ -5006,7 +5006,7 @@ SELECT column5, avg(column1) FROM d GROUP BY column5;
query I??
SELECT column5,
etseidl commented on PR #15764:
URL: https://github.com/apache/datafusion/pull/15764#issuecomment-2816941931
Yep, fixes my issue. Thanks again!
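The rule adriangb describes for PR #15764 (deny a cast in the pruning code only when it crosses between a string and a non-string type, otherwise defer to the general casting rules) can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: `SimpleType` is a hypothetical stand-in for a few Arrow `DataType` variants, and `general_cast_allowed` is a hypothetical placeholder that allows every cast. Real code would presumably consult Arrow's own cast rules (for example `arrow::compute::can_cast_types`).

```rust
/// Hypothetical, simplified stand-in for a subset of Arrow's `DataType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SimpleType {
    Utf8,
    LargeUtf8,
    Utf8View,
    Int16,
    Int32,
    Int64,
    Float64,
}

/// True for the string-like variants.
fn is_string(t: SimpleType) -> bool {
    matches!(t, SimpleType::Utf8 | SimpleType::LargeUtf8 | SimpleType::Utf8View)
}

/// Hypothetical placeholder for "the general casting rules".
/// Assumption: every cast between these variants is allowed.
fn general_cast_allowed(_from: SimpleType, _to: SimpleType) -> bool {
    true
}

/// Deny only casts that cross the string / non-string boundary;
/// everything else is delegated to the general rules.
pub fn cast_ok_for_pruning(from: SimpleType, to: SimpleType) -> bool {
    if is_string(from) != is_string(to) {
        return false;
    }
    general_cast_allowed(from, to)
}
```

With this shape, widening `Int16` to `Int64` or moving between string encodings stays prunable, while `Utf8` to `Int32` (or `Float64` to `LargeUtf8`) is rejected.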
adriangb commented on PR #15769:
URL: https://github.com/apache/datafusion/pull/15769#issuecomment-2816943989
> Thanks @adriangb -- I am about to be offline for a week so I will review
this when I return
Enjoy your vacation! I think you'll like this diff:
https://github.com/use
adriangb commented on PR #15764:
URL: https://github.com/apache/datafusion/pull/15764#issuecomment-2816945357
Given approvals and relatively simple change, could a committer merge this
please?
chenkovsky closed pull request #15750: fix: describe Parquet schema with
coerce_int96
URL: https://github.com/apache/datafusion/pull/15750
xudong963 commented on PR #15644:
URL: https://github.com/apache/datafusion/pull/15644#issuecomment-2816727289
> thanks @kumarlokesh The code now looks much more aligned.
>
> perhaps we can factor out
>
> ```
> self.parser
>
appletreeisyellow commented on PR #15764:
URL: https://github.com/apache/datafusion/pull/15764#issuecomment-2816729397
@adriangb Yes, I'm happy to review. I'll have some time to review it this
afternoon or evening
xudong963 commented on code in PR #15768:
URL: https://github.com/apache/datafusion/pull/15768#discussion_r2051494486
##
datafusion/physical-plan/src/repartition/mod.rs:
##
@@ -233,11 +233,11 @@ impl BatchPartitioner {
///
/// The time spent repartitioning, not includi
xudong963 commented on issue #15628:
URL: https://github.com/apache/datafusion/issues/15628#issuecomment-2816732058
> What I suggest is that someone updates our documentation with the current
state of joins in DataFusion (namely what operators are implemented and what
types of joins they ar
kumarlokesh opened a new pull request, #15776:
URL: https://github.com/apache/datafusion/pull/15776
## Which issue does this PR close?
- Closes #13721.
## Rationale for this change
## What changes are included in this PR?
- Created a new utility
Adez017 commented on issue #15774:
URL: https://github.com/apache/datafusion/issues/15774#issuecomment-2816779407
Hi @xudong963, I want to work on this, but could you clarify what exactly
we need to do?
Adez017 opened a new issue, #15777:
URL: https://github.com/apache/datafusion/issues/15777
Examples exist for some of the aggregate functions in the docs, but most
are still missing. Examples should be added for the rest.
Adez017 commented on issue #15777:
URL: https://github.com/apache/datafusion/issues/15777#issuecomment-2816783180
I want to deal with it personally
Adez017 commented on issue #15777:
URL: https://github.com/apache/datafusion/issues/15777#issuecomment-2816783217
take
james-ryans commented on issue #15780:
URL: https://github.com/apache/datafusion/issues/15780#issuecomment-2816972256
I would love to work on this task
PokIsemaine commented on code in PR #15772:
URL: https://github.com/apache/datafusion/pull/15772#discussion_r2051632953
##
datafusion/expr/src/expr.rs:
##
@@ -701,6 +701,24 @@ impl TryCast {
}
}
+/// OrderBy Expressions
+pub enum OrderByExprs {
+OrderByExprVec(Vec),
adriangb commented on issue #15780:
URL: https://github.com/apache/datafusion/issues/15780#issuecomment-2816990712
I can confirm this is currently being done at the LogicalPlan level. I'd say
the first step is to understand how it happens there and then if something
similar exists for Physi
alamb merged PR #15468:
URL: https://github.com/apache/datafusion/pull/15468
alamb commented on issue #11336:
URL: https://github.com/apache/datafusion/issues/11336#issuecomment-2816668707
Thank you @Adez017 -- I would personally suggest starting by porting some
of the contents of https://jorgecarleitao.github.io/arrow2/main/guide/ into
DataFusion's docs
In
alamb commented on code in PR #15708:
URL: https://github.com/apache/datafusion/pull/15708#discussion_r2051458845
##
docs/source/user-guide/sql/format_options.md:
##
@@ -0,0 +1,209 @@
+
+
+# Format Options
+
+DataFusion supports customizing how data is read from or written to di
alamb merged PR #15735:
URL: https://github.com/apache/datafusion/pull/15735
alamb commented on PR #15735:
URL: https://github.com/apache/datafusion/pull/15735#issuecomment-2816669049
Thanks @xudong963
alamb merged PR #1815:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1815
alamb commented on issue #15628:
URL: https://github.com/apache/datafusion/issues/15628#issuecomment-2816665497
TL;DR: while this is a straightforward bug report, I think fixing it is
not something we are going to make a patch for; it will require a more
serious implementation effort fo
alamb commented on PR #15769:
URL: https://github.com/apache/datafusion/pull/15769#issuecomment-2816667139
Thanks @adriangb -- I am about to be offline for a week so I will review
this when I return
adriangb commented on issue #15742:
URL: https://github.com/apache/datafusion/issues/15742#issuecomment-2816667473
Funny enough I just opened https://github.com/apache/datafusion/pull/15764
without having seen this issue!
It sounds like there may be some complexity with floats... hone
alamb merged PR #15708:
URL: https://github.com/apache/datafusion/pull/15708
PokIsemaine opened a new pull request, #15772:
URL: https://github.com/apache/datafusion/pull/15772
## Which issue does this PR close?
- None
## Rationale for this change
- https://github.com/apache/datafusion/issues/14514
## What changes are includ
comphead merged PR #15750:
URL: https://github.com/apache/datafusion/pull/15750
comphead closed issue #15721: When `datafusion.execution.parquet.coerce_int96`
is set, timestamp type is still reported as Timestamp(nanoseconds)
URL: https://github.com/apache/datafusion/issues/15721
adriangb commented on code in PR #15764:
URL: https://github.com/apache/datafusion/pull/15764#discussion_r2051557860
##
datafusion/physical-optimizer/src/pruning.rs:
##
@@ -1215,8 +1215,33 @@ fn is_compare_op(op: Operator) -> bool {
// For example, casts from string to numbers
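The diff context above notes that casts from string to numbers are a problem for pruning. One concrete reason, shown in this sketch under the assumption that string min/max statistics use lexicographic (byte-wise) ordering, is that such a cast does not preserve ordering: the strings `"10"`, `"50"` and `"9"` have min `"10"` and max `"9"`, so casting the statistics to integers yields the empty range [10, 9]. Both helper names are hypothetical; this is not DataFusion's pruning code.

```rust
/// Min/max of string values under lexicographic ordering, the way min/max
/// statistics for a string column would be recorded (hypothetical helper).
fn string_min_max<'a>(values: &[&'a str]) -> Option<(&'a str, &'a str)> {
    let min = *values.iter().min()?;
    let max = *values.iter().max()?;
    Some((min, max))
}

/// Naive check for `CAST(col AS BIGINT) = target`: cast the string min/max to
/// integers and keep the container only if `target` lies in [min, max].
/// This is only correct if the cast preserves ordering; string -> integer does not.
fn naive_may_contain(min: &str, max: &str, target: i64) -> bool {
    match (min.parse::<i64>(), max.parse::<i64>()) {
        (Ok(lo), Ok(hi)) => lo <= target && target <= hi,
        // Statistics that cannot be cast give no information: keep the container.
        _ => true,
    }
}
```

For `["10", "50", "9"]` the naive check prunes a container that actually holds `"50"`, which is why refusing the rewrite across the string/non-string boundary (and simply not pruning) is the safe choice.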
Dandandan commented on PR #15770:
URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2816688267
> @Dandandan I believe with this setup we should be able to achieve with a
couple LOC in `insert_batch`:
>
> ```rust
> // Apply the filter to the batch before processing
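The quoted suggestion is to apply a filter to each batch at the start of `insert_batch`, before it is processed. The following is a std-only sketch of one plausible reading of that idea for `ORDER BY v ASC LIMIT k`, under stated assumptions: batches are plain `i64` slices rather than Arrow `RecordBatch`es, and the `TopK` type and its methods are hypothetical, not DataFusion's operator. Once the heap holds k rows, its largest value acts as a threshold, and any incoming row that is not below it cannot enter the result, so it is dropped before the per-row heap work.

```rust
use std::collections::BinaryHeap;

/// Hypothetical TopK for `ORDER BY v ASC LIMIT k` over `i64` values.
/// `heap` is a max-heap holding the k smallest values seen so far.
struct TopK {
    k: usize,
    heap: BinaryHeap<i64>,
}

impl TopK {
    fn new(k: usize) -> Self {
        TopK { k, heap: BinaryHeap::new() }
    }

    /// Current filter threshold: the largest retained value, once k values are held.
    fn threshold(&self) -> Option<i64> {
        if self.k > 0 && self.heap.len() == self.k {
            self.heap.peek().copied()
        } else {
            None
        }
    }

    /// Filters the batch against the current threshold, then inserts the
    /// surviving rows. Returns how many rows survived the pre-filter.
    fn insert_batch(&mut self, batch: &[i64]) -> usize {
        let filtered: Vec<i64> = match self.threshold() {
            Some(t) => batch.iter().copied().filter(|v| *v < t).collect(),
            None => batch.to_vec(),
        };
        for &v in &filtered {
            if self.heap.len() < self.k {
                self.heap.push(v);
            } else if let Some(&max) = self.heap.peek() {
                if v < max {
                    self.heap.pop();
                    self.heap.push(v);
                }
            }
        }
        filtered.len()
    }

    /// The k smallest values in ascending order.
    fn into_sorted(self) -> Vec<i64> {
        self.heap.into_sorted_vec()
    }
}
```

The threshold is computed once per batch, so it only tightens between batches; rows that slip through within a batch are still handled correctly by the heap comparison.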
alamb commented on issue #15628:
URL: https://github.com/apache/datafusion/issues/15628#issuecomment-2816753297
I can try and help this effort in a few weeks, but I need to finish up the
filter pushdown work in parquet and topk work first (I have too many things
outstanding at the moment an
alamb commented on code in PR #15764:
URL: https://github.com/apache/datafusion/pull/15764#discussion_r2051508492
##
datafusion/physical-optimizer/src/pruning.rs:
##
@@ -1215,8 +1215,33 @@ fn is_compare_op(op: Operator) -> bool {
// For example, casts from string to numbers is
alamb commented on code in PR #15764:
URL: https://github.com/apache/datafusion/pull/15764#discussion_r2051508562
##
datafusion/physical-optimizer/src/pruning.rs:
##
@@ -1544,7 +1572,10 @@ fn build_predicate_expression(
Ok(builder) => builder,
// allow partial
xudong963 commented on code in PR #15744:
URL: https://github.com/apache/datafusion/pull/15744#discussion_r2051509268
##
datafusion/optimizer/src/push_down_limit.rs:
##
@@ -137,6 +142,9 @@ impl OptimizerRule for PushDownLimit {
}
} else {
chenkovsky opened a new pull request, #15773:
URL: https://github.com/apache/datafusion/pull/15773
## Which issue does this PR close?
- Closes #15753.
## Rationale for this change
Column types of UTMSource and UTMCampaign in clickbench_partitioned are
binary, but in
chenkovsky commented on issue #15765:
URL: https://github.com/apache/datafusion/issues/15765#issuecomment-2816756469
take
xudong963 commented on PR #15503:
URL: https://github.com/apache/datafusion/pull/15503#issuecomment-2816757906
I may start a new branch based on the branch to experiment with
@berkaysynnada's suggestion to see if there are some challenges next week. /cc
@alamb @suremarc @wiedld (Hope we ca
chenkovsky commented on issue #15755:
URL: https://github.com/apache/datafusion/issues/15755#issuecomment-2816757278
take
alamb commented on code in PR #15764:
URL: https://github.com/apache/datafusion/pull/15764#discussion_r2051510677
##
datafusion/physical-optimizer/src/pruning.rs:
##
@@ -1215,8 +1215,33 @@ fn is_compare_op(op: Operator) -> bool {
// For example, casts from string to numbers is
alamb commented on issue #15742:
URL: https://github.com/apache/datafusion/issues/15742#issuecomment-2816758476
- While reviewing https://github.com/apache/datafusion/pull/15764 it wasn't
clear to me why we are checking casting / types at all in the pruning code.
I think that might
alamb commented on PR #15749:
URL: https://github.com/apache/datafusion/pull/15749#issuecomment-2816763464
> Thank you @alamb
>
> After merging this one, if I have a chance on the weekend, I'll add
something other to the guide.
Thank you very much @xudong963
xudong963 opened a new issue, #15775:
URL: https://github.com/apache/datafusion/issues/15775
I plan to speed up the logical optimizer from two aspects:
1. Try to reduce the optimization rounds by making each rule generate the
best plan as much as possible
2. Optimize the single rule's
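Point 1 above concerns the number of optimizer rounds. DataFusion's logical optimizer runs its rule list in repeated passes, bounded by the `datafusion.optimizer.max_passes` setting, and stops early once a pass no longer changes the plan. The sketch below models that loop with a toy plan (a `Vec` of operator names) and two hypothetical rules that remove adjacent duplicate operators: one removes a single duplicate per call, the other removes all of them at once. It illustrates how a rule that reaches its best output in one call lets the loop converge in fewer passes; it is not DataFusion's `Optimizer` code.

```rust
/// Hypothetical stand-in for a logical plan: a list of operator names.
type Plan = Vec<&'static str>;

/// A rule returns `Some(new_plan)` if it rewrote the plan, `None` if unchanged.
type Rule = fn(&Plan) -> Option<Plan>;

/// Apply all rules in order, repeating whole passes until a pass changes
/// nothing (a fixed point) or `max_passes` is reached.
/// Returns the final plan and the number of passes that ran.
fn optimize(mut plan: Plan, rules: &[Rule], max_passes: usize) -> (Plan, usize) {
    let mut passes = 0;
    while passes < max_passes {
        passes += 1;
        let mut changed = false;
        for rule in rules {
            if let Some(new_plan) = rule(&plan) {
                plan = new_plan;
                changed = true;
            }
        }
        if !changed {
            break;
        }
    }
    (plan, passes)
}

/// "Slow" hypothetical rule: removes only the first adjacent duplicate.
fn remove_one_duplicate(plan: &Plan) -> Option<Plan> {
    let i = plan.windows(2).position(|w| w[0] == w[1])?;
    let mut out = plan.clone();
    out.remove(i);
    Some(out)
}

/// "Best plan in one call" hypothetical rule: removes every adjacent duplicate.
fn remove_all_duplicates(plan: &Plan) -> Option<Plan> {
    let mut out = plan.clone();
    out.dedup();
    if out.len() == plan.len() { None } else { Some(out) }
}
```

Starting from `["Projection", "Projection", "Projection", "Scan"]`, the one-at-a-time rule needs three passes (two that change the plan plus one that confirms the fixed point), while the all-at-once rule needs two; with a pass limit of 1 the slow rule stops before the plan is fully simplified.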
alamb commented on issue #15512:
URL: https://github.com/apache/datafusion/issues/15512#issuecomment-2816765297
> Currently, q23 takes approximately 6 seconds to execute. I have confirmed
that DataFusion does not have the aforementioned optimizations and still scans
a very large number of r
alamb closed issue #6781: Blog post about user defined window functions
URL: https://github.com/apache/datafusion/issues/6781
alamb merged PR #66:
URL: https://github.com/apache/datafusion-site/pull/66
alamb commented on PR #66:
URL: https://github.com/apache/datafusion-site/pull/66#issuecomment-2816641318
The blog is now live:
https://datafusion.apache.org/blog/2025/04/19/user-defined-window-functions/
alamb commented on PR #66:
URL: https://github.com/apache/datafusion-site/pull/66#issuecomment-2816641235
Thanks again @Adez017 @Dandandan @getChan and @comphead
adriangb commented on code in PR #15770:
URL: https://github.com/apache/datafusion/pull/15770#discussion_r2051461834
##
datafusion/physical-expr/src/expressions/mod.rs:
##
@@ -22,7 +22,7 @@ mod binary;
mod case;
mod cast;
mod column;
-mod dynamic_filters;
+pub mod dynamic_fil
andygrove merged PR #1657:
URL: https://github.com/apache/datafusion-comet/pull/1657
adriangb commented on PR #15770:
URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2816673274
> Pausing this until #15769 is done
I was able to unblock by wiring up to TestDataSource
--
This is an automated message from the Apache Git Service.
To respond to the messa
andygrove merged PR #1655:
URL: https://github.com/apache/datafusion-comet/pull/1655
adriangb commented on PR #15770:
URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2816680541
@Dandandan I believe with this setup we should be able to achieve with a
couple LOC in `insert_batch`:
```rust
// Apply the filter to the batch before processing
let fil
```
etseidl commented on issue #15742:
URL: https://github.com/apache/datafusion/issues/15742#issuecomment-2816792398
I'll follow up in #15764
etseidl commented on code in PR #15764:
URL: https://github.com/apache/datafusion/pull/15764#discussion_r2051538025
##
datafusion/physical-optimizer/src/pruning.rs:
##
@@ -1215,8 +1215,33 @@ fn is_compare_op(op: Operator) -> bool {
// For example, casts from string to numbers i
Adez017 opened a new pull request, #15778:
URL: https://github.com/apache/datafusion/pull/15778
## Which issue does this PR close?
- Closes #15777
## Rationale for this change
Added examples for all the aggregate functions provided in the docs under
`window functions`
etseidl commented on PR #15764:
URL: https://github.com/apache/datafusion/pull/15764#issuecomment-2816794323
Thanks @adriangb! I think if we clean up the pruning allowed types this can
close #15742.
We can then tackle the special handling for floats later. They're already
wrong for n
alamb opened a new issue, #15771:
URL: https://github.com/apache/datafusion/issues/15771
### Is your feature request related to a problem or challenge?
_No response_
### Describe the solution you'd like
_No response_
### Describe alternatives you've considered
alamb commented on issue #15072:
URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2816650357
I filed the following ticket for the next release:
- https://github.com/apache/datafusion/issues/15771
alamb commented on code in PR #15646:
URL: https://github.com/apache/datafusion/pull/15646#discussion_r2051449542
##
datafusion/common/src/dfschema.rs:
##
@@ -969,16 +969,28 @@ impl Display for DFSchema {
/// widely used in the DataFusion codebase.
pub trait ExprSchema: std::f
alamb closed issue #15762: `Cargo bench --bench sql_planner` is failing
URL: https://github.com/apache/datafusion/issues/15762
alamb commented on issue #15762:
URL: https://github.com/apache/datafusion/issues/15762#issuecomment-2816657782
I believe this is a duplicate of
- https://github.com/apache/datafusion/issues/15753
xudong963 commented on PR #15766:
URL: https://github.com/apache/datafusion/pull/15766#issuecomment-2816770079
There is a picture that I drew before; maybe we can convert it into docs.

milenkovicm opened a new pull request, #1249:
URL: https://github.com/apache/datafusion-ballista/pull/1249
# Which issue does this PR close?
Closes None.
# Rationale for this change
Adding a few more tests to cover the write/read round trip
# What changes are included
iffyio commented on code in PR #1809:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2051416091
##
src/parser/mod.rs:
##
@@ -4055,6 +4070,44 @@ impl<'a> Parser<'a> {
)
}
+/// Look backwards in the token stream and expect that th
alamb commented on issue #15742:
URL: https://github.com/apache/datafusion/issues/15742#issuecomment-2816659212
> It would be nice if Datafusion always used statistics for floating point
columns if they are available. One potential fix is to add more cases to
verify_support_type_for_prune (
alamb commented on issue #15742:
URL: https://github.com/apache/datafusion/issues/15742#issuecomment-2816659245
FYI @adriangb
xudong963 commented on code in PR #15772:
URL: https://github.com/apache/datafusion/pull/15772#discussion_r2051496431
##
datafusion/expr/src/expr.rs:
##
@@ -701,6 +701,24 @@ impl TryCast {
}
}
+/// OrderBy Expressions
+pub enum OrderByExprs {
+OrderByExprVec(Vec),
+
Dandandan commented on code in PR #15768:
URL: https://github.com/apache/datafusion/pull/15768#discussion_r2051497967
##
datafusion/physical-plan/src/repartition/mod.rs:
##
@@ -233,11 +233,11 @@ impl BatchPartitioner {
///
/// The time spent repartitioning, not includi
adriangb commented on code in PR #15770:
URL: https://github.com/apache/datafusion/pull/15770#discussion_r2051499380
##
datafusion/core/tests/physical_optimizer/push_down_filter/mod.rs:
##
@@ -0,0 +1,401 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or mo
Adez017 commented on PR #15778:
URL: https://github.com/apache/datafusion/pull/15778#issuecomment-2816830201
@alamb I need your help with the same issue that I faced in the previous PR.
I tried running
```
/dev/update_function_docs.sh
```
but I think there is something left o