berkaysynnada commented on code in PR #10404:
URL: https://github.com/apache/datafusion/pull/10404#discussion_r1597969687
##
datafusion/core/src/datasource/stream.rs:
##
@@ -58,12 +58,22 @@ impl TableProviderFactory for StreamTableFactory {
let schema: SchemaRef = Arc::
Smotrov opened a new issue, #10478:
URL: https://github.com/apache/datafusion/issues/10478
### Describe the bug
I'm trying to read a 6 GB table of compressed NDJSON data from S3. The data is
compressed with Zstd at roughly a 100x compression ratio. Files are stored
Hive-partitioned and ha
alamb commented on code in PR #10117:
URL: https://github.com/apache/datafusion/pull/10117#discussion_r1598189334
##
datafusion/physical-expr/src/scalar_function.rs:
##
@@ -251,10 +251,23 @@ pub fn out_ordering(
func: &FuncMonotonicity,
arg_orderings: &[SortProperties]
alamb commented on PR #10454:
URL: https://github.com/apache/datafusion/pull/10454#issuecomment-2107127069
> Very hard to get consistent benchmark results on a personal computer when
> there's so much process scheduling noise
Yeah, I have a GCP VM on which I run the benchmarks
alamb commented on PR #10466:
URL: https://github.com/apache/datafusion/pull/10466#issuecomment-2107218784
Thanks @AbrarNitk ! I started the CI checks on this PR
Normally I think we should add some test coverage so we don't accidentally
break this in the future.
This would I
alamb commented on issue #10465:
URL: https://github.com/apache/datafusion/issues/10465#issuecomment-2107222678
Great idea @ClSlaid
Thanks to @AbrarNitk we have a first version of
`LogicalPlanBuilder::from(arc_input)` in
https://github.com/apache/datafusion/pull/10466 🙏
I th
alamb closed issue #10414: DISCUSSION: remove `CREATE EXTERNAL TABLE` syntax:
`DELIMITER`, `WITH HEADER ROW` and `COMPRESSION`
URL: https://github.com/apache/datafusion/issues/10414
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitH
alamb commented on PR #10404:
URL: https://github.com/apache/datafusion/pull/10404#issuecomment-2107232010
Thanks again for this work @berkaysynnada and the guidance @ozankabak
alamb merged PR #10404:
URL: https://github.com/apache/datafusion/pull/10404
alamb closed issue #9945: Overwritten Format Configs by CreateExternalTable
Options
URL: https://github.com/apache/datafusion/issues/9945
alamb opened a new issue, #10479:
URL: https://github.com/apache/datafusion/issues/10479
### Is your feature request related to a problem or challenge?
As part of governing DataFusion in the open and via the Apache Way, we
should make sure that as much as possible is done in the open.
alamb opened a new issue, #10480:
URL: https://github.com/apache/datafusion/issues/10480
### Is your feature request related to a problem or challenge?
@JayjeetAtGithub @Dandandan @yjshen @ozankabak @sunchao and @viirya wrote
and submitted a paper to the [SIGMOD 2024 conference](
alamb opened a new issue, #10481:
URL: https://github.com/apache/datafusion/issues/10481
I am giving an invited keynote talk at a workshop colocated with SIGMOD 2024
on Friday Jun 14, 2024 (after the main conference).
I need to prepare slides for this and figured people in th
alamb commented on issue #10481:
URL: https://github.com/apache/datafusion/issues/10481#issuecomment-2107276815
Here are some notes I have on what I want to talk about
interfaces and then paradoxically allowed us to narrow the scope of
potential optimizations (e.g. compute kernels) an
alamb opened a new issue, #10482:
URL: https://github.com/apache/datafusion/issues/10482
Follow on to https://github.com/apache/datafusion/issues/10395
My (personal) North ⭐ : 1000 projects are built using DataFusion 📈
**It would be great for other contributors to DataFusion wh
alamb commented on issue #10395:
URL: https://github.com/apache/datafusion/issues/10395#issuecomment-2107312167
Next week: https://github.com/apache/datafusion/issues/10482
alamb closed issue #10395: DataFusion weekly project plan (Andrew Lamb) - May
6, 2024
URL: https://github.com/apache/datafusion/issues/10395
alamb commented on issue #9929:
URL: https://github.com/apache/datafusion/issues/9929#issuecomment-2107313160
I hope to work on this issue this week
alamb commented on PR #10474:
URL: https://github.com/apache/datafusion/pull/10474#issuecomment-2107373702
@jonahgao perhaps you would like to try merging this PR as a test that we
have all the permissions set up correctly?
alamb merged PR #10354:
URL: https://github.com/apache/datafusion/pull/10354
alamb closed issue #9526: Add a `AggregateUDFImpl::simplify()` API
URL: https://github.com/apache/datafusion/issues/9526
alamb commented on PR #10354:
URL: https://github.com/apache/datafusion/pull/10354#issuecomment-2107376306
Thanks again @jayzhan211 and @milenkovicm 🙏
appletreeisyellow commented on issue #10295:
URL: https://github.com/apache/datafusion/issues/10295#issuecomment-2107378437
I'd like to take this one if no one has worked on it yet
alamb commented on issue #10295:
URL: https://github.com/apache/datafusion/issues/10295#issuecomment-2107381824
Thanks @appletreeisyellow -- that would be great. No one has started as far
as I can tell.
alamb commented on issue #10295:
URL: https://github.com/apache/datafusion/issues/10295#issuecomment-2107386884
I took a quick look at
https://github.com/apache/datafusion/blob/58cc4e1289451b30adca4721fd6eb5a36b26a2cd/datafusion/optimizer/src/single_distinct_to_groupby.rs#L59
Looks to
appletreeisyellow commented on issue #10295:
URL: https://github.com/apache/datafusion/issues/10295#issuecomment-2107390722
@alamb Thank you for the guidance!
ozankabak opened a new pull request, #10483:
URL: https://github.com/apache/datafusion/pull/10483
## Which issue does this PR close?
Quick follow-on to #10404.
## Rationale for this change
Now that we migrated to a consistent options syntax for external tables, we
should
jonahgao commented on PR #10474:
URL: https://github.com/apache/datafusion/pull/10474#issuecomment-2107415886
> @jonahgao perhaps you would like to try merging this PR as a test that we
> have all the permissions set up correctly?
Sure.
ozankabak commented on code in PR #10483:
URL: https://github.com/apache/datafusion/pull/10483#discussion_r1598371311
##
datafusion/sql/src/parser.rs:
##
@@ -462,7 +462,18 @@ impl<'a> DFParser<'a> {
pub fn parse_option_key(&mut self) -> Result {
let next_token = se
jonahgao merged PR #10474:
URL: https://github.com/apache/datafusion/pull/10474
jonahgao closed issue #10464: bug: `CAST()` causes internal error
URL: https://github.com/apache/datafusion/issues/10464
jonahgao commented on PR #10474:
URL: https://github.com/apache/datafusion/pull/10474#issuecomment-2107424445
Thanks @viirya and thank you @alamb for letting me experience this.
jayzhan211 commented on code in PR #10469:
URL: https://github.com/apache/datafusion/pull/10469#discussion_r1598377661
##
datafusion/functions-array/src/macros.rs:
##
@@ -106,4 +105,26 @@ macro_rules! make_udf_function {
}
}
};
+// This pattern doe
jayzhan211 commented on code in PR #10469:
URL: https://github.com/apache/datafusion/pull/10469#discussion_r1598379495
##
datafusion/proto/tests/cases/roundtrip_logical_plan.rs:
##
@@ -581,7 +581,7 @@ async fn roundtrip_expr_api() -> Result<()> {
make_array(vec
and pick the appropriate default. To close, I think at a minimum we would want
(1) a unit test
leoluan2009 opened a new issue, #419:
URL: https://github.com/apache/datafusion-comet/issues/419
### What is the problem the feature request solves?
Support the Spark `base64` function
### Describe the potential solution
_No response_
### Additional context
_No re
andygrove commented on code in PR #666:
URL: https://github.com/apache/datafusion-python/pull/666#discussion_r1598526974
##
examples/tpch/convert_data_to_parquet.py:
##
@@ -0,0 +1,142 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license
leoluan2009 opened a new pull request, #420:
URL: https://github.com/apache/datafusion-comet/pull/420
## Which issue does this PR close?
Closes #419 .
## Rationale for this change
## What changes are included in this PR?
## How are these cha
andygrove merged PR #666:
URL: https://github.com/apache/datafusion-python/pull/666
andygrove closed issue #440: [DISCUSSION] We need a Hero for datafusion-python
URL: https://github.com/apache/datafusion-python/issues/440
leoluan2009 commented on PR #420:
URL: https://github.com/apache/datafusion-comet/pull/420#issuecomment-2107656632
@viirya Could you help start the CI? Thanks
timsaucer commented on issue #688:
URL: https://github.com/apache/datafusion-python/issues/688#issuecomment-2107665680
Based on a recommendation on Discord, we may want to use `WindowFrame::new()`,
which already has the logic for checking whether `order_by` exists.
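The idea of choosing a default frame from the presence of an ORDER BY clause can be sketched as follows. This is only an illustration under stated assumptions: `Units`, `Bound`, `Frame`, and `default_frame` are hypothetical names invented for this example and are not DataFusion's types, and the defaults follow the usual SQL convention (running frame up to the current row when ordered, whole partition otherwise), which may differ in detail from what `WindowFrame::new()` actually does.

```rust
// Hypothetical model of default window-frame selection.
// None of these types are DataFusion's; they exist only for this sketch.
#[derive(Debug, PartialEq)]
enum Units {
    Rows,
    Range,
}

#[derive(Debug, PartialEq)]
enum Bound {
    UnboundedPreceding,
    CurrentRow,
    UnboundedFollowing,
}

#[derive(Debug, PartialEq)]
struct Frame {
    units: Units,
    start: Bound,
    end: Bound,
}

// Assumed rule: with ORDER BY the frame runs from the start of the
// partition to the current row; without it, the frame is the whole partition.
fn default_frame(has_order_by: bool) -> Frame {
    if has_order_by {
        Frame {
            units: Units::Range,
            start: Bound::UnboundedPreceding,
            end: Bound::CurrentRow,
        }
    } else {
        Frame {
            units: Units::Rows,
            start: Bound::UnboundedPreceding,
            end: Bound::UnboundedFollowing,
        }
    }
}

fn main() {
    // Print both defaults so the difference is visible.
    println!("{:?}", default_frame(true));
    println!("{:?}", default_frame(false));
}
```

Keeping the decision in one constructor means callers only pass whether an ordering exists, rather than each repeating the check.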
jayzhan211 commented on code in PR #10446:
URL: https://github.com/apache/datafusion/pull/10446#discussion_r1598536147
##
datafusion/expr/src/utils.rs:
##
@@ -1107,20 +1107,49 @@ fn split_binary_impl<'a>(
/// assert_eq!(conjunction(split), Some(expr));
/// ```
pub fn conjunct
jonahgao commented on code in PR #10469:
URL: https://github.com/apache/datafusion/pull/10469#discussion_r1598536574
##
datafusion/functions-array/src/macros.rs:
##
@@ -106,4 +105,26 @@ macro_rules! make_udf_function {
}
}
};
+// This pattern does
bellwether-softworks opened a new issue, #10486:
URL: https://github.com/apache/datafusion/issues/10486
### Describe the bug
Beginning in v37.0.0, a previously-working query is found to result in a
panic:
```
panicked at
/Users/username/.cargo/registry/src/index.crates.io-6
iiiancampbell opened a new pull request, #10487:
URL: https://github.com/apache/datafusion/pull/10487
Converted the internal representation of `LogicalPlanBuilder` from `LogicalPlan` to
`Arc<LogicalPlan>` (#10485)
## Which issue does this PR close?
Closes #10485 .
## Are these changes te
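As a rough illustration of why a builder might hold its plan behind an `Arc`, here is a minimal sketch. It assumes a hypothetical `Plan` node and `Builder` type made up for this example; it is not DataFusion's `LogicalPlan` or `LogicalPlanBuilder`, and the method names `from_arc`, `wrap`, and `build` are likewise invented.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a logical plan node; not DataFusion's real type.
#[derive(Debug)]
struct Plan {
    name: String,
    inputs: Vec<Arc<Plan>>,
}

// Hypothetical builder that stores its current plan behind an Arc,
// so handing in an existing Arc does not copy the plan tree.
struct Builder {
    plan: Arc<Plan>,
}

impl Builder {
    // Take ownership of one Arc handle; only the reference count changes.
    fn from_arc(plan: Arc<Plan>) -> Self {
        Builder { plan }
    }

    // Add a new parent node whose single input is the current plan.
    fn wrap(self, name: &str) -> Self {
        Builder {
            plan: Arc::new(Plan {
                name: name.to_string(),
                inputs: vec![self.plan],
            }),
        }
    }

    fn build(self) -> Arc<Plan> {
        self.plan
    }
}

// True if `parent` points at the very same allocation as `child`.
fn shares_input(parent: &Plan, child: &Arc<Plan>) -> bool {
    parent.inputs.iter().any(|p| Arc::ptr_eq(p, child))
}

fn main() {
    let scan = Arc::new(Plan { name: "scan".to_string(), inputs: vec![] });
    let filtered = Builder::from_arc(Arc::clone(&scan)).wrap("filter").build();
    println!("{} shares its input: {}", filtered.name, shares_input(&filtered, &scan));
}
```

In this sketch the new parent node refers to the original `scan` allocation instead of a deep copy, which is the kind of saving an `Arc`-based representation is meant to allow.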
jayzhan211 commented on issue #10486:
URL: https://github.com/apache/datafusion/issues/10486#issuecomment-2107714207
I added the assertion because I don't know if there is any case that has
len > 1. After moving the assertion, I think it should work as usual.
It would be nice if you h
viirya commented on PR #10474:
URL: https://github.com/apache/datafusion/pull/10474#issuecomment-2107725114
Thank you @alamb @jonahgao
bellwether-softworks commented on issue #10486:
URL: https://github.com/apache/datafusion/issues/10486#issuecomment-2107727362
@jayzhan211 I appreciate your concern regarding the complex example case; I
attempted to create a simpler contrived example, but was unable to trigger the
panic doi
viirya closed issue #417: chore: Rename some columnar shuffle configs for code
consistently
URL: https://github.com/apache/datafusion-comet/issues/417
viirya commented on PR #418:
URL: https://github.com/apache/datafusion-comet/pull/418#issuecomment-2107737430
Thanks @leoluan2009 @andygrove
viirya merged PR #418:
URL: https://github.com/apache/datafusion-comet/pull/418
jayzhan211 commented on issue #10486:
URL: https://github.com/apache/datafusion/issues/10486#issuecomment-2107751222
> @jayzhan211 I appreciate your concern regarding the complex example case;
I attempted to create a simpler contrived example, but was unable to trigger
the panic doing so. I
comphead commented on PR #10304:
URL: https://github.com/apache/datafusion/pull/10304#issuecomment-2107801927
@viirya @alamb can I get a review on this PR please?
comphead commented on code in PR #10483:
URL: https://github.com/apache/datafusion/pull/10483#discussion_r1598606826
##
datafusion/sql/src/parser.rs:
##
@@ -462,7 +462,18 @@ impl<'a> DFParser<'a> {
pub fn parse_option_key(&mut self) -> Result {
let next_token = sel
comphead merged PR #10446:
URL: https://github.com/apache/datafusion/pull/10446
comphead merged PR #10452:
URL: https://github.com/apache/datafusion/pull/10452
comphead commented on code in PR #10431:
URL: https://github.com/apache/datafusion/pull/10431#discussion_r1598621244
##
datafusion/optimizer/src/eliminate_cross_join.rs:
##
@@ -39,102 +41,147 @@ impl EliminateCrossJoin {
}
}
-/// Attempt to reorder join to eliminate cros
comphead commented on code in PR #10431:
URL: https://github.com/apache/datafusion/pull/10431#discussion_r1598631511
##
datafusion/optimizer/src/eliminate_cross_join.rs:
##
@@ -144,49 +191,89 @@ impl OptimizerRule for EliminateCrossJoin {
}
}
+fn rewrite_children(
+o
comphead commented on code in PR #10431:
URL: https://github.com/apache/datafusion/pull/10431#discussion_r1598634008
##
datafusion/optimizer/src/eliminate_cross_join.rs:
##
@@ -39,102 +41,147 @@ impl EliminateCrossJoin {
}
}
-/// Attempt to reorder join to eliminate cros
comphead commented on code in PR #10431:
URL: https://github.com/apache/datafusion/pull/10431#discussion_r1598634408
##
datafusion/optimizer/src/eliminate_cross_join.rs:
##
@@ -39,102 +41,147 @@ impl EliminateCrossJoin {
}
}
-/// Attempt to reorder join to eliminate cros
comphead commented on code in PR #10431:
URL: https://github.com/apache/datafusion/pull/10431#discussion_r1598637244
##
datafusion/optimizer/src/eliminate_cross_join.rs:
##
@@ -39,102 +41,147 @@ impl EliminateCrossJoin {
}
}
-/// Attempt to reorder join to eliminate cros
Michael-J-Ward commented on PR #10469:
URL: https://github.com/apache/datafusion/pull/10469#issuecomment-2107941566
@jayzhan211, Should I follow this as a template for making UDF arguments
optional?
alamb commented on code in PR #10446:
URL: https://github.com/apache/datafusion/pull/10446#discussion_r1598656772
##
datafusion/expr/src/utils.rs:
##
@@ -1107,20 +1107,49 @@ fn split_binary_impl<'a>(
/// assert_eq!(conjunction(split), Some(expr));
/// ```
pub fn conjunction(f
timsaucer commented on issue #688:
URL: https://github.com/apache/datafusion-python/issues/688#issuecomment-2107973643
From @Michael-J-Ward
> Doing a little archaeology on that:
>
> This is the PR where `window_frame` switched from `None` to
> `WindowFrame::new(order_by.is_some());`
alamb commented on PR #10304:
URL: https://github.com/apache/datafusion/pull/10304#issuecomment-2107980838
I will review this today
alamb commented on issue #440:
URL: https://github.com/apache/datafusion-python/issues/440#issuecomment-2107989366
I think GitHub got a little excited about closing this
andygrove commented on code in PR #383:
URL: https://github.com/apache/datafusion-comet/pull/383#discussion_r1598662597
##
core/src/execution/datafusion/expressions/cast.rs:
##
@@ -1444,13 +1483,136 @@ fn parse_str_to_time_only_timestamp(value: &str) ->
CometResult> {
Ok(S
alamb opened a new issue, #440:
URL: https://github.com/apache/datafusion-python/issues/440
## What this project could be
I think this project needs someone who wants to make a world-class Python
DataFrame library and user experience to take the helm. I will argue why I think
this is a
alamb commented on issue #10485:
URL: https://github.com/apache/datafusion/issues/10485#issuecomment-2107993007
Awesome -- thank you @iiiancampbell 🎉
alamb commented on PR #10487:
URL: https://github.com/apache/datafusion/pull/10487#issuecomment-2107995895
Thanks @iiiancampbell ! I triggered the CI to start
viirya commented on PR #10304:
URL: https://github.com/apache/datafusion/pull/10304#issuecomment-2108010944
I'll take another look today.
andygrove commented on code in PR #383:
URL: https://github.com/apache/datafusion-comet/pull/383#discussion_r1598670410
##
spark/src/test/scala/org/apache/comet/CometCastSuite.scala:
##
@@ -563,9 +563,33 @@ class CometCastSuite extends CometTestBase with
AdaptiveSparkPlanHelper