Jimexist opened a new pull request #715:
URL: https://github.com/apache/arrow-datafusion/pull/715
# Which issue does this PR close?
Closes #673
# Rationale for this change
# What changes are included in this PR?
# Are there any user-facing changes?
p5a0u9l edited a comment on issue #10531:
URL: https://github.com/apache/arrow/issues/10531#issuecomment-877982140
I mean, ok @wesm. Super appreciative of all that you and the contributors
create here, but I rely on Plasma for a production system.
An in-memory object store that supports
p5a0u9l commented on issue #10531:
URL: https://github.com/apache/arrow/issues/10531#issuecomment-877982140
I mean, ok @wesm. Super appreciative of all that you and the contributors
create here, but I rely on Plasma for a production system.
An in-memory object store that supports hugepages,
Jimexist commented on pull request #704:
URL: https://github.com/apache/arrow-datafusion/pull/704#issuecomment-877979537
Can you remove the `IntoIter` wrapping, since arrays have it implemented by
default?
--
This is an automated message from the Apache Git Service.
To respond to the
cyb70289 commented on a change in pull request #10686:
URL: https://github.com/apache/arrow/pull/10686#discussion_r667592918
##
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc
##
@@ -1042,6 +1042,26 @@ TEST(TestUnaryArithmetic, DispatchBest) {
for
cyb70289 commented on a change in pull request #10690:
URL: https://github.com/apache/arrow/pull/10690#discussion_r667586087
##
File path: docker-compose.yml
##
@@ -62,6 +62,14 @@ x-ccache:
CCACHE_MAXSIZE: 500M
CCACHE_DIR: /ccache
+# CPU/memory limits to pass to
silathdiir commented on pull request #537:
URL: https://github.com/apache/arrow-rs/pull/537#issuecomment-877912374
Hi @alamb, referring to @jorgecarleitao's latest comment [#461
(comment)](https://github.com/apache/arrow-rs/issues/461#issuecomment-877808997),
it seems that
codecov-commenter edited a comment on pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#issuecomment-877857352
#
bkietz commented on pull request #10691:
URL: https://github.com/apache/arrow/pull/10691#issuecomment-877881536
I recommend avoiding macros whenever possible, but for completeness I'll
note that we could replace
```c++
// color.h
struct Color : EnumType {
using
```
houqp commented on issue #705:
URL:
https://github.com/apache/arrow-datafusion/issues/705#issuecomment-877873156
That's a good point. Perhaps the better structure would be to push optimize
into query evaluation/materialization methods like `save` and `collect`. This
way, we can make sure
andygrove commented on a change in pull request #714:
URL: https://github.com/apache/arrow-datafusion/pull/714#discussion_r667545951
##
File path: ballista/rust/core/src/execution_plans/shuffle_writer.rs
##
@@ -254,13 +254,13 @@ impl ExecutionPlan for ShuffleWriterExec {
andygrove opened a new pull request #714:
URL: https://github.com/apache/arrow-datafusion/pull/714
# Which issue does this PR close?
Closes #713 .
# Rationale for this change
Simple bug fix. Wrong variable was being used for partition number.
# What
andygrove opened a new issue #713:
URL: https://github.com/apache/arrow-datafusion/issues/713
**Describe the bug**
ShuffleWriterExec writes all output partitions to the same file due to a
simple bug where the wrong variable is used for the partition number.
**To Reproduce**
Dandandan commented on issue #705:
URL:
https://github.com/apache/arrow-datafusion/issues/705#issuecomment-877869950
I think it's not an oversight: `collect` is not necessarily called when
executing queries, for example when writing the results to disk. As `collect`
loads the result into
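Dandandan's distinction above — `collect` materializes every batch in memory, while a sink such as a disk writer only needs one batch resident at a time — can be sketched in plain Rust (a toy illustration with std iterators and writers standing in for DataFusion's streams; all names here are hypothetical):

```rust
use std::io::Write;

// Toy "query result": an iterator of record batches (byte chunks here).
fn result_stream() -> impl Iterator<Item = Vec<u8>> {
    (0..3).map(|i| vec![i as u8; 4])
}

// `collect`-style execution: every batch is resident in memory at once.
fn collect_all() -> Vec<Vec<u8>> {
    result_stream().collect()
}

// Sink-style execution: each batch is written out as it is produced,
// so memory use stays bounded by a single batch.
fn write_to_sink<W: Write>(mut sink: W) -> std::io::Result<usize> {
    let mut written = 0;
    for batch in result_stream() {
        sink.write_all(&batch)?;
        written += batch.len();
    }
    Ok(written)
}
```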
github-actions[bot] commented on pull request #10697:
URL: https://github.com/apache/arrow/pull/10697#issuecomment-877865260
https://issues.apache.org/jira/browse/ARROW-13304
rok opened a new pull request #10697:
URL: https://github.com/apache/arrow/pull/10697
To address [ARROW-13304](https://issues.apache.org/jira/browse/ARROW-13304).
jhorstmann commented on pull request #389:
URL: https://github.com/apache/arrow-rs/pull/389#issuecomment-877864847
This looks good and makes sense to me. StructArray (and probably union too)
are a bit of a special case in how offsets are propagated. Since we currently
do not push down
bryantbiggs commented on pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#issuecomment-877862396
nice!!!
pachadotdev commented on a change in pull request #10624:
URL: https://github.com/apache/arrow/pull/10624#discussion_r667533252
##
File path: r/R/dplyr-functions.R
##
@@ -280,6 +280,45 @@ nse_funcs$str_trim <- function(string, side = c("both",
"left", "right")) {
Dandandan commented on pull request #706:
URL: https://github.com/apache/arrow-datafusion/pull/706#issuecomment-877858042
> Maybe we do need two separate configs after all as per @Dandandan PR.
Maybe with concurrency renamed to something more specific to readers.
For now, something
github-actions[bot] commented on pull request #10696:
URL: https://github.com/apache/arrow/pull/10696#issuecomment-877858022
https://issues.apache.org/jira/browse/ARROW-13303
domoritz opened a new pull request #10696:
URL: https://github.com/apache/arrow/pull/10696
Merge after #10673
codecov-commenter commented on pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#issuecomment-877857352
#
Dandandan commented on pull request #706:
URL: https://github.com/apache/arrow-datafusion/pull/706#issuecomment-877857197
> Maybe we do need two separate configs after all as per @Dandandan PR.
Maybe with concurrency renamed to something more specific to readers.
I believe that's
andygrove commented on pull request #706:
URL: https://github.com/apache/arrow-datafusion/pull/706#issuecomment-877856405
Maybe we do need two separate configs after all as per @Dandandan PR. Maybe
with concurrency renamed to something more specific to readers.
kou merged pull request #10694:
URL: https://github.com/apache/arrow/pull/10694
nevi-me commented on pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#issuecomment-877855410
Might be of interest to @xrl and @bryantbiggs as you were the original
authors
nevi-me opened a new pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539
# Which issue does this PR close?
None
# Rationale for this change
Users of `parquet_derive` currently have to write the schema of their Rust
struct by hand.
This is
jorgecarleitao commented on pull request #706:
URL: https://github.com/apache/arrow-datafusion/pull/706#issuecomment-877851471
> What's the ideal design going forward? If the end goal is to create one
async task for each partition and let the tokio thread pool manage the
parallelism
andygrove commented on pull request #706:
URL: https://github.com/apache/arrow-datafusion/pull/706#issuecomment-877850054
@Dandandan Ah, ok. I had not understood fully what was happening there. I
think the goal is to have one async task per partition and to remove the
spawn_blocking, as
houqp commented on pull request #706:
URL: https://github.com/apache/arrow-datafusion/pull/706#issuecomment-877847808
What's the ideal design going forward? If the end goal is to create one
async task for each partition and let the tokio thread pool manage the
parallelism (threads), then I
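The end state houqp sketches — one task per partition, with the runtime's thread pool supplying the parallelism — might look roughly like this (a toy sketch using std threads as a stand-in for tokio tasks; `process_partition` is hypothetical):

```rust
use std::thread;

// Hypothetical per-partition work: sum one partition's values.
fn process_partition(part: &[i64]) -> i64 {
    part.iter().sum()
}

// Spawn one worker per partition and collect the results in order.
// In DataFusion the workers would be tokio tasks, and the tokio thread
// pool, not the OS scheduler, would map them onto cores.
fn execute(partitions: Vec<Vec<i64>>) -> Vec<i64> {
    let handles: Vec<_> = partitions
        .into_iter()
        .map(|part| thread::spawn(move || process_partition(&part)))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```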
Dandandan commented on pull request #706:
URL: https://github.com/apache/arrow-datafusion/pull/706#issuecomment-877844585
> > One concern I have is that the current config also sets the number of
maximum threads during reading parquet files.
>
> Is this still true though? I know we
andygrove commented on pull request #709:
URL: https://github.com/apache/arrow-datafusion/pull/709#issuecomment-877843644
@edrevo fyi
andygrove commented on pull request #706:
URL: https://github.com/apache/arrow-datafusion/pull/706#issuecomment-877843494
> One concern I have is that the current config also sets the number of
maximum threads during reading parquet files.
Is this still true though? I know we were
andygrove opened a new pull request #712:
URL: https://github.com/apache/arrow-datafusion/pull/712
# Which issue does this PR close?
Closes #710 .
# Rationale for this change
We need this to complete the shuffle implementation.
# What changes are
andygrove opened a new issue #711:
URL: https://github.com/apache/arrow-datafusion/issues/711
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
We cannot fix the shuffle mechanism until we have partition stats, or
andygrove opened a new issue #710:
URL: https://github.com/apache/arrow-datafusion/issues/710
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
We can't get shuffle writes working correctly until we implement serde for
this
Dandandan edited a comment on pull request #706:
URL: https://github.com/apache/arrow-datafusion/pull/706#issuecomment-877831879
I think this is mostly good.
One concern I have is that the current config also sets the number of
maximum threads during reading parquet files.
houqp commented on pull request #704:
URL: https://github.com/apache/arrow-datafusion/pull/704#issuecomment-877836703
@jorgecarleitao yes, this is a relatively new API that landed in stable rust
1.51.
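The 1.51 API houqp refers to is presumably `std::array::IntoIter`, which made by-value iteration over fixed-size arrays possible without a `Vec` detour (since Rust 1.53, arrays implement `IntoIterator` by value directly). A minimal sketch:

```rust
// By-value iteration over a fixed-size array: no Vec allocation and
// no extra iterator wrapper needed at the call site.
fn doubled() -> Vec<i32> {
    [1, 2, 3].into_iter().map(|x| x * 2).collect()
}
```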
Dandandan commented on pull request #706:
URL: https://github.com/apache/arrow-datafusion/pull/706#issuecomment-877831879
I think this is mostly good.
One concern I have is that the current config also sets the number of
maximum threads during reading parquet files.
For
Dandandan commented on a change in pull request #706:
URL: https://github.com/apache/arrow-datafusion/pull/706#discussion_r667508175
##
File path: datafusion/src/logical_plan/builder.rs
##
@@ -147,10 +147,10 @@ impl LogicalPlanBuilder {
pub fn scan_parquet_with_name(
andygrove opened a new pull request #709:
URL: https://github.com/apache/arrow-datafusion/pull/709
# Which issue does this PR close?
Closes #707 .
# Rationale for this change
Shuffles were broken. The executor always ran the shuffle writes with
partitioning
andygrove commented on issue #707:
URL:
https://github.com/apache/arrow-datafusion/issues/707#issuecomment-877826983
It turns out there is a fundamental bug here. I am working on fixing this.
andygrove commented on pull request #683:
URL: https://github.com/apache/arrow-datafusion/pull/683#issuecomment-877817032
Please take a look at my proposal in
https://github.com/apache/arrow-datafusion/pull/706
andygrove opened a new issue #708:
URL: https://github.com/apache/arrow-datafusion/issues/708
**Describe the bug**
```rust
LogicalPlanBuilder::scan_parquet_with_name(
,
projection,
24,
_name,
)? //TODO concurrency
```
andygrove opened a new issue #707:
URL: https://github.com/apache/arrow-datafusion/issues/707
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
When running TPC-H query 12 I see query plans like this being executed:
```
andygrove opened a new pull request #706:
URL: https://github.com/apache/arrow-datafusion/pull/706
# Which issue does this PR close?
Closes #685 .
# Rationale for this change
We originally used the `concurrency` config setting to determine how many
threads
ritchie46 commented on issue #461:
URL: https://github.com/apache/arrow-rs/issues/461#issuecomment-877811760
I think a lean arrow-core crate would be beneficial.
We already started some work by feature gating IO logic.
Wouldn't the same be achieved by `impl FromIterator for
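The feature gating ritchie46 mentions is Cargo's usual mechanism: each IO module sits behind a feature flag, so depending on the crate with `default-features = false` yields a lean core. A hypothetical sketch (these are not arrow-rs's actual feature names):

```rust
// In Cargo.toml:
//
//   [features]
//   default = ["csv", "ipc"]
//   csv = []
//   ipc = []
//
// In lib.rs, each IO module compiles only when its feature is enabled:
#[cfg(feature = "csv")]
pub mod csv;

#[cfg(feature = "ipc")]
pub mod ipc;
```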
jorgecarleitao commented on issue #461:
URL: https://github.com/apache/arrow-rs/issues/461#issuecomment-877808997
Let me try to explain my reasoning atm.
All methods exposed on `Array` are `O(1)`. In particular, `.slice` is `O(1)`
over the array, and thus `O(c)` over the record
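jorgecarleitao's `O(1)` point is the zero-copy slice idiom: a slice shares the parent's buffer and only adjusts an offset and a length, so cost is independent of the array's size. A minimal sketch with a toy array type (std only; arrow-rs's real `Array::slice` follows the same shape):

```rust
use std::sync::Arc;

// Toy zero-copy array: a shared buffer plus a window onto it.
#[derive(Clone)]
struct ToyArray {
    buffer: Arc<Vec<i32>>,
    offset: usize,
    len: usize,
}

impl ToyArray {
    fn new(values: Vec<i32>) -> Self {
        let len = values.len();
        Self { buffer: Arc::new(values), offset: 0, len }
    }

    // O(1): no data is copied, only the window moves.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len);
        Self {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }

    fn values(&self) -> &[i32] {
        &self.buffer[self.offset..self.offset + self.len]
    }
}
```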
andygrove commented on pull request #683:
URL: https://github.com/apache/arrow-datafusion/pull/683#issuecomment-877808503
I spent some more time reviewing the codebase this morning and it appears
that we no longer use the `concurrency` config to determine concurrency as in
number of
nevi-me commented on issue #334:
URL: https://github.com/apache/arrow-rs/issues/334#issuecomment-877794567
Can be closed
nevi-me closed issue #334:
URL: https://github.com/apache/arrow-rs/issues/334
nevi-me commented on issue #518:
URL: https://github.com/apache/arrow-rs/issues/518#issuecomment-877794108
@mcassels I've opened this issue to track the bug that you identified
codecov-commenter edited a comment on pull request #491:
URL: https://github.com/apache/arrow-rs/pull/491#issuecomment-873369694
#
nevi-me commented on pull request #491:
URL: https://github.com/apache/arrow-rs/pull/491#issuecomment-877791486
@alamb @jorgecarleitao this is now ready for review
nevi-me opened a new issue #538:
URL: https://github.com/apache/arrow-rs/issues/538
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
We are adding support for the map data type, which is a list of key-value
pairs, akin to a hashmap.
codecov-commenter edited a comment on pull request #491:
URL: https://github.com/apache/arrow-rs/pull/491#issuecomment-873369694
#
alamb commented on a change in pull request #532:
URL: https://github.com/apache/arrow-rs/pull/532#discussion_r667468941
##
File path: .github/workflows/rust.yml
##
@@ -357,15 +353,22 @@ jobs:
uses: actions/cache@v2
with:
path: /github/home/target
alamb commented on a change in pull request #532:
URL: https://github.com/apache/arrow-rs/pull/532#discussion_r667468650
##
File path: arrow/test/dependency/no-default-features/Cargo.toml
##
@@ -0,0 +1,32 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or
alamb commented on pull request #537:
URL: https://github.com/apache/arrow-rs/pull/537#issuecomment-877785281
Although @nevi-me seems to like `RecordBatch::concat` --
https://github.com/apache/arrow-rs/issues/461#issuecomment-877784891
alamb commented on issue #461:
URL: https://github.com/apache/arrow-rs/issues/461#issuecomment-877785205
> I think this would aid discoverability, in the sense that a person asking
"what can I do with a record batch?" would look at impl RecordBatch and
discover that easily.
I
nevi-me commented on issue #461:
URL: https://github.com/apache/arrow-rs/issues/461#issuecomment-877784891
I'd lean towards `RecordBatch::filter` and `RecordBatch::concat` as long as
that ends up being a convenience that iterates through all columns, and calls
the underlying compute or
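The convenience nevi-me describes — `RecordBatch::concat` iterating all columns and delegating to the per-array kernel — might take roughly this column-wise shape (a toy sketch with plain Vecs standing in for Arrow arrays and kernels; not the arrow-rs API):

```rust
// Toy "record batch": a list of equally long columns.
type Column = Vec<i32>;
type Batch = Vec<Column>;

// Toy stand-in for the per-array `concat` compute kernel.
fn concat_arrays(arrays: &[&Column]) -> Column {
    arrays.iter().flat_map(|a| a.iter().copied()).collect()
}

// The convenience wrapper: apply the kernel column by column.
fn concat_batches(batches: &[Batch]) -> Batch {
    let n_cols = batches[0].len();
    (0..n_cols)
        .map(|c| {
            let cols: Vec<&Column> = batches.iter().map(|b| &b[c]).collect();
            concat_arrays(&cols)
        })
        .collect()
}
```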
alamb commented on a change in pull request #537:
URL: https://github.com/apache/arrow-rs/pull/537#discussion_r667467518
##
File path: arrow/src/record_batch.rs
##
@@ -639,4 +669,108 @@ mod tests {
assert_eq!(batch.column(0).as_ref(), boolean.as_ref());
alamb commented on issue #461:
URL: https://github.com/apache/arrow-rs/issues/461#issuecomment-877783516
> Wouldn't it make it more sense to add this to compute as a normal
function? Implementing it on the RecordBatch interface is breaking the
separation between compute and data that we
nevi-me commented on a change in pull request #389:
URL: https://github.com/apache/arrow-rs/pull/389#discussion_r667463293
##
File path: arrow/src/array/transform/structure.rs
##
@@ -26,13 +26,10 @@ pub(super) fn build_extend(array: ) -> Extend {
index:
nevi-me commented on pull request #389:
URL: https://github.com/apache/arrow-rs/pull/389#issuecomment-877780039
@jorgecarleitao @jhorstmann @bjchambers this is ready for review, and
partially fixes #514
codecov-commenter commented on pull request #389:
URL: https://github.com/apache/arrow-rs/pull/389#issuecomment-88131
#
nevi-me commented on a change in pull request #389:
URL: https://github.com/apache/arrow-rs/pull/389#discussion_r667459042
##
File path: arrow/src/array/array_struct.rs
##
@@ -85,12 +85,7 @@ impl From<ArrayData> for StructArray {
fn from(data: ArrayData) -> Self {
let mut
domoritz edited a comment on pull request #10673:
URL: https://github.com/apache/arrow/pull/10673#issuecomment-877747590
~~Hmm, can it be that test times for the docker build went from 20 minutes
to 50 minutes? (https://github.com/apache/arrow/runs/3005016709 vs
codecov-commenter edited a comment on pull request #491:
URL: https://github.com/apache/arrow-rs/pull/491#issuecomment-873369694
#
domoritz commented on pull request #10673:
URL: https://github.com/apache/arrow/pull/10673#issuecomment-877747590
Hmm, can it be that test times for the docker build went from 20 minutes to
50 minutes? (https://github.com/apache/arrow/runs/3005016709 vs