[GitHub] [arrow-datafusion] houqp commented on a change in pull request #443: add invariants spec

2021-06-01 Thread GitBox
houqp commented on a change in pull request #443: URL: https://github.com/apache/arrow-datafusion/pull/443#discussion_r643661190 ## File path: docs/specification/invariants.md ## @@ -0,0 +1,327 @@ + + +# DataFusion's Invariants + +This document enumerates invariants of

[GitHub] [arrow-datafusion] jorgecarleitao commented on a change in pull request #443: add invariants spec

2021-06-01 Thread GitBox
jorgecarleitao commented on a change in pull request #443: URL: https://github.com/apache/arrow-datafusion/pull/443#discussion_r643655420 ## File path: docs/specification/invariants.md ## @@ -0,0 +1,327 @@ + + +# DataFusion's Invariants + +This document enumerates invariants

[GitHub] [arrow-datafusion] houqp commented on a change in pull request #443: add invariants spec

2021-06-01 Thread GitBox
houqp commented on a change in pull request #443: URL: https://github.com/apache/arrow-datafusion/pull/443#discussion_r643649837 ## File path: docs/specification/invariants.md ## @@ -0,0 +1,327 @@ + + +# DataFusion's Invariants + +This document enumerates invariants of

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #463: Add sort in window functions

2021-06-01 Thread GitBox
codecov-commenter commented on pull request #463: URL: https://github.com/apache/arrow-datafusion/pull/463#issuecomment-852716291 #

[GitHub] [arrow] cyb70289 edited a comment on pull request #10364: ARROW-12074: [C++][Compute] Add scalar arithmetic kernels for decimal

2021-06-01 Thread GitBox
cyb70289 edited a comment on pull request #10364: URL: https://github.com/apache/arrow/pull/10364#issuecomment-852709410 @bkietz , met with one problem, would like to hear your comments. Thanks. Decimal upscaling is operation dependent. E.g., `+,-` will upscale arg with smaller

[GitHub] [arrow-datafusion] msathis commented on issue #472: [Ballista] Improve task and job metadata

2021-06-01 Thread GitBox
msathis commented on issue #472: URL: https://github.com/apache/arrow-datafusion/issues/472#issuecomment-852710260 This will be a great addition. We can expose this information to the UI as well. -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [arrow] cyb70289 commented on pull request #10364: ARROW-12074: [C++][Compute] Add scalar arithmetic kernels for decimal

2021-06-01 Thread GitBox
cyb70289 commented on pull request #10364: URL: https://github.com/apache/arrow/pull/10364#issuecomment-852709410 @bkietz , met with one problem, would like to hear your comments. Thanks. Decimal upscaling is operation dependent. E.g., `+,-` will upscale arg with small scale to

[GitHub] [arrow] westonpace commented on a change in pull request #10397: ARROW-11930: [C++][Dataset][Compute] Use an ExecPlan for dataset scans

2021-06-01 Thread GitBox
westonpace commented on a change in pull request #10397: URL: https://github.com/apache/arrow/pull/10397#discussion_r643637975 ## File path: cpp/src/arrow/compute/exec/expression_test.cc ## @@ -165,6 +165,56 @@ TEST(ExpressionUtils, StripOrderPreservingCasts) {

[GitHub] [arrow] westonpace commented on a change in pull request #10397: ARROW-11930: [C++][Dataset][Compute] Use an ExecPlan for dataset scans

2021-06-01 Thread GitBox
westonpace commented on a change in pull request #10397: URL: https://github.com/apache/arrow/pull/10397#discussion_r643637072 ## File path: cpp/src/arrow/dataset/dataset_internal.h ## @@ -204,5 +204,35 @@ arrow::Result> GetFragmentScanOptions( return

[GitHub] [arrow] westonpace commented on pull request #10258: ARROW-12560: [C++] Investigate utilizing aggressive thread task creation when adding callback to finished future

2021-06-01 Thread GitBox
westonpace commented on pull request #10258: URL: https://github.com/apache/arrow/pull/10258#issuecomment-852702836 @pitrou Don't worry about the delay, I've been plenty busy elsewhere. I have just a few follow-up questions and then I'll make the changes. -- This is an automated

[GitHub] [arrow] westonpace commented on a change in pull request #10258: ARROW-12560: [C++] Investigate utilizing aggressive thread task creation when adding callback to finished future

2021-06-01 Thread GitBox
westonpace commented on a change in pull request #10258: URL: https://github.com/apache/arrow/pull/10258#discussion_r643636361 ## File path: cpp/src/arrow/util/future_test.cc ## @@ -952,6 +951,85 @@ TEST(FutureCompletionTest, FutureVoid) { } } +class FutureSchedulingTest

[GitHub] [arrow] westonpace commented on a change in pull request #10258: ARROW-12560: [C++] Investigate utilizing aggressive thread task creation when adding callback to finished future

2021-06-01 Thread GitBox
westonpace commented on a change in pull request #10258: URL: https://github.com/apache/arrow/pull/10258#discussion_r643636029 ## File path: cpp/src/arrow/util/test_common.h ## @@ -85,4 +88,18 @@ inline void AssertIteratorExhausted(Iterator& it) { Transformer

[GitHub] [arrow] westonpace commented on a change in pull request #10258: ARROW-12560: [C++] Investigate utilizing aggressive thread task creation when adding callback to finished future

2021-06-01 Thread GitBox
westonpace commented on a change in pull request #10258: URL: https://github.com/apache/arrow/pull/10258#discussion_r643634671 ## File path: cpp/src/arrow/util/future.h ## @@ -453,30 +480,35 @@ class Future { /// cyclic reference to itself through the callback. template

[GitHub] [arrow-datafusion] codecov-commenter edited a comment on pull request #443: add invariants spec

2021-06-01 Thread GitBox
codecov-commenter edited a comment on pull request #443: URL: https://github.com/apache/arrow-datafusion/pull/443#issuecomment-850944261 #

[GitHub] [arrow] westonpace commented on a change in pull request #10258: ARROW-12560: [C++] Investigate utilizing aggressive thread task creation when adding callback to finished future

2021-06-01 Thread GitBox
westonpace commented on a change in pull request #10258: URL: https://github.com/apache/arrow/pull/10258#discussion_r643632996 ## File path: cpp/src/arrow/util/future.cc ## @@ -272,8 +315,8 @@ class ConcreteFutureImpl : public FutureImpl { // // In fact, it is

[GitHub] [arrow] westonpace commented on a change in pull request #10258: ARROW-12560: [C++] Investigate utilizing aggressive thread task creation when adding callback to finished future

2021-06-01 Thread GitBox
westonpace commented on a change in pull request #10258: URL: https://github.com/apache/arrow/pull/10258#discussion_r643632152 ## File path: cpp/src/arrow/util/future.h ## @@ -202,8 +202,30 @@ enum class FutureState : int8_t { PENDING, SUCCESS, FAILURE }; inline bool

[GitHub] [arrow] westonpace commented on a change in pull request #10258: ARROW-12560: [C++] Investigate utilizing aggressive thread task creation when adding callback to finished future

2021-06-01 Thread GitBox
westonpace commented on a change in pull request #10258: URL: https://github.com/apache/arrow/pull/10258#discussion_r643630671 ## File path: cpp/src/arrow/util/future.cc ## @@ -272,8 +315,8 @@ class ConcreteFutureImpl : public FutureImpl { // // In fact, it is

[GitHub] [arrow] westonpace commented on a change in pull request #10258: ARROW-12560: [C++] Investigate utilizing aggressive thread task creation when adding callback to finished future

2021-06-01 Thread GitBox
westonpace commented on a change in pull request #10258: URL: https://github.com/apache/arrow/pull/10258#discussion_r643628821 ## File path: cpp/src/arrow/util/future.cc ## @@ -231,26 +232,68 @@ class ConcreteFutureImpl : public FutureImpl { void DoMarkFailed() {

[GitHub] [arrow] westonpace commented on a change in pull request #10258: ARROW-12560: [C++] Investigate utilizing aggressive thread task creation when adding callback to finished future

2021-06-01 Thread GitBox
westonpace commented on a change in pull request #10258: URL: https://github.com/apache/arrow/pull/10258#discussion_r643625949 ## File path: cpp/src/arrow/util/future.cc ## @@ -231,26 +232,68 @@ class ConcreteFutureImpl : public FutureImpl { void DoMarkFailed() {

[GitHub] [arrow] westonpace commented on a change in pull request #10421: ARROW-12903: [C++] Create new thread pool benchmark demonstrating the "scheduling" bottleneck

2021-06-01 Thread GitBox
westonpace commented on a change in pull request #10421: URL: https://github.com/apache/arrow/pull/10421#discussion_r643623285 ## File path: cpp/src/arrow/util/thread_pool.h ## @@ -288,6 +288,10 @@ class ARROW_EXPORT ThreadPool : public Executor { // tasks are finished.

[GitHub] [arrow] westonpace commented on a change in pull request #10421: ARROW-12903: [C++] Create new thread pool benchmark demonstrating the "scheduling" bottleneck

2021-06-01 Thread GitBox
westonpace commented on a change in pull request #10421: URL: https://github.com/apache/arrow/pull/10421#discussion_r643622911 ## File path: cpp/src/arrow/util/thread_pool_benchmark.cc ## @@ -103,6 +103,52 @@ static void ThreadPoolSpawn(benchmark::State& state) { // NOLINT

[GitHub] [arrow] westonpace edited a comment on pull request #10421: ARROW-12903: [C++] Create new thread pool benchmark demonstrating the "scheduling" bottleneck

2021-06-01 Thread GitBox
westonpace edited a comment on pull request #10421: URL: https://github.com/apache/arrow/pull/10421#issuecomment-852682551 Just adding the benchmark... ``` ThreadPoolSpawn/threads:1/task_cost:1000/real_time 104576026 ns 39527736 ns 7

[GitHub] [arrow] westonpace commented on pull request #10421: ARROW-12903: [C++] Create new thread pool benchmark demonstrating the "scheduling" bottleneck

2021-06-01 Thread GitBox
westonpace commented on pull request #10421: URL: https://github.com/apache/arrow/pull/10421#issuecomment-852682551 Just adding the benchmark... ``` ThreadPoolSpawn/threads:1/task_cost:1000/real_time 104576026 ns 39527736 ns 7 items_per_second=1.91249M/s

[GitHub] [arrow] liyafan82 commented on a change in pull request #10423: ARROW-12907: [Java] Fix memory leak on deserialization errors

2021-06-01 Thread GitBox
liyafan82 commented on a change in pull request #10423: URL: https://github.com/apache/arrow/pull/10423#discussion_r643606492 ## File path: java/vector/src/test/java/org/apache/arrow/vector/ipc/MessageSerializerTest.java ## @@ -197,12 +199,30 @@ public void

[GitHub] [arrow] liyafan82 commented on a change in pull request #10423: ARROW-12907: [Java] Fix memory leak on deserialization errors

2021-06-01 Thread GitBox
liyafan82 commented on a change in pull request #10423: URL: https://github.com/apache/arrow/pull/10423#discussion_r643604065 ## File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java ## @@ -723,8 +723,13 @@ public static

[GitHub] [arrow-rs] nevi-me commented on a change in pull request #384: Implement faster arrow array reader

2021-06-01 Thread GitBox
nevi-me commented on a change in pull request #384: URL: https://github.com/apache/arrow-rs/pull/384#discussion_r643577679 ## File path: parquet/src/arrow/arrow_array_reader.rs ## @@ -0,0 +1,1394 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] github-actions[bot] commented on pull request #10433: ARROW-12911: [Python] Export scalar aggregate options to pc.sum

2021-06-01 Thread GitBox
github-actions[bot] commented on pull request #10433: URL: https://github.com/apache/arrow/pull/10433#issuecomment-852644263 https://issues.apache.org/jira/browse/ARROW-12911 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [arrow] cyb70289 opened a new pull request #10433: ARROW-12911: [Python] Export scalar aggregate options to pc.sum

2021-06-01 Thread GitBox
cyb70289 opened a new pull request #10433: URL: https://github.com/apache/arrow/pull/10433 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this

[GitHub] [arrow] rok commented on a change in pull request #10176: ARROW-11759: [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type

2021-06-01 Thread GitBox
rok commented on a change in pull request #10176: URL: https://github.com/apache/arrow/pull/10176#discussion_r643590723 ## File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc ## @@ -0,0 +1,614 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow] rok commented on a change in pull request #10176: ARROW-11759: [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type

2021-06-01 Thread GitBox
rok commented on a change in pull request #10176: URL: https://github.com/apache/arrow/pull/10176#discussion_r643590630 ## File path: docs/source/cpp/compute.rst ## @@ -637,6 +637,54 @@ String extraction e.g. 'letter' and 'digit' for the regular expression

[GitHub] [arrow] rok commented on a change in pull request #10176: ARROW-11759: [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type

2021-06-01 Thread GitBox
rok commented on a change in pull request #10176: URL: https://github.com/apache/arrow/pull/10176#discussion_r643590434 ## File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc ## @@ -0,0 +1,107 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

[GitHub] [arrow] rok commented on a change in pull request #10176: ARROW-11759: [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type

2021-06-01 Thread GitBox
rok commented on a change in pull request #10176: URL: https://github.com/apache/arrow/pull/10176#discussion_r643590355 ## File path: cpp/src/arrow/compute/api_scalar.h ## @@ -462,5 +462,177 @@ ARROW_EXPORT Result FillNull(const Datum& values, const Datum& fill_value,

[GitHub] [arrow] rok commented on a change in pull request #10176: ARROW-11759: [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type

2021-06-01 Thread GitBox
rok commented on a change in pull request #10176: URL: https://github.com/apache/arrow/pull/10176#discussion_r643590073 ## File path: cpp/src/arrow/compute/api_scalar.h ## @@ -462,5 +462,177 @@ ARROW_EXPORT Result FillNull(const Datum& values, const Datum& fill_value,

[GitHub] [arrow-rs] nevi-me commented on a change in pull request #386: add more tests for window::shift and handle boundary cases

2021-06-01 Thread GitBox
nevi-me commented on a change in pull request #386: URL: https://github.com/apache/arrow-rs/pull/386#discussion_r643581616 ## File path: arrow/src/compute/kernels/window.rs ## @@ -33,56 +32,120 @@ use crate::{array::PrimitiveArray, datatypes::ArrowPrimitiveType,

[GitHub] [arrow-rs] nevi-me commented on a change in pull request #388: window::shift to work for all array types

2021-06-01 Thread GitBox
nevi-me commented on a change in pull request #388: URL: https://github.com/apache/arrow-rs/pull/388#discussion_r643581161 ## File path: arrow/src/compute/kernels/window.rs ## @@ -33,56 +32,161 @@ use crate::{array::PrimitiveArray, datatypes::ArrowPrimitiveType,

[GitHub] [arrow] bkietz commented on a change in pull request #10397: ARROW-11930: [C++][Dataset][Compute] Use an ExecPlan for dataset scans

2021-06-01 Thread GitBox
bkietz commented on a change in pull request #10397: URL: https://github.com/apache/arrow/pull/10397#discussion_r643578583 ## File path: cpp/src/arrow/compute/exec/expression.cc ## @@ -613,6 +639,22 @@ std::vector FieldsInExpression(const Expression& expr) { return fields;

[GitHub] [arrow] bkietz commented on a change in pull request #10397: ARROW-11930: [C++][Dataset][Compute] Use an ExecPlan for dataset scans

2021-06-01 Thread GitBox
bkietz commented on a change in pull request #10397: URL: https://github.com/apache/arrow/pull/10397#discussion_r643578392 ## File path: cpp/src/arrow/compute/exec/expression.cc ## @@ -61,13 +61,22 @@ Expression call(std::string function, std::vector arguments,

[GitHub] [arrow] bkietz commented on a change in pull request #10397: ARROW-11930: [C++][Dataset][Compute] Use an ExecPlan for dataset scans

2021-06-01 Thread GitBox
bkietz commented on a change in pull request #10397: URL: https://github.com/apache/arrow/pull/10397#discussion_r643577382 ## File path: cpp/src/arrow/compute/exec/expression.cc ## @@ -510,7 +475,67 @@ Result Expression::Bind(const Schema& in_schema, return

[GitHub] [arrow] bkietz commented on a change in pull request #10397: ARROW-11930: [C++][Dataset][Compute] Use an ExecPlan for dataset scans

2021-06-01 Thread GitBox
bkietz commented on a change in pull request #10397: URL: https://github.com/apache/arrow/pull/10397#discussion_r643576195 ## File path: cpp/src/arrow/compute/exec/expression_test.cc ## @@ -165,6 +165,56 @@ TEST(ExpressionUtils, StripOrderPreservingCasts) {

[GitHub] [arrow] bkietz commented on a change in pull request #10397: ARROW-11930: [C++][Dataset][Compute] Use an ExecPlan for dataset scans

2021-06-01 Thread GitBox
bkietz commented on a change in pull request #10397: URL: https://github.com/apache/arrow/pull/10397#discussion_r643575745 ## File path: cpp/src/arrow/dataset/dataset_internal.h ## @@ -204,5 +204,35 @@ arrow::Result> GetFragmentScanOptions( return

[GitHub] [arrow-rs] nevi-me commented on pull request #389: make slice work for nested types

2021-06-01 Thread GitBox
nevi-me commented on pull request #389: URL: https://github.com/apache/arrow-rs/pull/389#issuecomment-852610400 @jorgecarleitao I have the below failures, which are mostly related to `MutableArrayData`. I need your help when you have some time to spare. I worked on this over the

[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #381: Respect max rowgroup size in Arrow writer

2021-06-01 Thread GitBox
codecov-commenter edited a comment on pull request #381: URL: https://github.com/apache/arrow-rs/pull/381#issuecomment-850950657 #

[GitHub] [arrow-rs] nevi-me opened a new pull request #389: make slice work for nested types

2021-06-01 Thread GitBox
nevi-me opened a new pull request #389: URL: https://github.com/apache/arrow-rs/pull/389 # Which issue does this PR close? Corresponding issue might not yet exist, will check when finalising this PR # Rationale for this change `ArrayData::slice()` does not work for

[GitHub] [arrow-rs] nevi-me commented on pull request #381: Respect max rowgroup size in Arrow writer

2021-06-01 Thread GitBox
nevi-me commented on pull request #381: URL: https://github.com/apache/arrow-rs/pull/381#issuecomment-852597694 I've addressed feedback, PTAL @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [arrow-rs] nevi-me commented on a change in pull request #381: Respect max rowgroup size in Arrow writer

2021-06-01 Thread GitBox
nevi-me commented on a change in pull request #381: URL: https://github.com/apache/arrow-rs/pull/381#discussion_r643570618 ## File path: parquet/src/arrow/arrow_writer.rs ## @@ -1176,31 +1236,51 @@ mod tests { let raw_values: Vec<_> = (0..SMALL_SIZE as i64).collect();

[GitHub] [arrow] jonkeane commented on a change in pull request #10430: ARROW-12915: [Release] Build of ubuntu-docs is failing on thrift

2021-06-01 Thread GitBox
jonkeane commented on a change in pull request #10430: URL: https://github.com/apache/arrow/pull/10430#discussion_r643569671 ## File path: ci/docker/ubuntu-20.04-cpp.dockerfile ## @@ -75,6 +75,7 @@ RUN apt-get update -y -q && \ libcurl4-openssl-dev \

[GitHub] [arrow-datafusion] BohuTANG edited a comment on issue #354: Implement some way to self assign tickets without having full edit access to github

2021-06-01 Thread GitBox
BohuTANG edited a comment on issue #354: URL: https://github.com/apache/arrow-datafusion/issues/354#issuecomment-852561940 Hello, we can consider this bot https://github.com/datafuselabs/fusebots , who has an `/assignme` command to take the current issue away and then

[GitHub] [arrow] kou commented on a change in pull request #10430: ARROW-12915: [Release] Build of ubuntu-docs is failing on thrift

2021-06-01 Thread GitBox
kou commented on a change in pull request #10430: URL: https://github.com/apache/arrow/pull/10430#discussion_r643565782 ## File path: ci/docker/ubuntu-20.04-cpp.dockerfile ## @@ -75,6 +75,7 @@ RUN apt-get update -y -q && \ libcurl4-openssl-dev \ libgflags-dev

[GitHub] [arrow-datafusion] BohuTANG edited a comment on issue #354: Implement some way to self assign tickets without having full edit access to github

2021-06-01 Thread GitBox
BohuTANG edited a comment on issue #354: URL: https://github.com/apache/arrow-datafusion/issues/354#issuecomment-852561940 Hello, we can consider this bot https://github.com/datafuselabs/fusebots , who has an '/assignme' command to take the current issue away and then automatically add a

[GitHub] [arrow-datafusion] BohuTANG commented on issue #354: Implement some way to self assign tickets without having full edit access to github

2021-06-01 Thread GitBox
BohuTANG commented on issue #354: URL: https://github.com/apache/arrow-datafusion/issues/354#issuecomment-852561940 Hello, we can consider this bot https://github.com/datafuselabs/fusebots , who has an '/assignme' command to take the current issue away and then automatically add an

[GitHub] [arrow] lidavidm commented on a change in pull request #9620: ARROW-11843: [C++] Provide async Parquet reader

2021-06-01 Thread GitBox
lidavidm commented on a change in pull request #9620: URL: https://github.com/apache/arrow/pull/9620#discussion_r643540804 ## File path: cpp/src/parquet/file_reader.cc ## @@ -264,23 +264,92 @@ class SerializedFile : public ParquetFileReader::Contents { } }

[GitHub] [arrow-datafusion] alamb commented on pull request #429: implement lead and lag built-in window function

2021-06-01 Thread GitBox
alamb commented on pull request #429: URL: https://github.com/apache/arrow-datafusion/pull/429#issuecomment-852533565 I plan to review this PR tomorrow -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [arrow-datafusion] alamb commented on pull request #470: Support semi join

2021-06-01 Thread GitBox
alamb commented on pull request #470: URL: https://github.com/apache/arrow-datafusion/pull/470#issuecomment-852533237 I plan to review this tomorrow -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [arrow-rs] alamb commented on pull request #384: Implement faster arrow array reader

2021-06-01 Thread GitBox
alamb commented on pull request #384: URL: https://github.com/apache/arrow-rs/pull/384#issuecomment-852532332 Thanks @yordan-pavlov -- I will try and set time aside tomorrow to review this PR. Sorry for the delay -- This is an automated message from the Apache Git Service. To respond

[GitHub] [arrow] nealrichardson closed pull request #10381: ARROW-12722: [R] Raise error when attemping to print table with duplicated naming

2021-06-01 Thread GitBox
nealrichardson closed pull request #10381: URL: https://github.com/apache/arrow/pull/10381 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this

[GitHub] [arrow] github-actions[bot] commented on pull request #10432: ARROW-12924: [Gandiva][C++] Implement CONVERT_TIMEZONE SQL function in Gandiva

2021-06-01 Thread GitBox
github-actions[bot] commented on pull request #10432: URL: https://github.com/apache/arrow/pull/10432#issuecomment-852494029 https://issues.apache.org/jira/browse/ARROW-12924 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [arrow] jvictorhuguenin opened a new pull request #10432: ARROW-12924: [Gandiva][C++] Implement CONVERT_TIMEZONE SQL function in Gandiva

2021-06-01 Thread GitBox
jvictorhuguenin opened a new pull request #10432: URL: https://github.com/apache/arrow/pull/10432 Converts timestamp to specified timezone. If the sourceTimezone parameter is not present, Dremio assumes the timestamp provided in the third parameter is in UTC format. The sourceTimezone and

[GitHub] [arrow] jonkeane commented on pull request #10430: ARROW-12915: [Release] Build of ubuntu-docs is failing on thrift

2021-06-01 Thread GitBox
jonkeane commented on pull request #10430: URL: https://github.com/apache/arrow/pull/10430#issuecomment-852480193 I ran this on crossbow (locally since the GH actions comment bot isn't working right now https://issues.apache.org/jira/browse/ARROW-12919) and the only ubuntu crossbow builds

[GitHub] [arrow] lidavidm commented on pull request #10386: ARROW-12859: [C++] Add ScalarFromJSON for testing

2021-06-01 Thread GitBox
lidavidm commented on pull request #10386: URL: https://github.com/apache/arrow/pull/10386#issuecomment-852463069 For CastTo: I think it was actually the Cast kernel (string->type cast). I added some basic tests. I don't think there's a good way to hit the DCHECK because that would

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #474: Update k8s user guide to use deployments

2021-06-01 Thread GitBox
codecov-commenter commented on pull request #474: URL: https://github.com/apache/arrow-datafusion/pull/474#issuecomment-852435958 #

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #436: Remove reundant filters (e.g. c> 5 AND c>5 --> c>5)

2021-06-01 Thread GitBox
codecov-commenter commented on pull request #436: URL: https://github.com/apache/arrow-datafusion/pull/436#issuecomment-852434776 #

[GitHub] [arrow-datafusion] edrevo commented on issue #72: Update link in Ballista donation blog post

2021-06-01 Thread GitBox
edrevo commented on issue #72: URL: https://github.com/apache/arrow-datafusion/issues/72#issuecomment-852428471 I'd say this is fixed, right? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow-datafusion] edrevo commented on pull request #474: Update k8s user guide to use deployments

2021-06-01 Thread GitBox
edrevo commented on pull request #474: URL: https://github.com/apache/arrow-datafusion/pull/474#issuecomment-852426880 cc @andygrove -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] bkietz commented on a change in pull request #10410: ARROW-10640: [C++] A "where" kernel to combine two arrays based on a mask

2021-06-01 Thread GitBox
bkietz commented on a change in pull request #10410: URL: https://github.com/apache/arrow/pull/10410#discussion_r643454193 ## File path: cpp/src/arrow/compute/kernels/scalar_if_else_test.cc ## @@ -0,0 +1,264 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

[GitHub] [arrow-datafusion] edrevo opened a new pull request #474: K8s deployments

2021-06-01 Thread GitBox
edrevo opened a new pull request #474: URL: https://github.com/apache/arrow-datafusion/pull/474 # Which issue does this PR close? Closes #473. # What changes are included in this PR? - Rename port to bind-port since the configure_me crate was trying to parse an env

[GitHub] [arrow-datafusion] edrevo opened a new issue #473: [Ballista] Use deployments in k8s user guide

2021-06-01 Thread GitBox
edrevo opened a new issue #473: URL: https://github.com/apache/arrow-datafusion/issues/473 The executors can now be used as a k8s deployment, which is a more flexible and simpler k8s primitive. -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [arrow] rok commented on pull request #10176: ARROW-11759: [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type

2021-06-01 Thread GitBox
rok commented on pull request #10176: URL: https://github.com/apache/arrow/pull/10176#issuecomment-852423550 Thanks for the review @pitrou! I've addressed some of the comments and I'll try to finish the rest today. -- This is an automated message from the Apache Git Service. To respond

[GitHub] [arrow-datafusion] jgoday commented on a change in pull request #436: Remove reundant filters (e.g. c> 5 AND c>5 --> c>5)

2021-06-01 Thread GitBox
jgoday commented on a change in pull request #436: URL: https://github.com/apache/arrow-datafusion/pull/436#discussion_r643455939 ## File path: datafusion/src/optimizer/mod.rs ## @@ -25,4 +25,5 @@ pub mod hash_build_probe_order; pub mod limit_push_down; pub mod optimizer;

[GitHub] [arrow] rok commented on a change in pull request #10176: ARROW-11759: [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type

2021-06-01 Thread GitBox
rok commented on a change in pull request #10176: URL: https://github.com/apache/arrow/pull/10176#discussion_r643455290 ## File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc ## @@ -0,0 +1,614 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow-datafusion] edrevo commented on issue #472: [Ballista] Improve task and job metadata

2021-06-01 Thread GitBox
edrevo commented on issue #472: URL: https://github.com/apache/arrow-datafusion/issues/472#issuecomment-852422276 cc @pradomota. I'm opening this one in case you want to take a stab at it. We can do pair programming if you want. -- This is an automated message from the Apache Git

[GitHub] [arrow] rok commented on a change in pull request #10176: ARROW-11759: [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type

2021-06-01 Thread GitBox
rok commented on a change in pull request #10176: URL: https://github.com/apache/arrow/pull/10176#discussion_r643454925 ## File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc ## @@ -0,0 +1,614 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow-datafusion] edrevo opened a new issue #472: [Ballista] Improve task and job metadata

2021-06-01 Thread GitBox
edrevo opened a new issue #472: URL: https://github.com/apache/arrow-datafusion/issues/472 The task and job status we save in the scheduler state is currently lacking. See:

[GitHub] [arrow] rok commented on a change in pull request #10176: ARROW-11759: [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type

2021-06-01 Thread GitBox
rok commented on a change in pull request #10176: URL: https://github.com/apache/arrow/pull/10176#discussion_r643453195 ## File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc ## @@ -0,0 +1,614 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow] rok commented on a change in pull request #10176: ARROW-11759: [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type

2021-06-01 Thread GitBox
rok commented on a change in pull request #10176: URL: https://github.com/apache/arrow/pull/10176#discussion_r643453054 ## File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc ## @@ -0,0 +1,614 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow] rok commented on a change in pull request #10176: ARROW-11759: [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type

2021-06-01 Thread GitBox
rok commented on a change in pull request #10176: URL: https://github.com/apache/arrow/pull/10176#discussion_r643452934 ## File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc ## @@ -0,0 +1,614 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow] rok commented on a change in pull request #10176: ARROW-11759: [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type

2021-06-01 Thread GitBox
rok commented on a change in pull request #10176: URL: https://github.com/apache/arrow/pull/10176#discussion_r643452234 ## File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc ## @@ -0,0 +1,614 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow-datafusion] jgoday commented on a change in pull request #436: Remove reundant filters (e.g. c> 5 AND c>5 --> c>5)

2021-06-01 Thread GitBox
jgoday commented on a change in pull request #436: URL: https://github.com/apache/arrow-datafusion/pull/436#discussion_r643450066 ## File path: datafusion/src/optimizer/remove_duplicate_filters.rs ## @@ -0,0 +1,611 @@ +// regarding copyright ownership. The ASF licenses this

[GitHub] [arrow] rok commented on a change in pull request #10176: ARROW-11759: [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type

2021-06-01 Thread GitBox
rok commented on a change in pull request #10176: URL: https://github.com/apache/arrow/pull/10176#discussion_r643449818 ## File path: cpp/src/arrow/compute/kernels/scalar_temporal.cc ## @@ -0,0 +1,614 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow] rok commented on a change in pull request #10176: ARROW-11759: [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type

2021-06-01 Thread GitBox
rok commented on a change in pull request #10176: URL: https://github.com/apache/arrow/pull/10176#discussion_r643449669 ## File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc ## @@ -0,0 +1,107 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

[GitHub] [arrow] thisisnic commented on a change in pull request #10326: ARROW-12791: [R] Better error handling for DatasetFactory$Finish() when no format specified

2021-06-01 Thread GitBox
thisisnic commented on a change in pull request #10326: URL: https://github.com/apache/arrow/pull/10326#discussion_r643448875 ## File path: r/R/util.R ## @@ -110,3 +110,15 @@ handle_embedded_nul_error <- function(e) { } stop(e) } + +handle_parquet_io_error <-

[GitHub] [arrow] rok commented on a change in pull request #10176: ARROW-11759: [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type

2021-06-01 Thread GitBox
rok commented on a change in pull request #10176: URL: https://github.com/apache/arrow/pull/10176#discussion_r643448314 ## File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc ## @@ -0,0 +1,107 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

[GitHub] [arrow] bkietz commented on a change in pull request #10397: ARROW-11930: [C++][Dataset][Compute] Use an ExecPlan for dataset scans

2021-06-01 Thread GitBox
bkietz commented on a change in pull request #10397: URL: https://github.com/apache/arrow/pull/10397#discussion_r643448100 ## File path: cpp/src/arrow/compute/exec/expression.h ## @@ -207,11 +218,22 @@ Result SimplifyWithGuarantee(Expression, // Execution -/// Execute a

[GitHub] [arrow] bkietz commented on a change in pull request #10397: ARROW-11930: [C++][Dataset][Compute] Use an ExecPlan for dataset scans

2021-06-01 Thread GitBox
bkietz commented on a change in pull request #10397: URL: https://github.com/apache/arrow/pull/10397#discussion_r643447263 ## File path: cpp/src/arrow/compute/exec/expression.cc ## @@ -510,7 +475,67 @@ Result Expression::Bind(const Schema& in_schema, return

[GitHub] [arrow] bkietz commented on a change in pull request #10397: ARROW-11930: [C++][Dataset][Compute] Use an ExecPlan for dataset scans

2021-06-01 Thread GitBox
bkietz commented on a change in pull request #10397: URL: https://github.com/apache/arrow/pull/10397#discussion_r643447263 ## File path: cpp/src/arrow/compute/exec/expression.cc ## @@ -510,7 +475,67 @@ Result Expression::Bind(const Schema& in_schema, return

[GitHub] [arrow] bkietz commented on a change in pull request #10397: ARROW-11930: [C++][Dataset][Compute] Use an ExecPlan for dataset scans

2021-06-01 Thread GitBox
bkietz commented on a change in pull request #10397: URL: https://github.com/apache/arrow/pull/10397#discussion_r643447166 ## File path: cpp/src/arrow/compute/exec/exec_plan.h ## @@ -225,22 +212,43 @@ class ARROW_EXPORT ExecNode { virtual void StopProducing() = 0;

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #436: Remove reundant filters (e.g. c> 5 AND c>5 --> c>5)

2021-06-01 Thread GitBox
Dandandan commented on a change in pull request #436: URL: https://github.com/apache/arrow-datafusion/pull/436#discussion_r643441081 ## File path: datafusion/src/optimizer/remove_duplicate_filters.rs ## @@ -0,0 +1,611 @@ +// regarding copyright ownership. The ASF licenses

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #436: Remove reundant filters (e.g. c> 5 AND c>5 --> c>5)

2021-06-01 Thread GitBox
Dandandan commented on a change in pull request #436: URL: https://github.com/apache/arrow-datafusion/pull/436#discussion_r643440084 ## File path: datafusion/src/optimizer/mod.rs ## @@ -25,4 +25,5 @@ pub mod hash_build_probe_order; pub mod limit_push_down; pub mod optimizer;

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #436: Remove reundant filters (e.g. c> 5 AND c>5 --> c>5)

2021-06-01 Thread GitBox
Dandandan commented on a change in pull request #436: URL: https://github.com/apache/arrow-datafusion/pull/436#discussion_r643439834 ## File path: datafusion/src/optimizer/remove_duplicate_filters.rs ## @@ -0,0 +1,611 @@ +// regarding copyright ownership. The ASF licenses

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #436: Remove reundant filters (e.g. c> 5 AND c>5 --> c>5)

2021-06-01 Thread GitBox
Dandandan commented on a change in pull request #436: URL: https://github.com/apache/arrow-datafusion/pull/436#discussion_r643439659 ## File path: datafusion/src/optimizer/remove_duplicate_filters.rs ## @@ -0,0 +1,611 @@ +// regarding copyright ownership. The ASF licenses

[GitHub] [arrow] westonpace commented on a change in pull request #10397: ARROW-11930: [C++][Dataset][Compute] Use an ExecPlan for dataset scans

2021-06-01 Thread GitBox
westonpace commented on a change in pull request #10397: URL: https://github.com/apache/arrow/pull/10397#discussion_r643353334 ## File path: cpp/src/arrow/compute/exec/exec_plan.h ## @@ -225,22 +212,43 @@ class ARROW_EXPORT ExecNode { virtual void StopProducing() = 0;

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #436: Remove reundant filters (e.g. c> 5 AND c>5 --> c>5)

2021-06-01 Thread GitBox
Dandandan commented on a change in pull request #436: URL: https://github.com/apache/arrow-datafusion/pull/436#discussion_r643438683 ## File path: datafusion/src/optimizer/remove_duplicate_filters.rs ## @@ -0,0 +1,611 @@ +// regarding copyright ownership. The ASF licenses

[GitHub] [arrow-datafusion] alamb merged pull request #434: fix: display the content of debug explain

2021-06-01 Thread GitBox
alamb merged pull request #434: URL: https://github.com/apache/arrow-datafusion/pull/434 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service,

[GitHub] [arrow-datafusion] alamb closed issue #430: LogicalPlan::inputs function should return the input plan for Explain enum

2021-06-01 Thread GitBox
alamb closed issue #430: URL: https://github.com/apache/arrow-datafusion/issues/430 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service,

[GitHub] [arrow] anthonylouisbsb commented on pull request #10350: ARROW-12814: [C++][Gandiva] Implements the ABS, FLOOR, PI, SQRT, SIGN, LSHIFT, RSHIFT, CEIL, TRUNC and LN functions

2021-06-01 Thread GitBox
anthonylouisbsb commented on pull request #10350: URL: https://github.com/apache/arrow/pull/10350#issuecomment-852402117 The `LN` function was added in this PR too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [arrow-datafusion] NGA-TRAN commented on pull request #434: fix: display the content of debug explain

2021-06-01 Thread GitBox
NGA-TRAN commented on pull request #434: URL: https://github.com/apache/arrow-datafusion/pull/434#issuecomment-852397463 @alamb Finally all checks have passed :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [arrow] github-actions[bot] commented on pull request #10431: ARROW-12921: [C++][Dataset] Add RadosParquetFileFormat to Dataset API

2021-06-01 Thread GitBox
github-actions[bot] commented on pull request #10431: URL: https://github.com/apache/arrow/pull/10431#issuecomment-852396357 https://issues.apache.org/jira/browse/ARROW-12921 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [arrow] JayjeetAtGithub opened a new pull request #10431: ARROW-12921: [C++][Dataset] Add RadosParquetFileFormat to Dataset API

2021-06-01 Thread GitBox
JayjeetAtGithub opened a new pull request #10431: URL: https://github.com/apache/arrow/pull/10431 The implementation includes a new `RadosParquetFileFormat` class that inherits from the `ParquetFileFormat` class to defer the evaluation of scan operations on a Parquet dataset to a RADOS
