[GitHub] [arrow] kiszk edited a comment on pull request #7507: ARROW-8797: [C++] Read RecordBatch in a different endian

2020-07-31 Thread GitBox
kiszk edited a comment on pull request #7507: URL: https://github.com/apache/arrow/pull/7507#issuecomment-667471857 It is ready for review now. @wesm @pitrou @kou Would it be possible to review it if you have time? This is

[GitHub] [arrow] kiszk commented on pull request #7507: ARROW-8797: [C++] Read RecordBatch in a different endian

2020-07-31 Thread GitBox
kiszk commented on pull request #7507: URL: https://github.com/apache/arrow/pull/7507#issuecomment-667471857 It is ready for review now. @wesm @pitrou @kou Would it be possible to review it? This is an automated message fr

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #7687: ARROW-9382: [Rust][DataFusion] Simplified hash aggregations and added Boolean type

2020-07-31 Thread GitBox
jorgecarleitao commented on a change in pull request #7687: URL: https://github.com/apache/arrow/pull/7687#discussion_r463919728 ## File path: rust/datafusion/src/execution/physical_plan/hash_aggregate.rs ## @@ -327,120 +278,47 @@ impl RecordBatchReader for GroupedHashAggregate

[GitHub] [arrow] emkornfield commented on a change in pull request #7816: ARROW-9528: [Python] Honor tzinfo when converting from datetime

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7816: URL: https://github.com/apache/arrow/pull/7816#discussion_r463917731 ## File path: cpp/src/arrow/python/common.h ## @@ -137,6 +137,11 @@ class ARROW_PYTHON_EXPORT OwnedRef { OwnedRef(OwnedRef&& other) : OwnedRef(other.

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #7687: ARROW-9382: [Rust][DataFusion] Simplified hash aggregations and added Boolean type

2020-07-31 Thread GitBox
jorgecarleitao commented on a change in pull request #7687: URL: https://github.com/apache/arrow/pull/7687#discussion_r463917429 ## File path: rust/datafusion/src/execution/physical_plan/hash_aggregate.rs ## @@ -327,120 +278,47 @@ impl RecordBatchReader for GroupedHashAggregate

[GitHub] [arrow] emkornfield commented on a change in pull request #7816: ARROW-9528: [Python] Honor tzinfo when converting from datetime

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7816: URL: https://github.com/apache/arrow/pull/7816#discussion_r463916946 ## File path: python/pyarrow/tests/test_types.py ## @@ -251,6 +254,121 @@ def test_is_primitive(): assert not types.is_primitive(pa.list_(pa.int32(

[GitHub] [arrow] emkornfield commented on a change in pull request #7816: ARROW-9528: [Python] Honor tzinfo when converting from datetime

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7816: URL: https://github.com/apache/arrow/pull/7816#discussion_r463916817 ## File path: python/pyarrow/tests/test_pandas.py ## @@ -3325,13 +3325,35 @@ def test_cast_timestamp_unit(): assert result.equals(expected) -de

[GitHub] [arrow] emkornfield commented on a change in pull request #7816: ARROW-9528: [Python] Honor tzinfo when converting from datetime

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7816: URL: https://github.com/apache/arrow/pull/7816#discussion_r463916695 ## File path: cpp/src/arrow/python/python_to_arrow.cc ## @@ -240,19 +242,20 @@ struct ValueConverter { static inline Result FromPython(PyObject* obj,

[GitHub] [arrow] emkornfield commented on a change in pull request #7816: ARROW-9528: [Python] Honor tzinfo when converting from datetime

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7816: URL: https://github.com/apache/arrow/pull/7816#discussion_r463916508 ## File path: cpp/src/arrow/python/datetime.h ## @@ -157,6 +157,22 @@ inline int64_t PyDelta_to_ns(PyDateTime_Delta* pytimedelta) { return PyDelta_t

[GitHub] [arrow] emkornfield commented on a change in pull request #7816: ARROW-9528: [Python] Honor tzinfo when converting from datetime

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7816: URL: https://github.com/apache/arrow/pull/7816#discussion_r463915789 ## File path: cpp/src/arrow/python/arrow_to_pandas.cc ## @@ -642,24 +641,27 @@ inline Status ConvertStruct(const PandasOptions& options, const ChunkedA

[GitHub] [arrow] emkornfield commented on a change in pull request #7816: ARROW-9528: [Python] Honor tzinfo when converting from datetime

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7816: URL: https://github.com/apache/arrow/pull/7816#discussion_r463915701 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -861,10 +861,10 @@ void AddBinaryLength(FunctionRegistry* registry) { applicat

[GitHub] [arrow] emkornfield commented on a change in pull request #7862: ARROW-9598: [C++][Parquet] Fix writing nullable structs

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7862: URL: https://github.com/apache/arrow/pull/7862#discussion_r463915579 ## File path: cpp/src/parquet/arrow/arrow_reader_writer_test.cc ## @@ -2344,6 +2344,23 @@ TEST(ArrowReadWrite, SimpleStructRoundTrip) { 2); }

[GitHub] [arrow] emkornfield commented on a change in pull request #7817: ARROW-9377: [Java] Support unsigned dictionary indices

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7817: URL: https://github.com/apache/arrow/pull/7817#discussion_r463915338 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java ## @@ -2977,4 +2977,47 @@ public void testEmptyBufBehavior() {

[GitHub] [arrow] emkornfield commented on a change in pull request #7817: ARROW-9377: [Java] Support unsigned dictionary indices

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7817: URL: https://github.com/apache/arrow/pull/7817#discussion_r463914759 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestDictionaryVector.java ## @@ -878,6 +880,103 @@ public void testEncodeStructSubField

[GitHub] [arrow] emkornfield commented on pull request #7837: ARROW-9554: [Java] FixedWidthInPlaceVectorSorter sometimes produces wrong result

2020-07-31 Thread GitBox
emkornfield commented on pull request #7837: URL: https://github.com/apache/arrow/pull/7837#issuecomment-667459724 @liyafan82 one small comment on a typo/better error message; otherwise this looks good to me.

[GitHub] [arrow] emkornfield commented on a change in pull request #7837: ARROW-9554: [Java] FixedWidthInPlaceVectorSorter sometimes produces wrong result

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7837: URL: https://github.com/apache/arrow/pull/7837#discussion_r463913968 ## File path: java/algorithm/src/test/java/org/apache/arrow/algorithm/sort/TestSortingUtil.java ## @@ -0,0 +1,165 @@ +/* + * Licensed to the Apache Sof

[GitHub] [arrow] emkornfield commented on a change in pull request #7837: ARROW-9554: [Java] FixedWidthInPlaceVectorSorter sometimes produces wrong result

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7837: URL: https://github.com/apache/arrow/pull/7837#discussion_r463913775 ## File path: java/algorithm/src/main/java/org/apache/arrow/algorithm/sort/FixedWidthOutOfPlaceVectorSorter.java ## @@ -44,6 +45,13 @@ public void sort

[GitHub] [arrow] emkornfield commented on a change in pull request #7837: ARROW-9554: [Java] FixedWidthInPlaceVectorSorter sometimes produces wrong result

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7837: URL: https://github.com/apache/arrow/pull/7837#discussion_r463913435 ## File path: java/algorithm/src/test/java/org/apache/arrow/algorithm/sort/TestSortingUtil.java ## @@ -0,0 +1,165 @@ +/* + * Licensed to the Apache Sof

[GitHub] [arrow] kiszk commented on a change in pull request #7507: ARROW-8797: [C++] Read RecordBatch in a different endian

2020-07-31 Thread GitBox
kiszk commented on a change in pull request #7507: URL: https://github.com/apache/arrow/pull/7507#discussion_r463912700 ## File path: ci/scripts/integration_arrow.sh ## @@ -24,9 +24,16 @@ source_dir=${1}/cpp build_dir=${2}/cpp gold_dir_0_14_1=$arrow_dir/testing/data/arrow-ipc

[GitHub] [arrow] kiszk commented on a change in pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-07-31 Thread GitBox
kiszk commented on a change in pull request #7789: URL: https://github.com/apache/arrow/pull/7789#discussion_r463911559 ## File path: cpp/src/arrow/util/compression_lz4.cc ## @@ -349,6 +350,90 @@ class Lz4Codec : public Codec { const char* name() const override { return "lz4

[GitHub] [arrow] emkornfield commented on pull request #6156: ARROW-7539: [Java] FieldVector getFieldBuffers API should not set reader/writer indices

2020-07-31 Thread GitBox
emkornfield commented on pull request #6156: URL: https://github.com/apache/arrow/pull/6156#issuecomment-667456601 @tianchen92 would you mind starting a thread on the ML? It seems that @jacques-n might not have bandwidth.

[GitHub] [arrow] mr-smidge commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

2020-07-31 Thread GitBox
mr-smidge commented on a change in pull request #7654: URL: https://github.com/apache/arrow/pull/7654#discussion_r463888181 ## File path: csharp/test/Apache.Arrow.Tests/TestDateAndTimeData.cs ## @@ -0,0 +1,83 @@ +// Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [arrow] mr-smidge commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

2020-07-31 Thread GitBox
mr-smidge commented on a change in pull request #7654: URL: https://github.com/apache/arrow/pull/7654#discussion_r463886892 ## File path: csharp/src/Apache.Arrow/Arrays/Date64Array.cs ## @@ -15,56 +15,103 @@ using Apache.Arrow.Types; using System; -using System.Collections.

[GitHub] [arrow] kou commented on issue #7864: stop plasma_store from shell script

2020-07-31 Thread GitBox
kou commented on issue #7864: URL: https://github.com/apache/arrow/issues/7864#issuecomment-667403163 For socket path: https://arrow.apache.org/docs/developers/contributing.html#report-bugs-and-propose-features For stopping `plasma-store-server`, ```python with plasma.start

[GitHub] [arrow] andygrove commented on a change in pull request #7687: ARROW-9382: [Rust][DataFusion] Simplified hash aggregations and added Boolean type

2020-07-31 Thread GitBox
andygrove commented on a change in pull request #7687: URL: https://github.com/apache/arrow/pull/7687#discussion_r463854007 ## File path: rust/datafusion/src/execution/physical_plan/hash_aggregate.rs ## @@ -327,120 +278,47 @@ impl RecordBatchReader for GroupedHashAggregateItera

[GitHub] [arrow] wesm commented on a change in pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-07-31 Thread GitBox
wesm commented on a change in pull request #7789: URL: https://github.com/apache/arrow/pull/7789#discussion_r463842568 ## File path: cpp/src/arrow/util/compression.cc ## @@ -131,7 +131,7 @@ Result> Codec::Create(Compression::type codec_type, if (compression_level_set) {

[GitHub] [arrow] nealrichardson commented on pull request #7875: ARROW-3757: [R] R bindings for Flight RPC client

2020-07-31 Thread GitBox
nealrichardson commented on pull request #7875: URL: https://github.com/apache/arrow/pull/7875#issuecomment-667354516 > What do you think about making this part of the master R project (versus as a separate CRAN package)? We could. When I started trying this, I don't think I had a gr

[GitHub] [arrow] wesm commented on pull request #7875: ARROW-3757: [R] R bindings for Flight RPC client

2020-07-31 Thread GitBox
wesm commented on pull request #7875: URL: https://github.com/apache/arrow/pull/7875#issuecomment-667352389 Cool! What do you think about making this part of the master R project (versus as a separate CRAN package)? I can look at the lower-level details later

[GitHub] [arrow] wesm closed issue #7835: ArrowInvalid: straddling object straddles two block boundaries (try to increase block size?)

2020-07-31 Thread GitBox
wesm closed issue #7835: URL: https://github.com/apache/arrow/issues/7835 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

[GitHub] [arrow] wesm commented on issue #7835: ArrowInvalid: straddling object straddles two block boundaries (try to increase block size?)

2020-07-31 Thread GitBox
wesm commented on issue #7835: URL: https://github.com/apache/arrow/issues/7835#issuecomment-667351962 I opened https://issues.apache.org/jira/browse/ARROW-9612

[GitHub] [arrow] wesm closed issue #7864: stop plasma_store from shell script

2020-07-31 Thread GitBox
wesm closed issue #7864: URL: https://github.com/apache/arrow/issues/7864

[GitHub] [arrow] wesm commented on issue #7864: stop plasma_store from shell script

2020-07-31 Thread GitBox
wesm commented on issue #7864: URL: https://github.com/apache/arrow/issues/7864#issuecomment-667351153 Doesn't appear so. If you want to suggest a feature, I recommend that you open a JIRA issue.

[GitHub] [arrow] github-actions[bot] commented on pull request #7875: ARROW-3757: [R] R bindings for Flight RPC client

2020-07-31 Thread GitBox
github-actions[bot] commented on pull request #7875: URL: https://github.com/apache/arrow/pull/7875#issuecomment-667349459 https://issues.apache.org/jira/browse/ARROW-3757

[GitHub] [arrow] nealrichardson opened a new pull request #7875: ARROW-3757: [R] R bindings for Flight RPC client

2020-07-31 Thread GitBox
nealrichardson opened a new pull request #7875: URL: https://github.com/apache/arrow/pull/7875 This is a proof-of-concept R package that uses pyarrow/reticulate to provide a Flight client in order to avoid dealing with Flight in the R build setup. See the included README.md for details and

[GitHub] [arrow] yordan-pavlov commented on pull request #7798: ARROW-9523 [Rust] Improve filter kernel performance

2020-07-31 Thread GitBox
yordan-pavlov commented on pull request #7798: URL: https://github.com/apache/arrow/pull/7798#issuecomment-667344474 @paddyhoran yes, you are right. I added a couple more tests for sliced arrays and they didn't pass, so, seeing that the PR was not yet merged, I added a few small changes to

[GitHub] [arrow] github-actions[bot] commented on pull request #7873: ARROW-9608: [Rust] Leaner feature gating for arrow in parquet

2020-07-31 Thread GitBox
github-actions[bot] commented on pull request #7873: URL: https://github.com/apache/arrow/pull/7873#issuecomment-667261686 https://issues.apache.org/jira/browse/ARROW-9608

[GitHub] [arrow] sunchao commented on a change in pull request #7873: ARROW-9609: [Rust] Leaner feature gating for arrow in parquet

2020-07-31 Thread GitBox
sunchao commented on a change in pull request #7873: URL: https://github.com/apache/arrow/pull/7873#discussion_r463729528 ## File path: rust/parquet/Cargo.toml ## @@ -49,7 +49,6 @@ brotli = "3.3" flate2 = "1.0" lz4 = "1.23" zstd = "0.5" -arrow = { path = "../arrow", version

[GitHub] [arrow] sunchao commented on pull request #7874: ARROW-9582: [Rust] Implement memory size methods

2020-07-31 Thread GitBox
sunchao commented on pull request #7874: URL: https://github.com/apache/arrow/pull/7874#issuecomment-667226096 Instead of `buffer_memory_size` and `total_memory_size`, I'm wondering whether `memory_used` and `memory_capacity` make more sense.

[GitHub] [arrow] github-actions[bot] commented on pull request #7874: ARROW-9582: [Rust] Implement memory size methods

2020-07-31 Thread GitBox
github-actions[bot] commented on pull request #7874: URL: https://github.com/apache/arrow/pull/7874#issuecomment-667208832 https://issues.apache.org/jira/browse/ARROW-9582

[GitHub] [arrow] andygrove closed pull request #7853: ARROW-9582: [Rust] Add memory_size() method to Array [DRAFT]

2020-07-31 Thread GitBox
andygrove closed pull request #7853: URL: https://github.com/apache/arrow/pull/7853

[GitHub] [arrow] andygrove commented on pull request #7853: ARROW-9582: [Rust] Add memory_size() method to Array [DRAFT]

2020-07-31 Thread GitBox
andygrove commented on pull request #7853: URL: https://github.com/apache/arrow/pull/7853#issuecomment-667207713 Replaced by https://github.com/apache/arrow/pull/7874

[GitHub] [arrow] vertexclique opened a new pull request #7874: ARROW-9582: [Rust] Implement memory size methods

2020-07-31 Thread GitBox
vertexclique opened a new pull request #7874: URL: https://github.com/apache/arrow/pull/7874 This PR is a slightly extended version of the PR https://github.com/apache/arrow/pull/7853. * `buffer_memory_size`: Only calculates internally held data size. * `total_memory_size`: Calcul

[GitHub] [arrow] lidavidm commented on pull request #7863: ARROW-9344: [C++][Flight] Measure latency quantiles

2020-07-31 Thread GitBox
lidavidm commented on pull request #7863: URL: https://github.com/apache/arrow/pull/7863#issuecomment-667190655 Thank you!

[GitHub] [arrow] lidavidm closed pull request #7863: ARROW-9344: [C++][Flight] Measure latency quantiles

2020-07-31 Thread GitBox
lidavidm closed pull request #7863: URL: https://github.com/apache/arrow/pull/7863

[GitHub] [arrow] andygrove commented on pull request #7853: ARROW-9582: [Rust] Add memory_size() method to Array [DRAFT]

2020-07-31 Thread GitBox
andygrove commented on pull request #7853: URL: https://github.com/apache/arrow/pull/7853#issuecomment-667122842 @vertexclique is working on a PR for this as well, so I will likely close this one once the new PR is up.

[GitHub] [arrow] pereverges commented on issue #7864: stop plasma_store from shell script

2020-07-31 Thread GitBox
pereverges commented on issue #7864: URL: https://github.com/apache/arrow/issues/7864#issuecomment-667103343 Is there a way to choose the socket when using `plasma.start_plasma_store()`? The link https://github.com/apache/arrow/blob/master/python/pyarrow/plasma.py#L82 does not seem l

[GitHub] [arrow] svenwb commented on pull request #7873: ARROW-9609: [Rust] Leaner feature gating for arrow in parquet

2020-07-31 Thread GitBox
svenwb commented on pull request #7873: URL: https://github.com/apache/arrow/pull/7873#issuecomment-667080040 +1

[GitHub] [arrow] svenwb removed a comment on pull request #7873: ARROW-9609: [Rust] Leaner feature gating for arrow in parquet

2020-07-31 Thread GitBox
svenwb removed a comment on pull request #7873: URL: https://github.com/apache/arrow/pull/7873#issuecomment-667080040 +1

[GitHub] [arrow] github-actions[bot] commented on pull request #7873: ARROW-9609: [Rust] Leaner feature gating for arrow in parquet

2020-07-31 Thread GitBox
github-actions[bot] commented on pull request #7873: URL: https://github.com/apache/arrow/pull/7873#issuecomment-667061089 https://issues.apache.org/jira/browse/ARROW-9609

[GitHub] [arrow] vertexclique opened a new pull request #7873: ARROW-9609 - Leaner feature gating for arrow in parquet

2020-07-31 Thread GitBox
vertexclique opened a new pull request #7873: URL: https://github.com/apache/arrow/pull/7873 Currently, parquet installs arrow-flight and its dependencies, which breaks the CI builds and is unnecessary because arrow-flight is not used. Parquet should work without any default features by

[GitHub] [arrow] github-actions[bot] commented on pull request #7872: ARROW-9607: [C++][Gandiva] Add bitwise_and(), bitwise_or() and bitwise_not() functions for integers

2020-07-31 Thread GitBox
github-actions[bot] commented on pull request #7872: URL: https://github.com/apache/arrow/pull/7872#issuecomment-667026323 https://issues.apache.org/jira/browse/ARROW-9607

[GitHub] [arrow] sagnikc-dremio opened a new pull request #7872: ARROW-9607: [C++][Gandiva] Add bitwise_and(), bitwise_or() and bitwise_not() functions for integers

2020-07-31 Thread GitBox
sagnikc-dremio opened a new pull request #7872: URL: https://github.com/apache/arrow/pull/7872

[GitHub] [arrow] jianxind commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

2020-07-31 Thread GitBox
jianxind commented on pull request #7871: URL: https://github.com/apache/arrow/pull/7871#issuecomment-666978438 I can trigger a benchmark action once https://github.com/apache/arrow/pull/7870 gets merged. Below is the benchmark data for int types on my setup: ``` Before: MinMaxKerne

[GitHub] [arrow] github-actions[bot] commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

2020-07-31 Thread GitBox
github-actions[bot] commented on pull request #7871: URL: https://github.com/apache/arrow/pull/7871#issuecomment-666974009 https://issues.apache.org/jira/browse/ARROW-9605

[GitHub] [arrow] jianxind opened a new pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

2020-07-31 Thread GitBox
jianxind opened a new pull request #7871: URL: https://github.com/apache/arrow/pull/7871 1. Use BitBlockCounter to speed up processing of typical data with 0.01% nulls. 2. Enable the compiler's AVX auto-vectorized version for int types with no nulls. Float/Double use fmin/fmax to handle NaN

[GitHub] [arrow] github-actions[bot] commented on pull request #7870: ARROW-9604: [C++] Add aggregate min/max benchmark

2020-07-31 Thread GitBox
github-actions[bot] commented on pull request #7870: URL: https://github.com/apache/arrow/pull/7870#issuecomment-666967616 https://issues.apache.org/jira/browse/ARROW-9604

[GitHub] [arrow] jianxind opened a new pull request #7870: ARROW-9604: [C++] Add aggregate min/max benchmark

2020-07-31 Thread GitBox
jianxind opened a new pull request #7870: URL: https://github.com/apache/arrow/pull/7870 Add benchmark for aggregate min/max compute kernels. Signed-off-by: Frank Du