[GitHub] [arrow] kiszk edited a comment on pull request #7507: ARROW-8797: [C++] Read RecordBatch in a different endian

2020-07-31 Thread GitBox
kiszk edited a comment on pull request #7507: URL: https://github.com/apache/arrow/pull/7507#issuecomment-667471857 It is ready for review now. @wesm @pitrou @kou Would it be possible to review it if you have time?

[GitHub] [arrow] kiszk commented on pull request #7507: ARROW-8797: [C++] Read RecordBatch in a different endian

2020-07-31 Thread GitBox
kiszk commented on pull request #7507: URL: https://github.com/apache/arrow/pull/7507#issuecomment-667471857 It is ready for review now. @wesm @pitrou @kou Would it be possible to review it?

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #7687: ARROW-9382: [Rust][DataFusion] Simplified hash aggregations and added Boolean type

2020-07-31 Thread GitBox
jorgecarleitao commented on a change in pull request #7687: URL: https://github.com/apache/arrow/pull/7687#discussion_r463919728 ## File path: rust/datafusion/src/execution/physical_plan/hash_aggregate.rs ## @@ -327,120 +278,47 @@ impl RecordBatchReader for

[GitHub] [arrow] emkornfield commented on a change in pull request #7816: ARROW-9528: [Python] Honor tzinfo when converting from datetime

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7816: URL: https://github.com/apache/arrow/pull/7816#discussion_r463917731 ## File path: cpp/src/arrow/python/common.h ## @@ -137,6 +137,11 @@ class ARROW_PYTHON_EXPORT OwnedRef { OwnedRef(OwnedRef&& other) :

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #7687: ARROW-9382: [Rust][DataFusion] Simplified hash aggregations and added Boolean type

2020-07-31 Thread GitBox
jorgecarleitao commented on a change in pull request #7687: URL: https://github.com/apache/arrow/pull/7687#discussion_r463917429 ## File path: rust/datafusion/src/execution/physical_plan/hash_aggregate.rs ## @@ -327,120 +278,47 @@ impl RecordBatchReader for

[GitHub] [arrow] emkornfield commented on a change in pull request #7816: ARROW-9528: [Python] Honor tzinfo when converting from datetime

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7816: URL: https://github.com/apache/arrow/pull/7816#discussion_r463916946 ## File path: python/pyarrow/tests/test_types.py ## @@ -251,6 +254,121 @@ def test_is_primitive(): assert not

[GitHub] [arrow] emkornfield commented on a change in pull request #7816: ARROW-9528: [Python] Honor tzinfo when converting from datetime

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7816: URL: https://github.com/apache/arrow/pull/7816#discussion_r463916817 ## File path: python/pyarrow/tests/test_pandas.py ## @@ -3325,13 +3325,35 @@ def test_cast_timestamp_unit(): assert result.equals(expected)

[GitHub] [arrow] emkornfield commented on a change in pull request #7816: ARROW-9528: [Python] Honor tzinfo when converting from datetime

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7816: URL: https://github.com/apache/arrow/pull/7816#discussion_r463916695 ## File path: cpp/src/arrow/python/python_to_arrow.cc ## @@ -240,19 +242,20 @@ struct ValueConverter { static inline Result FromPython(PyObject*

[GitHub] [arrow] emkornfield commented on a change in pull request #7816: ARROW-9528: [Python] Honor tzinfo when converting from datetime

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7816: URL: https://github.com/apache/arrow/pull/7816#discussion_r463916508 ## File path: cpp/src/arrow/python/datetime.h ## @@ -157,6 +157,22 @@ inline int64_t PyDelta_to_ns(PyDateTime_Delta* pytimedelta) { return

[GitHub] [arrow] emkornfield commented on a change in pull request #7816: ARROW-9528: [Python] Honor tzinfo when converting from datetime

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7816: URL: https://github.com/apache/arrow/pull/7816#discussion_r463915789 ## File path: cpp/src/arrow/python/arrow_to_pandas.cc ## @@ -642,24 +641,27 @@ inline Status ConvertStruct(const PandasOptions& options, const

[GitHub] [arrow] emkornfield commented on a change in pull request #7816: ARROW-9528: [Python] Honor tzinfo when converting from datetime

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7816: URL: https://github.com/apache/arrow/pull/7816#discussion_r463915701 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -861,10 +861,10 @@ void AddBinaryLength(FunctionRegistry* registry) {

[GitHub] [arrow] emkornfield commented on a change in pull request #7862: ARROW-9598: [C++][Parquet] Fix writing nullable structs

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7862: URL: https://github.com/apache/arrow/pull/7862#discussion_r463915579 ## File path: cpp/src/parquet/arrow/arrow_reader_writer_test.cc ## @@ -2344,6 +2344,23 @@ TEST(ArrowReadWrite, SimpleStructRoundTrip) { 2); }

[GitHub] [arrow] emkornfield commented on a change in pull request #7817: ARROW-9377: [Java] Support unsigned dictionary indices

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7817: URL: https://github.com/apache/arrow/pull/7817#discussion_r463915338 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java ## @@ -2977,4 +2977,47 @@ public void testEmptyBufBehavior() {

[GitHub] [arrow] emkornfield commented on a change in pull request #7817: ARROW-9377: [Java] Support unsigned dictionary indices

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7817: URL: https://github.com/apache/arrow/pull/7817#discussion_r463914759 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestDictionaryVector.java ## @@ -878,6 +880,103 @@ public void

[GitHub] [arrow] emkornfield commented on pull request #7837: ARROW-9554: [Java] FixedWidthInPlaceVectorSorter sometimes produces wrong result

2020-07-31 Thread GitBox
emkornfield commented on pull request #7837: URL: https://github.com/apache/arrow/pull/7837#issuecomment-667459724 @liyafan82 one small comment on a typo/better error message; otherwise this looks good to me.

[GitHub] [arrow] emkornfield commented on a change in pull request #7837: ARROW-9554: [Java] FixedWidthInPlaceVectorSorter sometimes produces wrong result

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7837: URL: https://github.com/apache/arrow/pull/7837#discussion_r463913968 ## File path: java/algorithm/src/test/java/org/apache/arrow/algorithm/sort/TestSortingUtil.java ## @@ -0,0 +1,165 @@ +/* + * Licensed to the Apache

[GitHub] [arrow] emkornfield commented on a change in pull request #7837: ARROW-9554: [Java] FixedWidthInPlaceVectorSorter sometimes produces wrong result

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7837: URL: https://github.com/apache/arrow/pull/7837#discussion_r463913775 ## File path: java/algorithm/src/main/java/org/apache/arrow/algorithm/sort/FixedWidthOutOfPlaceVectorSorter.java ## @@ -44,6 +45,13 @@ public void

[GitHub] [arrow] emkornfield commented on a change in pull request #7837: ARROW-9554: [Java] FixedWidthInPlaceVectorSorter sometimes produces wrong result

2020-07-31 Thread GitBox
emkornfield commented on a change in pull request #7837: URL: https://github.com/apache/arrow/pull/7837#discussion_r463913435 ## File path: java/algorithm/src/test/java/org/apache/arrow/algorithm/sort/TestSortingUtil.java ## @@ -0,0 +1,165 @@ +/* + * Licensed to the Apache

[GitHub] [arrow] kiszk commented on a change in pull request #7507: ARROW-8797: [C++] Read RecordBatch in a different endian

2020-07-31 Thread GitBox
kiszk commented on a change in pull request #7507: URL: https://github.com/apache/arrow/pull/7507#discussion_r463912700 ## File path: ci/scripts/integration_arrow.sh ## @@ -24,9 +24,16 @@ source_dir=${1}/cpp build_dir=${2}/cpp

[GitHub] [arrow] kiszk commented on a change in pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-07-31 Thread GitBox
kiszk commented on a change in pull request #7789: URL: https://github.com/apache/arrow/pull/7789#discussion_r463911559 ## File path: cpp/src/arrow/util/compression_lz4.cc ## @@ -349,6 +350,90 @@ class Lz4Codec : public Codec { const char* name() const override { return

[GitHub] [arrow] emkornfield commented on pull request #6156: ARROW-7539: [Java] FieldVector getFieldBuffers API should not set reader/writer indices

2020-07-31 Thread GitBox
emkornfield commented on pull request #6156: URL: https://github.com/apache/arrow/pull/6156#issuecomment-667456601 @tianchen92 would you mind starting a thread on the ML? It seems that @jacques-n might not have bandwidth.

[GitHub] [arrow] mr-smidge commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

2020-07-31 Thread GitBox
mr-smidge commented on a change in pull request #7654: URL: https://github.com/apache/arrow/pull/7654#discussion_r463888181 ## File path: csharp/test/Apache.Arrow.Tests/TestDateAndTimeData.cs ## @@ -0,0 +1,83 @@ +// Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [arrow] mr-smidge commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

2020-07-31 Thread GitBox
mr-smidge commented on a change in pull request #7654: URL: https://github.com/apache/arrow/pull/7654#discussion_r463886892 ## File path: csharp/src/Apache.Arrow/Arrays/Date64Array.cs ## @@ -15,56 +15,103 @@ using Apache.Arrow.Types; using System; -using

[GitHub] [arrow] kou commented on issue #7864: stop plasma_store from shell script

2020-07-31 Thread GitBox
kou commented on issue #7864: URL: https://github.com/apache/arrow/issues/7864#issuecomment-667403163 For socket path: https://arrow.apache.org/docs/developers/contributing.html#report-bugs-and-propose-features For stopping `plasma-store-server`, ```python with

[GitHub] [arrow] andygrove commented on a change in pull request #7687: ARROW-9382: [Rust][DataFusion] Simplified hash aggregations and added Boolean type

2020-07-31 Thread GitBox
andygrove commented on a change in pull request #7687: URL: https://github.com/apache/arrow/pull/7687#discussion_r463854007 ## File path: rust/datafusion/src/execution/physical_plan/hash_aggregate.rs ## @@ -327,120 +278,47 @@ impl RecordBatchReader for

[GitHub] [arrow] wesm commented on a change in pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-07-31 Thread GitBox
wesm commented on a change in pull request #7789: URL: https://github.com/apache/arrow/pull/7789#discussion_r463842568 ## File path: cpp/src/arrow/util/compression.cc ## @@ -131,7 +131,7 @@ Result> Codec::Create(Compression::type codec_type, if (compression_level_set)

[GitHub] [arrow] nealrichardson commented on pull request #7875: ARROW-3757: [R] R bindings for Flight RPC client

2020-07-31 Thread GitBox
nealrichardson commented on pull request #7875: URL: https://github.com/apache/arrow/pull/7875#issuecomment-667354516 > What do you think about making this part of the master R project (versus as a separate CRAN package)? We could. When I started trying this, I don't think I had a

[GitHub] [arrow] wesm commented on pull request #7875: ARROW-3757: [R] R bindings for Flight RPC client

2020-07-31 Thread GitBox
wesm commented on pull request #7875: URL: https://github.com/apache/arrow/pull/7875#issuecomment-667352389 Cool! What do you think about making this part of the master R project (versus as a separate CRAN package)? I can look at the lower-level details later

[GitHub] [arrow] wesm closed issue #7835: ArrowInvalid: straddling object straddles two block boundaries (try to increase block size?)

2020-07-31 Thread GitBox
wesm closed issue #7835: URL: https://github.com/apache/arrow/issues/7835 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] wesm commented on issue #7835: ArrowInvalid: straddling object straddles two block boundaries (try to increase block size?)

2020-07-31 Thread GitBox
wesm commented on issue #7835: URL: https://github.com/apache/arrow/issues/7835#issuecomment-667351962 I opened https://issues.apache.org/jira/browse/ARROW-9612

[GitHub] [arrow] wesm closed issue #7864: stop plasma_store from shell script

2020-07-31 Thread GitBox
wesm closed issue #7864: URL: https://github.com/apache/arrow/issues/7864

[GitHub] [arrow] wesm commented on issue #7864: stop plasma_store from shell script

2020-07-31 Thread GitBox
wesm commented on issue #7864: URL: https://github.com/apache/arrow/issues/7864#issuecomment-667351153 Doesn't appear so. If you want to suggest a feature, I recommend that you open a JIRA issue.

[GitHub] [arrow] github-actions[bot] commented on pull request #7875: ARROW-3757: [R] R bindings for Flight RPC client

2020-07-31 Thread GitBox
github-actions[bot] commented on pull request #7875: URL: https://github.com/apache/arrow/pull/7875#issuecomment-667349459 https://issues.apache.org/jira/browse/ARROW-3757

[GitHub] [arrow] nealrichardson opened a new pull request #7875: ARROW-3757: [R] R bindings for Flight RPC client

2020-07-31 Thread GitBox
nealrichardson opened a new pull request #7875: URL: https://github.com/apache/arrow/pull/7875 This is a proof-of-concept R package that uses pyarrow/reticulate to provide a Flight client in order to avoid dealing with Flight in the R build setup. See the included README.md for details

[GitHub] [arrow] yordan-pavlov commented on pull request #7798: ARROW-9523 [Rust] Improve filter kernel performance

2020-07-31 Thread GitBox
yordan-pavlov commented on pull request #7798: URL: https://github.com/apache/arrow/pull/7798#issuecomment-667344474 @paddyhoran yes, you are right. I added a couple more tests for sliced arrays and they didn't pass, so, seeing that the PR was not yet merged, I added a few small changes to

[GitHub] [arrow] github-actions[bot] commented on pull request #7873: ARROW-9608: [Rust] Leaner feature gating for arrow in parquet

2020-07-31 Thread GitBox
github-actions[bot] commented on pull request #7873: URL: https://github.com/apache/arrow/pull/7873#issuecomment-667261686 https://issues.apache.org/jira/browse/ARROW-9608

[GitHub] [arrow] sunchao commented on a change in pull request #7873: ARROW-9609: [Rust] Leaner feature gating for arrow in parquet

2020-07-31 Thread GitBox
sunchao commented on a change in pull request #7873: URL: https://github.com/apache/arrow/pull/7873#discussion_r463729528 ## File path: rust/parquet/Cargo.toml ## @@ -49,7 +49,6 @@ brotli = "3.3" flate2 = "1.0" lz4 = "1.23" zstd = "0.5" -arrow = { path = "../arrow", version

[GitHub] [arrow] sunchao commented on pull request #7874: ARROW-9582: [Rust] Implement memory size methods

2020-07-31 Thread GitBox
sunchao commented on pull request #7874: URL: https://github.com/apache/arrow/pull/7874#issuecomment-667226096 Instead of `buffer_memory_size` and `total_memory_size`, I'm wondering whether `memory_used` and `memory_capacity` make more sense.

[GitHub] [arrow] github-actions[bot] commented on pull request #7874: ARROW-9582: [Rust] Implement memory size methods

2020-07-31 Thread GitBox
github-actions[bot] commented on pull request #7874: URL: https://github.com/apache/arrow/pull/7874#issuecomment-667208832 https://issues.apache.org/jira/browse/ARROW-9582

[GitHub] [arrow] andygrove closed pull request #7853: ARROW-9582: [Rust] Add memory_size() method to Array [DRAFT]

2020-07-31 Thread GitBox
andygrove closed pull request #7853: URL: https://github.com/apache/arrow/pull/7853

[GitHub] [arrow] andygrove commented on pull request #7853: ARROW-9582: [Rust] Add memory_size() method to Array [DRAFT]

2020-07-31 Thread GitBox
andygrove commented on pull request #7853: URL: https://github.com/apache/arrow/pull/7853#issuecomment-667207713 Replaced by https://github.com/apache/arrow/pull/7874

[GitHub] [arrow] vertexclique opened a new pull request #7874: ARROW-9582: [Rust] Implement memory size methods

2020-07-31 Thread GitBox
vertexclique opened a new pull request #7874: URL: https://github.com/apache/arrow/pull/7874 This PR is a slightly extended version of the PR https://github.com/apache/arrow/pull/7853. * `buffer_memory_size`: Only calculates internally held data size. * `total_memory_size`:

[GitHub] [arrow] lidavidm commented on pull request #7863: ARROW-9344: [C++][Flight] Measure latency quantiles

2020-07-31 Thread GitBox
lidavidm commented on pull request #7863: URL: https://github.com/apache/arrow/pull/7863#issuecomment-667190655 Thank you!

[GitHub] [arrow] lidavidm closed pull request #7863: ARROW-9344: [C++][Flight] Measure latency quantiles

2020-07-31 Thread GitBox
lidavidm closed pull request #7863: URL: https://github.com/apache/arrow/pull/7863

[GitHub] [arrow] andygrove commented on pull request #7853: ARROW-9582: [Rust] Add memory_size() method to Array [DRAFT]

2020-07-31 Thread GitBox
andygrove commented on pull request #7853: URL: https://github.com/apache/arrow/pull/7853#issuecomment-667122842 @vertexclique is working on a PR for this as well, so I will likely close this one once the new PR is up.

[GitHub] [arrow] pereverges commented on issue #7864: stop plasma_store from shell script

2020-07-31 Thread GitBox
pereverges commented on issue #7864: URL: https://github.com/apache/arrow/issues/7864#issuecomment-667103343 Is there a way to choose the socket when using plasma.start_plasma_store()? The link https://github.com/apache/arrow/blob/master/python/pyarrow/plasma.py#L82 does not seem

[GitHub] [arrow] svenwb commented on pull request #7873: ARROW-9609: [Rust] Leaner feature gating for arrow in parquet

2020-07-31 Thread GitBox
svenwb commented on pull request #7873: URL: https://github.com/apache/arrow/pull/7873#issuecomment-667080040 +1

[GitHub] [arrow] svenwb removed a comment on pull request #7873: ARROW-9609: [Rust] Leaner feature gating for arrow in parquet

2020-07-31 Thread GitBox
svenwb removed a comment on pull request #7873: URL: https://github.com/apache/arrow/pull/7873#issuecomment-667080040 +1

[GitHub] [arrow] github-actions[bot] commented on pull request #7873: ARROW-9609: [Rust] Leaner feature gating for arrow in parquet

2020-07-31 Thread GitBox
github-actions[bot] commented on pull request #7873: URL: https://github.com/apache/arrow/pull/7873#issuecomment-667061089 https://issues.apache.org/jira/browse/ARROW-9609

[GitHub] [arrow] vertexclique opened a new pull request #7873: ARROW-9609 - Leaner feature gating for arrow in parquet

2020-07-31 Thread GitBox
vertexclique opened a new pull request #7873: URL: https://github.com/apache/arrow/pull/7873 Currently, parquet installs arrow-flight and its dependencies, which breaks the CI builds and is unnecessary because arrow-flight is not used. Parquet should work without any default features by

[GitHub] [arrow] github-actions[bot] commented on pull request #7872: ARROW-9607: [C++][Gandiva] Add bitwise_and(), bitwise_or() and bitwise_not() functions for integers

2020-07-31 Thread GitBox
github-actions[bot] commented on pull request #7872: URL: https://github.com/apache/arrow/pull/7872#issuecomment-667026323 https://issues.apache.org/jira/browse/ARROW-9607

[GitHub] [arrow] sagnikc-dremio opened a new pull request #7872: ARROW-9607: [C++][Gandiva] Add bitwise_and(), bitwise_or() and bitwise_not() functions for integers

2020-07-31 Thread GitBox
sagnikc-dremio opened a new pull request #7872: URL: https://github.com/apache/arrow/pull/7872

[GitHub] [arrow] jianxind commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

2020-07-31 Thread GitBox
jianxind commented on pull request #7871: URL: https://github.com/apache/arrow/pull/7871#issuecomment-666978438 I can trigger a benchmark action once https://github.com/apache/arrow/pull/7870 gets merged. Below is the benchmark data for int types on my setup: ``` Before:

[GitHub] [arrow] github-actions[bot] commented on pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

2020-07-31 Thread GitBox
github-actions[bot] commented on pull request #7871: URL: https://github.com/apache/arrow/pull/7871#issuecomment-666974009 https://issues.apache.org/jira/browse/ARROW-9605

[GitHub] [arrow] jianxind opened a new pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types

2020-07-31 Thread GitBox
jianxind opened a new pull request #7871: URL: https://github.com/apache/arrow/pull/7871 1. Use BitBlockCounter to speed up performance for typical data with 0.01% nulls. 2. Enable the AVX compiler auto-vectorized version for the no-nulls case on int types. Float/Double use fmin/fmax to handle NaN
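
The two ideas in this description (skipping per-element null checks for blocks with no nulls, and fmin/fmax-style NaN handling) can be illustrated with a minimal Python sketch. This is not the Arrow C++ kernel or the BitBlockCounter API: the names `fmin`, `fmax`, `min_max`, and the `block_size` parameter are hypothetical and chosen for illustration, and the validity bitmap is modelled as a plain list of booleans.

```python
import math


def fmin(a, b):
    """Mimic C fmin: if exactly one argument is NaN, return the other one."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)


def fmax(a, b):
    """Mimic C fmax: if exactly one argument is NaN, return the other one."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)


def min_max(values, valid, block_size=64):
    """Return (min, max) over valid entries; (nan, nan) if there are none.

    `valid` stands in for a validity bitmap (assumption: one bool per value).
    """
    lo = hi = math.nan
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        flags = valid[start:start + block_size]
        if all(flags):
            # Fast path: the block has no nulls, so use it as-is.
            live = block
        else:
            # Slow path: keep only the entries marked valid.
            live = [v for v, ok in zip(block, flags) if ok]
        for v in live:
            lo, hi = fmin(lo, v), fmax(hi, v)
    return lo, hi
```

Starting the accumulators at NaN works because `fmin(nan, v)` and `fmax(nan, v)` both return `v`, so the first valid non-NaN value seeds the result.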

[GitHub] [arrow] github-actions[bot] commented on pull request #7870: ARROW-9604: [C++] Add aggregate min/max benchmark

2020-07-31 Thread GitBox
github-actions[bot] commented on pull request #7870: URL: https://github.com/apache/arrow/pull/7870#issuecomment-666967616 https://issues.apache.org/jira/browse/ARROW-9604

[GitHub] [arrow] jianxind opened a new pull request #7870: ARROW-9604: [C++] Add aggregate min/max benchmark

2020-07-31 Thread GitBox
jianxind opened a new pull request #7870: URL: https://github.com/apache/arrow/pull/7870 Add a benchmark for aggregate min/max compute kernels. Signed-off-by: Frank Du