[GitHub] [arrow] github-actions[bot] commented on issue #7009: ARROW-8552: [Rust] support iterate parquet row columns

2020-04-21 Thread GitBox
github-actions[bot] commented on issue #7009: URL: https://github.com/apache/arrow/pull/7009#issuecomment-617555901 https://issues.apache.org/jira/browse/ARROW-8552 This is an automated message from the Apache Git Service.

[GitHub] [arrow] houqp opened a new pull request #7009: ARROW-8552: [Rust] support iterate parquet row columns

2020-04-21 Thread GitBox
houqp opened a new pull request #7009: URL: https://github.com/apache/arrow/pull/7009 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [arrow] github-actions[bot] commented on issue #7008: ARROW-8551: [CI][Gandiva] Use LLVM 8 in gandiva linux build

2020-04-21 Thread GitBox
github-actions[bot] commented on issue #7008: URL: https://github.com/apache/arrow/pull/7008#issuecomment-617523989 https://issues.apache.org/jira/browse/ARROW-8551

[GitHub] [arrow] pprudhvi opened a new pull request #7008: ARROW-8551: [CI][Gandiva] Use LLVM 8 to build gandiva linux jar

2020-04-21 Thread GitBox
pprudhvi opened a new pull request #7008: URL: https://github.com/apache/arrow/pull/7008

[GitHub] [arrow] pprudhvi commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build

2020-04-21 Thread GitBox
pprudhvi commented on issue #6990: URL: https://github.com/apache/arrow/pull/6990#issuecomment-617519736 resolved with https://github.com/Homebrew/homebrew-core/pull/53445/files. closing this

[GitHub] [arrow] cyb70289 commented on issue #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-21 Thread GitBox
cyb70289 commented on issue #6954: URL: https://github.com/apache/arrow/pull/6954#issuecomment-617516947 > Maybe it was just a thought I had in my head but never expressed. Opened https://issues.apache.org/jira/browse/ARROW-8531 Updated this patch to remove ARROW_USE_SIMD

[GitHub] [arrow] wesm commented on issue #6578: ARROW-7371: WIP: [GLib] Add GLib binding of Dataset

2020-04-21 Thread GitBox
wesm commented on issue #6578: URL: https://github.com/apache/arrow/pull/6578#issuecomment-617515815 I haven't looked at the details of this binding too much, but I wanted to let you know that I'm taking a closer look at the way that filter expressions work in the datasets API in the

[GitHub] [arrow] github-actions[bot] commented on issue #7007: ARROW-8537: [C++] Revert Optimizing BitmapReader

2020-04-21 Thread GitBox
github-actions[bot] commented on issue #7007: URL: https://github.com/apache/arrow/pull/7007#issuecomment-617494295 https://issues.apache.org/jira/browse/ARROW-8537

[GitHub] [arrow] cyb70289 opened a new pull request #7007: ARROW-8537: [C++] Revert Optimizing BitmapReader

2020-04-21 Thread GitBox
cyb70289 opened a new pull request #7007: URL: https://github.com/apache/arrow/pull/7007 Revert PR https://github.com/apache/arrow/pull/6986 as it introduces big performance regression to BitmapAnd unaligned benchmark.

[GitHub] [arrow] github-actions[bot] commented on issue #7006: ARROW-8508 [Rust] FixedSizeListArray improper offset for value

2020-04-21 Thread GitBox
github-actions[bot] commented on issue #7006: URL: https://github.com/apache/arrow/pull/7006#issuecomment-617490579 https://issues.apache.org/jira/browse/ARROW-8508

[GitHub] [arrow] markhildreth opened a new pull request #7006: ARROW-8508 [Rust] FixedSizeListArray improper offset for value

2020-04-21 Thread GitBox
markhildreth opened a new pull request #7006: URL: https://github.com/apache/arrow/pull/7006 Potentially Fixes ARROW-8508 Fixed size list arrays sourced with a non-zero offset of their child data was respecting this offset when calculating value offsets in the `value_offset`

[GitHub] [arrow] nealrichardson commented on issue #7005: ARROW-8550: [CI] Don't run cron GHA jobs on forks

2020-04-21 Thread GitBox
nealrichardson commented on issue #7005: URL: https://github.com/apache/arrow/pull/7005#issuecomment-617477677 I believe the failure on Jira link might be expected: it's possible that the pull-request token is not sufficiently authorized to run it. @kou does that sound right? I

[GitHub] [arrow] github-actions[bot] commented on issue #7005: ARROW-8550: [CI] Don't run cron GHA jobs on forks

2020-04-21 Thread GitBox
github-actions[bot] commented on issue #7005: URL: https://github.com/apache/arrow/pull/7005#issuecomment-617456224 https://issues.apache.org/jira/browse/ARROW-8550

[GitHub] [arrow] nealrichardson opened a new pull request #7005: ARROW-8550: [CI] Don't run cron GHA jobs on forks

2020-04-21 Thread GitBox
nealrichardson opened a new pull request #7005: URL: https://github.com/apache/arrow/pull/7005

[GitHub] [arrow] github-actions[bot] commented on issue #6995: ARROW-8549: [R] Assorted post-0.17 release cleanups

2020-04-21 Thread GitBox
github-actions[bot] commented on issue #6995: URL: https://github.com/apache/arrow/pull/6995#issuecomment-617445300 Revision: e7dbd9c977b765e618a40e997039be773c9f16bf Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] nealrichardson commented on issue #6995: ARROW-8549: [R] Assorted post-0.17 release cleanups

2020-04-21 Thread GitBox
nealrichardson commented on issue #6995: URL: https://github.com/apache/arrow/pull/6995#issuecomment-617444872 @github-actions crossbow submit -g r

[GitHub] [arrow] github-actions[bot] commented on issue #6995: ARROW-8549: [R] Assorted post-0.17 release cleanups

2020-04-21 Thread GitBox
github-actions[bot] commented on issue #6995: URL: https://github.com/apache/arrow/pull/6995#issuecomment-617442386 https://issues.apache.org/jira/browse/ARROW-8549

[GitHub] [arrow] davidanthoff commented on a change in pull request #7001: Use lowercase ws2_32 everywhere

2020-04-21 Thread GitBox
davidanthoff commented on a change in pull request #7001: URL: https://github.com/apache/arrow/pull/7001#discussion_r412497174 ## File path: cpp/cmake_modules/FindThrift.cmake ## @@ -100,7 +100,7 @@ if(Thrift_FOUND OR THRIFT_FOUND)

[GitHub] [arrow] kou commented on a change in pull request #7001: Use lowercase ws2_32 everywhere

2020-04-21 Thread GitBox
kou commented on a change in pull request #7001: URL: https://github.com/apache/arrow/pull/7001#discussion_r412492656 ## File path: cpp/cmake_modules/FindThrift.cmake ## @@ -100,7 +100,7 @@ if(Thrift_FOUND OR THRIFT_FOUND)

[GitHub] [arrow] paddyhoran commented on issue #6306: ARROW-7705: [Rust] Initial sort implementation

2020-04-21 Thread GitBox
paddyhoran commented on issue #6306: URL: https://github.com/apache/arrow/pull/6306#issuecomment-617402123 @nevi-me this needs a rebase now. Once you do that, I'll take a look so we can get this merged.

[GitHub] [arrow] wesm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-21 Thread GitBox
wesm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r412474320 ## File path: cpp/src/parquet/file_reader.h ## @@ -117,6 +117,15 @@ class PARQUET_EXPORT ParquetFileReader { // Returns the file metadata. Only one

[GitHub] [arrow] wesm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-21 Thread GitBox
wesm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r412465641 ## File path: cpp/src/parquet/properties.h ## @@ -56,10 +60,32 @@ class PARQUET_EXPORT ReaderProperties { bool is_buffered_stream_enabled() const {

[GitHub] [arrow] paddyhoran commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-21 Thread GitBox
paddyhoran commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r412473292 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder {

[GitHub] [arrow] paddyhoran commented on issue #7004: ARROW-3827: [Rust] Implement UnionArray Updated

2020-04-21 Thread GitBox
paddyhoran commented on issue #7004: URL: https://github.com/apache/arrow/pull/7004#issuecomment-617399915 @kszucs it's failing due to `rustfmt` not being installed before testing the flight crate, any idea why this would be the case? Sorry, I don't know much about GitHub actions yet...

[GitHub] [arrow] github-actions[bot] commented on issue #7004: ARROW-3827: [Rust] Implement UnionArray Updated

2020-04-21 Thread GitBox
github-actions[bot] commented on issue #7004: URL: https://github.com/apache/arrow/pull/7004#issuecomment-617398778 https://issues.apache.org/jira/browse/ARROW-3827

[GitHub] [arrow] paddyhoran opened a new pull request #7004: ARROW-3827: [Rust] Implement UnionArray Updated

2020-04-21 Thread GitBox
paddyhoran opened a new pull request #7004: URL: https://github.com/apache/arrow/pull/7004 Replaces #6209 due to git issues. Implements UnionArray. This PR was getting too big as it was so I will address the following as follow up PR's: ARROW-8546 ARROW-8547 Note

[GitHub] [arrow] paddyhoran commented on issue #6209: ARROW-3827: [Rust] Implement UnionArray

2020-04-21 Thread GitBox
paddyhoran commented on issue #6209: URL: https://github.com/apache/arrow/pull/6209#issuecomment-617389640 Closing and I'll open a new PR.

[GitHub] [arrow] fsaintjacques commented on a change in pull request #7000: ARROW-8065: [C++][Dataset] Refactor ScanOptions and Fragment relation

2020-04-21 Thread GitBox
fsaintjacques commented on a change in pull request #7000: URL: https://github.com/apache/arrow/pull/7000#discussion_r412452967 ## File path: python/pyarrow/tests/test_dataset.py ## @@ -671,41 +669,29 @@ def test_fragments(tempdir): f = fragments[0] # file's schema

[GitHub] [arrow] working-estimate opened a new issue #7003: from pyarrow import parquet fails with AttributeError: type object 'pyarrow._parquet.Statistics' has no attribute '__reduce_cython__'

2020-04-21 Thread GitBox
working-estimate opened a new issue #7003: URL: https://github.com/apache/arrow/issues/7003 I have tried versions 0.15.1, 0.16.0, 0.17.0. Same error on all. I've seen in other issues that co-installations of tensorflow and numpy might be causing issues. I have tensorflow==1.14.0 and

[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-21 Thread GitBox
kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-617369335 After a day of investigation, I found that the implementation looks a little complicated. Regarding encoding, `TypedBufferBuilder` is used in some test cases, but it is not in some test

[GitHub] [arrow] fsaintjacques commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-21 Thread GitBox
fsaintjacques commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-617368021 See either `archery benchmark diff --help` or the [benchmark](https://arrow.apache.org/docs/developers/benchmarks.html) section of the documentation. Archery can compare the same

[GitHub] [arrow] davidanthoff commented on issue #7001: Use lowercase ws2_32 everywhere

2020-04-21 Thread GitBox
davidanthoff commented on issue #7001: URL: https://github.com/apache/arrow/pull/7001#issuecomment-617365369 > How does BinaryBuilder compile Windows binaries on Linux? Using MinGW? Yes, it uses MinGW for Windows, but then it also cross-compiles to lots of other platforms. The PR

[GitHub] [arrow] kszucs commented on issue #6883: Prepare for the release candidate

2020-04-21 Thread GitBox
kszucs commented on issue #6883: URL: https://github.com/apache/arrow/pull/6883#issuecomment-617350272 The release is out, we can close this PR.

[GitHub] [arrow] fsaintjacques commented on a change in pull request #7000: ARROW-8065: [C++][Dataset] Refactor ScanOptions and Fragment relation

2020-04-21 Thread GitBox
fsaintjacques commented on a change in pull request #7000: URL: https://github.com/apache/arrow/pull/7000#discussion_r412407398 ## File path: cpp/src/arrow/dataset/dataset.cc ## @@ -72,36 +78,15 @@ Result> Dataset::NewScan() { return NewScan(std::make_shared()); } -bool

[GitHub] [arrow] pitrou commented on issue #7002: ARROW-8543: [C++] Single pass coalescing algorithm + Rebase

2020-04-21 Thread GitBox
pitrou commented on issue #7002: URL: https://github.com/apache/arrow/pull/7002#issuecomment-617339230 The original PR message is slightly misleading: both algorithms have the same complexity (O(N) except for the sorting step which is O(N log N)). However, it's true that the new algorithm

[GitHub] [arrow] bkietz commented on a change in pull request #7000: ARROW-8065: [C++][Dataset] Refactor ScanOptions and Fragment relation

2020-04-21 Thread GitBox
bkietz commented on a change in pull request #7000: URL: https://github.com/apache/arrow/pull/7000#discussion_r412252930 ## File path: cpp/src/arrow/dataset/dataset.h ## @@ -84,13 +82,12 @@ class ARROW_DS_EXPORT Fragment { class ARROW_DS_EXPORT InMemoryFragment : public

[GitHub] [arrow] github-actions[bot] commented on issue #7002: ARROW-8543: [C++] Single pass coalescing algorithm + Rebase

2020-04-21 Thread GitBox
github-actions[bot] commented on issue #7002: URL: https://github.com/apache/arrow/pull/7002#issuecomment-617329923 https://issues.apache.org/jira/browse/ARROW-8543

[GitHub] [arrow] pitrou commented on a change in pull request #6959: ARROW-5649: [Integration][C++] Create integration test for extension types

2020-04-21 Thread GitBox
pitrou commented on a change in pull request #6959: URL: https://github.com/apache/arrow/pull/6959#discussion_r412350484 ## File path: python/pyarrow/tests/test_extension_type.py ## @@ -445,22 +445,28 @@ def test_parquet(tmpdir, registered_period_type): import base64

[GitHub] [arrow] pitrou commented on issue #7001: Use lowercase ws2_32 everywhere

2020-04-21 Thread GitBox
pitrou commented on issue #7001: URL: https://github.com/apache/arrow/pull/7001#issuecomment-617301935 (also, could you please open an issue on JIRA as explained above?)

[GitHub] [arrow] pitrou commented on issue #7001: Use lowercase ws2_32 everywhere

2020-04-21 Thread GitBox
pitrou commented on issue #7001: URL: https://github.com/apache/arrow/pull/7001#issuecomment-617301716 How does `BinaryBuilder` compile Windows binaries on Linux? Using MinGW?

[GitHub] [arrow] pitrou commented on a change in pull request #6959: ARROW-5649: [Integration][C++] Create integration test for extension types

2020-04-21 Thread GitBox
pitrou commented on a change in pull request #6959: URL: https://github.com/apache/arrow/pull/6959#discussion_r412310009 ## File path: cpp/src/arrow/ipc/metadata_internal.cc ## @@ -756,10 +737,35 @@ Status FieldFromFlatbuffer(const flatbuf::Field* field, DictionaryMemo*

[GitHub] [arrow] github-actions[bot] commented on issue #7001: Use lowercase ws2_32 everywhere

2020-04-21 Thread GitBox
github-actions[bot] commented on issue #7001: URL: https://github.com/apache/arrow/pull/7001#issuecomment-617271478 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could you

[GitHub] [arrow] pitrou commented on a change in pull request #6959: ARROW-5649: [Integration][C++] Create integration test for extension types

2020-04-21 Thread GitBox
pitrou commented on a change in pull request #6959: URL: https://github.com/apache/arrow/pull/6959#discussion_r412305089 ## File path: dev/archery/archery/integration/datagen.py ## @@ -1401,6 +1437,18 @@ def generate_nested_dictionary_case():

[GitHub] [arrow] davidanthoff opened a new pull request #7001: Use lowercase ws2_32 everywhere

2020-04-21 Thread GitBox
davidanthoff opened a new pull request #7001: URL: https://github.com/apache/arrow/pull/7001 With this patch I can cross-compile arrow from a Linux system, in particular I can compile Windows binaries on a Linux system (using https://binarybuilder.org/). I hope to eventually be able to

[GitHub] [arrow] wesm commented on issue #6970: ARROW-2714: [Python] Implement variable step slicing with Take

2020-04-21 Thread GitBox
wesm commented on issue #6970: URL: https://github.com/apache/arrow/pull/6970#issuecomment-617250883 +1. Appveyor build looks good https://ci.appveyor.com/project/wesm/arrow/builds/32336612

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7000: ARROW-8065: [C++][Dataset] Refactor ScanOptions and Fragment relation

2020-04-21 Thread GitBox
jorisvandenbossche commented on a change in pull request #7000: URL: https://github.com/apache/arrow/pull/7000#discussion_r412191161 ## File path: python/pyarrow/tests/test_dataset.py ## @@ -671,41 +669,29 @@ def test_fragments(tempdir): f = fragments[0] # file's

[GitHub] [arrow] github-actions[bot] commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-21 Thread GitBox
github-actions[bot] commented on issue #6995: URL: https://github.com/apache/arrow/pull/6995#issuecomment-617235575 Revision: f543317d36d39322bd339b49dd8867cbd3f2ad70 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] nealrichardson commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-21 Thread GitBox
nealrichardson commented on issue #6995: URL: https://github.com/apache/arrow/pull/6995#issuecomment-617234711 @github-actions crossbow submit test-r-linux-as-cran

[GitHub] [arrow] nealrichardson commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-21 Thread GitBox
nealrichardson commented on issue #6996: URL: https://github.com/apache/arrow/pull/6996#issuecomment-617232353 ¯\_(ツ)_/¯ maybe it's time to port this job to GHA

[GitHub] [arrow] wesm commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-21 Thread GitBox
wesm commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-617228774 FWIW, we have some benchmark diffing code already written in https://github.com/apache/arrow/blob/master/dev/archery/archery/benchmark I'm not sure where this is documented /

[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-21 Thread GitBox
cyb70289 commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-617227451 @wesm, actually I did use the codebase's benchmark executable. The problem is that I focused only on the one case directly related to this change, but ignored other cases that look not

[GitHub] [arrow] pitrou commented on a change in pull request #6959: ARROW-5649: [Integration][C++] Create integration test for extension types

2020-04-21 Thread GitBox
pitrou commented on a change in pull request #6959: URL: https://github.com/apache/arrow/pull/6959#discussion_r412227300 ## File path: dev/archery/archery/integration/datagen.py ## @@ -1401,6 +1437,18 @@ def generate_nested_dictionary_case():

[GitHub] [arrow] bkietz commented on a change in pull request #6959: ARROW-5649: [Integration][C++] Create integration test for extension types

2020-04-21 Thread GitBox
bkietz commented on a change in pull request #6959: URL: https://github.com/apache/arrow/pull/6959#discussion_r412194818 ## File path: dev/archery/archery/integration/datagen.py ## @@ -1401,6 +1437,18 @@ def generate_nested_dictionary_case():

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7000: ARROW-8065: [C++][Dataset] Refactor ScanOptions and Fragment relation

2020-04-21 Thread GitBox
jorisvandenbossche commented on a change in pull request #7000: URL: https://github.com/apache/arrow/pull/7000#discussion_r412150862 ## File path: cpp/src/arrow/dataset/dataset.h ## @@ -30,12 +30,22 @@ namespace arrow { namespace dataset { -/// \brief A granular piece of a

[GitHub] [arrow] gramirezespinoza commented on issue #6977: Missing `take` method in pyarrow's `Table` class

2020-04-21 Thread GitBox
gramirezespinoza commented on issue #6977: URL: https://github.com/apache/arrow/issues/6977#issuecomment-617178347 Waiting for #6970 to be approved/merged

[GitHub] [arrow] wesm commented on issue #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-21 Thread GitBox
wesm commented on issue #6744: URL: https://github.com/apache/arrow/pull/6744#issuecomment-617169894 Taking a look at this

[GitHub] [arrow] wesm edited a comment on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-21 Thread GitBox
wesm edited a comment on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-617164152 The copy-pasta in the .yml files is a bummer. I hope one day for a higher level specification of these tasks (thank you for fixing the disk usage issue, though!)

[GitHub] [arrow] wesm commented on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-21 Thread GitBox
wesm commented on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-617164152 The copy-pasta in the .yml files is a bummer. I hope one day for a higher level specification of these tasks

[GitHub] [arrow] github-actions[bot] commented on issue #7000: ARROW-8065: [C++][Dataset] Refactor ScanOptions and Fragment relation

2020-04-21 Thread GitBox
github-actions[bot] commented on issue #7000: URL: https://github.com/apache/arrow/pull/7000#issuecomment-617157131 https://issues.apache.org/jira/browse/ARROW-8065

[GitHub] [arrow] fsaintjacques opened a new pull request #7000: ARROW-8065: [C++][Dataset] Refactor ScanOptions and Fragment relation

2020-04-21 Thread GitBox
fsaintjacques opened a new pull request #7000: URL: https://github.com/apache/arrow/pull/7000 This is the first part of a refactor to make Fragment accessible without a Scan operation instance. This is a breaking change. It introduces the concept of a physical schema and read schema,

[GitHub] [arrow] wesm commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-21 Thread GitBox
wesm commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-617150324 In the meantime, when we have microperformance patches like these it would be a good practice in the future to make sure that performance results are reproduced in the codebase's benchmark

[GitHub] [arrow] github-actions[bot] commented on issue #6999: ARROW-8542: [Release] Fix checksum url in the website post release script

2020-04-21 Thread GitBox
github-actions[bot] commented on issue #6999: URL: https://github.com/apache/arrow/pull/6999#issuecomment-617150142 https://issues.apache.org/jira/browse/ARROW-8542

[GitHub] [arrow] wesm commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-21 Thread GitBox
wesm commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-617149608 > BTW: we definitely need continuous benchmark tools to detect these things early. Agreed. Hopefully some progress can be made on this in 2020 since the prior discussion in 2019

[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-21 Thread GitBox
kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-617149858 I agree with it. Once it is stable, it looks good. During development, developers want the reader and writer to be independent. At least, I do.

[GitHub] [arrow] wesm edited a comment on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-21 Thread GitBox
wesm edited a comment on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616536379 Cool, nice improvement (is this captured in our benchmark executables?)

[GitHub] [arrow] kszucs opened a new pull request #6999: ARROW-8542: [Release] Fix checksum url in the website post release script

2020-04-21 Thread GitBox
kszucs opened a new pull request #6999: URL: https://github.com/apache/arrow/pull/6999

[GitHub] [arrow] github-actions[bot] commented on issue #6998: ARROW-8541: [Release] Don't remove previous source releases automatically

2020-04-21 Thread GitBox
github-actions[bot] commented on issue #6998: URL: https://github.com/apache/arrow/pull/6998#issuecomment-617130185 https://issues.apache.org/jira/browse/ARROW-8541

[GitHub] [arrow] kszucs opened a new pull request #6998: ARROW-8541: [Release] Don't remove previous source releases automatically

2020-04-21 Thread GitBox
kszucs opened a new pull request #6998: URL: https://github.com/apache/arrow/pull/6998

[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-21 Thread GitBox
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-617126711 Perhaps. If the reader is compatible with those files, and roundtripping works, then the writer is probably compliant as well.

[GitHub] [arrow] github-actions[bot] commented on issue #6997: ARROW-8540: [C++] Add memory allocation benchmarks

2020-04-21 Thread GitBox
github-actions[bot] commented on issue #6997: URL: https://github.com/apache/arrow/pull/6997#issuecomment-617116743 https://issues.apache.org/jira/browse/ARROW-8540

[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-21 Thread GitBox
kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-617113951 Thank you for your suggestion. I think that these files are used only for read test now.

[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-21 Thread GitBox
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-617112444 (also look for the "PARQUET_TEST_DATA" environment variable)

[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-21 Thread GitBox
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-617112291 @kiszk The preferred way to do that would be to add a file to the https://github.com/apache/parquet-testing repository. It's checked in as a submodule in `cpp/submodules` and used in the

[GitHub] [arrow] pprudhvi commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build

2020-04-21 Thread GitBox
pprudhvi commented on issue #6990: URL: https://github.com/apache/arrow/pull/6990#issuecomment-617111538 Let's wait until https://github.com/Homebrew/homebrew-core/pull/53445/files is merged. See https://issues.apache.org/jira/browse/ARROW-8539

[GitHub] [arrow] pitrou opened a new pull request #6997: ARROW-8540: [C++] Add memory allocation benchmarks

2020-04-21 Thread GitBox
pitrou opened a new pull request #6997: URL: https://github.com/apache/arrow/pull/6997 Example output: ``` --- Benchmark

[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-21 Thread GitBox
liyafan82 commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r412070422 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestLargeVector.java ## @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software

[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-21 Thread GitBox
liyafan82 commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r412069840 ## File path: java/memory/src/test/java/org/apache/arrow/memory/TestLargeArrowBuf.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software

[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-21 Thread GitBox
liyafan82 commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r412070083 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestLargeVector.java ## @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software

[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-21 Thread GitBox
liyafan82 commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r412067781 ## File path: java/memory/src/main/java/org/apache/arrow/memory/NettyAllocationManager.java ## @@ -34,31 +33,24 @@ static final

[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-21 Thread GitBox
liyafan82 commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r412067968 ## File path: java/memory/src/test/java/org/apache/arrow/memory/TestLargeArrowBuf.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software

[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-21 Thread GitBox
kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-617087421 I think that the current test cases for parquet writer do not have tests to verify the bit pattern of the generated parquet file. I will also create the test case in another PR since they

[GitHub] [arrow] pitrou commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-21 Thread GitBox
pitrou commented on issue #6996: URL: https://github.com/apache/arrow/pull/6996#issuecomment-617067706 Wow, that is compiling OpenSSL by hand?

[GitHub] [arrow] pitrou commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-21 Thread GitBox
pitrou commented on a change in pull request #6991: URL: https://github.com/apache/arrow/pull/6991#discussion_r412025224 ## File path: cpp/src/arrow/util/rle_encoding.h ## @@ -414,6 +414,8 @@ static inline bool IndexInRange(int32_t idx, int32_t dictionary_length) { template

[GitHub] [arrow] pitrou commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-21 Thread GitBox
pitrou commented on issue #6996: URL: https://github.com/apache/arrow/pull/6996#issuecomment-617061541 "The job exceeded the maximum log length, and has been terminated." -- restarting

[GitHub] [arrow] pitrou commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-21 Thread GitBox
pitrou commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-617033346 To be honest, `BitmapAnd` should probably be rewritten using `Bitmap::VisitWords`. But we can revert anyway if we fear regressions may appear in other workloads.

[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-21 Thread GitBox
cyb70289 commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616978252 This change introduces severe branch misses in certain conditions. See perf logs below. I changed benchmark code to run only the problematic test case. Without this patch