[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616729359 I've started a discussion on the [mailing-list](https://mail-archives.apache.org/mod_mbox/arrow-dev/) to make other people aware of your efforts. I wonder if creating a

[GitHub] [arrow] github-actions[bot] commented on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6993: URL: https://github.com/apache/arrow/pull/6993#issuecomment-616667220 https://issues.apache.org/jira/browse/ARROW-8477 This is an automated message from the Apache Git Service.

[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616759329 Thank you for starting the discussion. I will watch the thread. Yeah, `LETypedBufferBuilder` makes sense. It looks better than adding `AppendLE`. Regarding `Serialize`, it

[GitHub] [arrow] vertexclique commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-20 Thread GitBox
vertexclique commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r411501620 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder {

[GitHub] [arrow] kiszk commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-20 Thread GitBox
kiszk commented on a change in pull request #6991: URL: https://github.com/apache/arrow/pull/6991#discussion_r411585778 ## File path: cpp/src/arrow/util/rle_encoding.h ## @@ -414,6 +414,8 @@ static inline bool IndexInRange(int32_t idx, int32_t dictionary_length) { template

[GitHub] [arrow] tpboudreau commented on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox
tpboudreau commented on issue #6993: URL: https://github.com/apache/arrow/pull/6993#issuecomment-616714142 Your changes look good. Thanks!

[GitHub] [arrow] BryanCutler commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-20 Thread GitBox
BryanCutler commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r411513297 ## File path: java/memory/src/main/java/org/apache/arrow/memory/NettyAllocationManager.java ## @@ -34,31 +33,24 @@ static final

[GitHub] [arrow] pitrou commented on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox
pitrou commented on issue #6993: URL: https://github.com/apache/arrow/pull/6993#issuecomment-616703869 Looks like Windows long paths are enabled by default on Github Actions. Cool!

[GitHub] [arrow] pitrou commented on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox
pitrou commented on issue #6993: URL: https://github.com/apache/arrow/pull/6993#issuecomment-616748578 The remaining CI failure is unrelated.

[GitHub] [arrow] tpboudreau commented on a change in pull request #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox
tpboudreau commented on a change in pull request #6993: URL: https://github.com/apache/arrow/pull/6993#discussion_r411569025 ## File path: cpp/src/arrow/util/io_util_test.cc ## @@ -446,6 +446,56 @@ TEST(CreateDirTree, Basics) { ASSERT_OK_AND_ASSIGN(fn,

[GitHub] [arrow] tpboudreau opened a new pull request #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox
tpboudreau opened a new pull request #6993: URL: https://github.com/apache/arrow/pull/6993 This patch enables reading/writing of files with long (>260 characters) pathnames in Windows. In order for the new test to run under Windows, both (1) the test host must have long paths

[GitHub] [arrow] pitrou commented on a change in pull request #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox
pitrou commented on a change in pull request #6993: URL: https://github.com/apache/arrow/pull/6993#discussion_r411547763 ## File path: cpp/src/arrow/util/io_util_test.cc ## @@ -446,6 +446,56 @@ TEST(CreateDirTree, Basics) { ASSERT_OK_AND_ASSIGN(fn,

[GitHub] [arrow] pitrou commented on a change in pull request #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox
pitrou commented on a change in pull request #6993: URL: https://github.com/apache/arrow/pull/6993#discussion_r411570280 ## File path: cpp/src/arrow/util/io_util_test.cc ## @@ -446,6 +446,56 @@ TEST(CreateDirTree, Basics) { ASSERT_OK_AND_ASSIGN(fn,

[GitHub] [arrow] tpboudreau edited a comment on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox
tpboudreau edited a comment on issue #6993: URL: https://github.com/apache/arrow/pull/6993#issuecomment-616714142 Your fixups look good. Thanks!

[GitHub] [arrow] emkornfield commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-20 Thread GitBox
emkornfield commented on a change in pull request #6991: URL: https://github.com/apache/arrow/pull/6991#discussion_r411577540 ## File path: cpp/src/arrow/util/rle_encoding.h ## @@ -414,6 +414,8 @@ static inline bool IndexInRange(int32_t idx, int32_t dictionary_length) {

[GitHub] [arrow] ursabot commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build

2020-04-20 Thread GitBox
ursabot commented on issue #6990: URL: https://github.com/apache/arrow/pull/6990#issuecomment-616716254 [AMD64 Conda Crossbow Submit (#101910)](https://ci.ursalabs.org/#builders/98/builds/641) builder has succeeded. Revision: a051a430c8dfc9d0cea307a3d0dcb23e6efc2015

[GitHub] [arrow] pprudhvi commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build

2020-04-20 Thread GitBox
pprudhvi commented on issue #6990: URL: https://github.com/apache/arrow/pull/6990#issuecomment-616715997 @ursabot crossbow submit -g gandiva

[GitHub] [arrow] tpboudreau commented on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox
tpboudreau commented on issue #6993: URL: https://github.com/apache/arrow/pull/6993#issuecomment-616759574 Thanks @pitrou for jumping on this so quickly.

[GitHub] [arrow] pitrou commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-20 Thread GitBox
pitrou commented on a change in pull request #6991: URL: https://github.com/apache/arrow/pull/6991#discussion_r411578282 ## File path: cpp/src/arrow/util/rle_encoding.h ## @@ -414,6 +414,8 @@ static inline bool IndexInRange(int32_t idx, int32_t dictionary_length) { template

[GitHub] [arrow] tustvold commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-20 Thread GitBox
tustvold commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r411675427 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder {

[GitHub] [arrow] bkietz commented on issue #6994: ARROW-8043: [Developer][CI] Provide better visibility for nightly builds

2020-04-20 Thread GitBox
bkietz commented on issue #6994: URL: https://github.com/apache/arrow/pull/6994#issuecomment-616819386 @github-actions crossbow submit -g nightly

[GitHub] [arrow] bkietz opened a new pull request #6994: ARROW-8043: [Developer][CI] Provide better visibility for nightly builds

2020-04-20 Thread GitBox
bkietz opened a new pull request #6994: URL: https://github.com/apache/arrow/pull/6994 Add a `status.json` to the gh-pages summary of nightly builds to get around rate limiting

[GitHub] [arrow] nealrichardson commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-20 Thread GitBox
nealrichardson commented on issue #6995: URL: https://github.com/apache/arrow/pull/6995#issuecomment-616887183 @github-actions crossbow submit test-r-linux-as-cran

[GitHub] [arrow] github-actions[bot] commented on issue #6994: ARROW-8043: [Developer][CI] Provide better visibility for nightly builds

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6994: URL: https://github.com/apache/arrow/pull/6994#issuecomment-616820980 https://issues.apache.org/jira/browse/ARROW-8043

[GitHub] [arrow] kou commented on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-20 Thread GitBox
kou commented on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-616819824 Wow! Awesome!

[GitHub] [arrow] github-actions[bot] commented on issue #6994: ARROW-8043: [Developer][CI] Provide better visibility for nightly builds

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6994: URL: https://github.com/apache/arrow/pull/6994#issuecomment-616820253 Revision: 89cf7325ab761a35b0c8a0da7096805984e18435 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] kou commented on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-20 Thread GitBox
kou commented on issue #6983: URL: https://github.com/apache/arrow/pull/6983#issuecomment-616869503 Thanks!

[GitHub] [arrow] nealrichardson commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-20 Thread GitBox
nealrichardson commented on issue #6995: URL: https://github.com/apache/arrow/pull/6995#issuecomment-616881056 @github-actions crossbow submit test-r-linux-as-cran

[GitHub] [arrow] github-actions[bot] commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6995: URL: https://github.com/apache/arrow/pull/6995#issuecomment-616881426 Revision: 1ed83aaf5dd17d4e3b31aa1cc657f1220da2c8d4 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] nealrichardson opened a new pull request #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-20 Thread GitBox
nealrichardson opened a new pull request #6995: URL: https://github.com/apache/arrow/pull/6995 Having some trouble/slowness with r-hub for testing so made this PR to use crossbow.

[GitHub] [arrow] github-actions[bot] commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6995: URL: https://github.com/apache/arrow/pull/6995#issuecomment-616887542 Revision: 88c0198d775796d5a39644a22840a45470b4253f Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-20 Thread GitBox
cyb70289 commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616921669 Opened a jira card https://issues.apache.org/jira/browse/ARROW-8537

[GitHub] [arrow] github-actions[bot] commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6996: URL: https://github.com/apache/arrow/pull/6996#issuecomment-616927551 Revision: 69081241244da5decee0bf0ea3cb2f24059d244d Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] nealrichardson commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-20 Thread GitBox
nealrichardson commented on issue #6996: URL: https://github.com/apache/arrow/pull/6996#issuecomment-616927211 @github-actions crossbow submit homebrew-cpp

[GitHub] [arrow] nealrichardson opened a new pull request #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-20 Thread GitBox
nealrichardson opened a new pull request #6996: URL: https://github.com/apache/arrow/pull/6996 One more I didn't remove in ARROW-8222.

[GitHub] [arrow] github-actions[bot] commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6996: URL: https://github.com/apache/arrow/pull/6996#issuecomment-616931067 https://issues.apache.org/jira/browse/ARROW-8538

[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-20 Thread GitBox
cyb70289 commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616915079 @pitrou @wesm Oops, I only checked case "BitmapReader" from benchmark

[GitHub] [arrow] emkornfield commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-20 Thread GitBox
emkornfield commented on a change in pull request #6991: URL: https://github.com/apache/arrow/pull/6991#discussion_r411882849 ## File path: cpp/src/arrow/util/rle_encoding.h ## @@ -414,6 +414,8 @@ static inline bool IndexInRange(int32_t idx, int32_t dictionary_length) {

[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616623385 I have been thinking about candidate places for the interface between native endianness and Parquet's little-endian format. One good candidate is `Serialize()` in

[GitHub] [arrow] github-actions[bot] commented on issue #6989: [Python] Fix non-deterministic row order failure in dataset tests

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6989: URL: https://github.com/apache/arrow/pull/6989#issuecomment-616395745 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could you

[GitHub] [arrow] jorisvandenbossche commented on issue #6970: ARROW-2714: [Python] Implement variable step slicing with Take

2020-04-20 Thread GitBox
jorisvandenbossche commented on issue #6970: URL: https://github.com/apache/arrow/pull/6970#issuecomment-616366094 Should we document this in the slice docstring that if the step is not 1, it will be a copy (take) and not a zero-copy view? (as I think people will typically assume no copy

[GitHub] [arrow] tustvold edited a comment on issue #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-20 Thread GitBox
tustvold edited a comment on issue #6980: URL: https://github.com/apache/arrow/pull/6980#issuecomment-616401880 I built the docker image locally and ran the same script as the CI, however, I am unable to reproduce the linker error... The ursabot issue seems to have fixed itself though,

[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616424899 Hmm, I don't think that's right. `Int96` is the physical representation of 96-bit integers in Parquet files, and it's entirely little-endian. This means it should always have the same

[GitHub] [arrow] tustvold commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-20 Thread GitBox
tustvold commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r411174949 ## File path: rust/arrow/src/array/builder.rs ## @@ -301,6 +324,21 @@ impl BufferBuilderTrait for BufferBuilder { Ok(()) } +fn

[GitHub] [arrow] github-actions[bot] commented on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-616358473 https://issues.apache.org/jira/browse/ARROW-8524

[GitHub] [arrow] jorisvandenbossche commented on issue #6961: ARROW-8517: [Release] Update Crossbow release verification tasks for 0.17.0 RC0

2020-04-20 Thread GitBox
jorisvandenbossche commented on issue #6961: URL: https://github.com/apache/arrow/pull/6961#issuecomment-616389505 > wheels-linux: 3.8 has a test failure (test_construct_from_list_of_files); François says he's seen this elsewhere. @jorisvandenbossche @kszucs is this another

[GitHub] [arrow] jorisvandenbossche opened a new pull request #6989: [Python] Fix non-deterministic row order failure in dataset tests

2020-04-20 Thread GitBox
jorisvandenbossche opened a new pull request #6989: URL: https://github.com/apache/arrow/pull/6989

[GitHub] [arrow] tustvold commented on issue #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-20 Thread GitBox
tustvold commented on issue #6980: URL: https://github.com/apache/arrow/pull/6980#issuecomment-616401880 I built the docker image locally and ran the same script as the CI, however, I am unable to reproduce the linker error... The ursabot issue seems to have fixed itself though, which is

[GitHub] [arrow] zhztheplayer commented on issue #6967: ARROW-8499: [C++][Dataset] In ScannerBuilder, batch_size will not wor…

2020-04-20 Thread GitBox
zhztheplayer commented on issue #6967: URL: https://github.com/apache/arrow/pull/6967#issuecomment-616345112 OK but as the PR is already merged, maybe a follow-up JIRA ticket is needed?

[GitHub] [arrow] kszucs commented on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-20 Thread GitBox
kszucs commented on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-616356603 @github-actions crossbow submit ubuntu-bionic-amd64 test-conda-cpp test-r-linux-as-cran

[GitHub] [arrow] github-actions[bot] commented on issue #6988: [CI] Try to free up space on github actions [WIP]

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-616334547 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could you

[GitHub] [arrow] kszucs opened a new pull request #6988: [CI] Try to free up space on github actions [WIP]

2020-04-20 Thread GitBox
kszucs opened a new pull request #6988: URL: https://github.com/apache/arrow/pull/6988

[GitHub] [arrow] github-actions[bot] commented on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-616357456 Revision: f19af84b7b6af216b91a56956590ebce051b69c7 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] kiszk edited a comment on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
kiszk edited a comment on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616462606 Thank you for your clarification. Based on this, the `Int96` structure in memory can be represented in native endianness. When it is written to a file, we have to carefully keep it

[GitHub] [arrow] kszucs commented on a change in pull request #6961: ARROW-8517: [Release] Update Crossbow release verification tasks for 0.17.0 RC0

2020-04-20 Thread GitBox
kszucs commented on a change in pull request #6961: URL: https://github.com/apache/arrow/pull/6961#discussion_r411236556 ## File path: dev/tasks/verify-rc/github.nix.yml ## @@ -64,8 +69,9 @@ jobs: fi if [ "$TEST_RUBY" = "1" ]; then ruby

[GitHub] [arrow] pitrou commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-20 Thread GitBox
pitrou commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616468493 @cyb70289 It's ok, we can keep this PR.

[GitHub] [arrow] kszucs commented on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-20 Thread GitBox
kszucs commented on issue #6983: URL: https://github.com/apache/arrow/pull/6983#issuecomment-616439081 @kou we don't need to move to travis, I [managed to free up 23GB](https://github.com/apache/arrow/commit/b20f7091e63684804cb6ba76e4f72fcd38040cfd) of additional space on github actions,

[GitHub] [arrow] kszucs edited a comment on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-20 Thread GitBox
kszucs edited a comment on issue #6983: URL: https://github.com/apache/arrow/pull/6983#issuecomment-616439081 @kou we don't need to move to travis, I [managed to free up 23GB](https://github.com/apache/arrow/commit/b20f7091e63684804cb6ba76e4f72fcd38040cfd) of additional space on github

[GitHub] [arrow] kiszk edited a comment on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
kiszk edited a comment on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616453579 I understand your point. First, I implemented the `entirely little-endian` approach. Then, I reconsidered it. I thought that each primitive type should be represented in a

[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616456984 I think data should be kept in native endianness in memory (that is what the user would expect). What we must be careful about is that Parquet data is encoded (and decoded) as little endian.
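
The point made in this comment (native byte order in memory, little-endian bytes when encoding and decoding) can be illustrated with a minimal C++ sketch. This is not code from the pull request: `Int96Words`, `StoreLE32`, `LoadLE32`, `EncodeLE` and `DecodeLE` are hypothetical names, and the three-word layout is an assumption made for illustration only. Building bytes with shifts rather than `memcpy` means the result does not depend on the host's byte order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 96-bit value held in memory as three
// 32-bit words in the host's native byte order.
struct Int96Words {
  std::array<uint32_t, 3> words;
};

// Write one 32-bit word as 4 little-endian bytes. Shifts operate on the
// numeric value, so this gives the same bytes on any host.
inline void StoreLE32(uint32_t v, uint8_t* out) {
  out[0] = static_cast<uint8_t>(v & 0xFF);
  out[1] = static_cast<uint8_t>((v >> 8) & 0xFF);
  out[2] = static_cast<uint8_t>((v >> 16) & 0xFF);
  out[3] = static_cast<uint8_t>((v >> 24) & 0xFF);
}

// Read 4 little-endian bytes back into a native 32-bit word.
inline uint32_t LoadLE32(const uint8_t* in) {
  return static_cast<uint32_t>(in[0]) |
         (static_cast<uint32_t>(in[1]) << 8) |
         (static_cast<uint32_t>(in[2]) << 16) |
         (static_cast<uint32_t>(in[3]) << 24);
}

// Encode the in-memory value into 12 little-endian bytes (assumed layout:
// word 0 first, each word least-significant byte first).
inline std::array<uint8_t, 12> EncodeLE(const Int96Words& v) {
  std::array<uint8_t, 12> bytes{};
  for (std::size_t i = 0; i < 3; ++i) {
    StoreLE32(v.words[i], bytes.data() + 4 * i);
  }
  return bytes;
}

// Decode 12 little-endian bytes back into native-endian words.
inline Int96Words DecodeLE(const std::array<uint8_t, 12>& bytes) {
  Int96Words v{};
  for (std::size_t i = 0; i < 3; ++i) {
    v.words[i] = LoadLE32(bytes.data() + 4 * i);
  }
  return v;
}
```

With this shape, user-facing code only ever sees native-endian words, and the byte-order conversion is confined to the encode and decode boundary.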

[GitHub] [arrow] kszucs commented on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-20 Thread GitBox
kszucs commented on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-616432127 Build failures are unrelated. +1

[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616453579 I understand your point. First, I implemented the `entirely little-endian` approach. Then, I reconsidered it. Each primitive type should be represented in little-endian as shown

[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616462606 Thank you for your clarification. Based on this, the `Int96` structure in memory can be represented in native endianness. When it is written to a file, we have to carefully keep it in a

[GitHub] [arrow] pprudhvi opened a new pull request #6990: ARROW-???? : fix gandiva macos build

2020-04-20 Thread GitBox
pprudhvi opened a new pull request #6990: URL: https://github.com/apache/arrow/pull/6990

[GitHub] [arrow] pprudhvi commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build

2020-04-20 Thread GitBox
pprudhvi commented on issue #6990: URL: https://github.com/apache/arrow/pull/6990#issuecomment-616521137 @ursabot crossbow submit -g gandiva

[GitHub] [arrow] ursabot commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build

2020-04-20 Thread GitBox
ursabot commented on issue #6990: URL: https://github.com/apache/arrow/pull/6990#issuecomment-616521423 [AMD64 Conda Crossbow Submit (#101851)](https://ci.ursalabs.org/#builders/98/builds/640) builder has succeeded. Revision: a051a430c8dfc9d0cea307a3d0dcb23e6efc2015

[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616530174 Can you add the explanation you gave above (about the memory layout) somewhere in `parquet/types.h`? Thank you.

[GitHub] [arrow] wesm commented on issue #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-20 Thread GitBox
wesm commented on issue #6954: URL: https://github.com/apache/arrow/pull/6954#issuecomment-616533191 I thought we had discussed removing the `ARROW_USE_SIMD` option altogether

[GitHub] [arrow] wesm commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-20 Thread GitBox
wesm commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616536379 Cool, nice improvement

[GitHub] [arrow] wesm edited a comment on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-20 Thread GitBox
wesm edited a comment on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616536379 Cool, nice improvement (is this captured in our benchmark executables?)

[GitHub] [arrow] wesm commented on issue #6967: ARROW-8499: [C++][Dataset] In ScannerBuilder, batch_size will not wor…

2020-04-20 Thread GitBox
wesm commented on issue #6967: URL: https://github.com/apache/arrow/pull/6967#issuecomment-616500573 Yes, indeed. Thank you

[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616507292 At first, this PR addresses only test cases. This PR does not address the routines (like `column_writer.h`) yet. Updating test cases can clarify the specification and implementation like

[GitHub] [arrow] pprudhvi commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build

2020-04-20 Thread GitBox
pprudhvi commented on issue #6990: URL: https://github.com/apache/arrow/pull/6990#issuecomment-616520741 cc @kszucs

[GitHub] [arrow] pitrou opened a new pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-20 Thread GitBox
pitrou opened a new pull request #6991: URL: https://github.com/apache/arrow/pull/6991 NextCounts() should be parameterized with the dictionary index type, not the value type. Previous code seems to have succeeded by chance on little-endian platforms. See discussion in ARROW-8486.

[GitHub] [arrow] github-actions[bot] commented on issue #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6991: URL: https://github.com/apache/arrow/pull/6991#issuecomment-616529709 https://issues.apache.org/jira/browse/ARROW-8529

[GitHub] [arrow] liyafan82 commented on a change in pull request #6912: ARROW-8020: [Java] Implement vector validate functionality

2020-04-20 Thread GitBox
liyafan82 commented on a change in pull request #6912: URL: https://github.com/apache/arrow/pull/6912#discussion_r411355185 ## File path: java/vector/src/main/java/org/apache/arrow/vector/complex/NonNullableStructVector.java ## @@ -320,6 +322,20 @@ public int hashCode(int

[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616540298 Sure, at first, I derived this layout from [this method](https://github.com/apache/arrow/blob/master/cpp/src/parquet/types.h#L576). Other resources:

[GitHub] [arrow] pitrou commented on a change in pull request #6966: ARROW-8497: [Archery] Add missing components to build options

2020-04-20 Thread GitBox
pitrou commented on a change in pull request #6966: URL: https://github.com/apache/arrow/pull/6966#discussion_r411365701 ## File path: dev/archery/archery/cli.py ## @@ -134,36 +134,72 @@ def _apply_options(cmd, options): help="CMake's CMAKE_BUILD_TYPE")

[GitHub] [arrow] github-actions[bot] commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6990: URL: https://github.com/apache/arrow/pull/6990#issuecomment-616522599 https://issues.apache.org/jira/browse/ARROW-8528

[GitHub] [arrow] kiszk edited a comment on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
kiszk edited a comment on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616507292 At first, this PR addresses only test cases. This PR does not address the routines (like `column_writer.h`) yet. Updating test cases can clarify the specification and

[GitHub] [arrow] liyafan82 commented on a change in pull request #6912: ARROW-8020: [Java] Implement vector validate functionality

2020-04-20 Thread GitBox
liyafan82 commented on a change in pull request #6912: URL: https://github.com/apache/arrow/pull/6912#discussion_r411352198 ## File path: java/vector/src/main/java/org/apache/arrow/vector/ValueVector.java ## @@ -283,4 +283,10 @@ * @return the name of the vector. */

[GitHub] [arrow] jorisvandenbossche opened a new pull request #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-20 Thread GitBox
jorisvandenbossche opened a new pull request #6992: URL: https://github.com/apache/arrow/pull/6992

[GitHub] [arrow] jorisvandenbossche commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-20 Thread GitBox
jorisvandenbossche commented on issue #6992: URL: https://github.com/apache/arrow/pull/6992#issuecomment-616595450 This is still WIP (depending on which pandas version we choose, we can clean up some things in the pandas-shim.pxi), but: - I defined the minimal pandas version now as

[GitHub] [arrow] github-actions[bot] commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6992: URL: https://github.com/apache/arrow/pull/6992#issuecomment-616601667 https://issues.apache.org/jira/browse/ARROW-7950

[GitHub] [arrow] wesm commented on issue #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-20 Thread GitBox
wesm commented on issue #6954: URL: https://github.com/apache/arrow/pull/6954#issuecomment-616604745 Maybe it was just a thought I had in my head but never expressed. Opened https://issues.apache.org/jira/browse/ARROW-8531

[GitHub] [arrow] pitrou commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-20 Thread GitBox
pitrou commented on a change in pull request #6991: URL: https://github.com/apache/arrow/pull/6991#discussion_r411416006 ## File path: .github/workflows/cpp.yml ## @@ -228,7 +228,9 @@ jobs: run: ci/scripts/util_checkout.sh - name: Build shell: bash -

[GitHub] [arrow] cyb70289 commented on issue #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-20 Thread GitBox
cyb70289 commented on issue #6954: URL: https://github.com/apache/arrow/pull/6954#issuecomment-616585651 > I thought we had discussed removing the `ARROW_USE_SIMD` option altogether @wesm remove `ARROW_USE_SIMD`? I remember a thread about default simd level(