[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616729359 I've started a discussion on the [mailing-list](https://mail-archives.apache.org/mod_mbox/arrow-dev/) to make other people aware of your efforts. I wonder if creating a

[GitHub] [arrow] github-actions[bot] commented on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6993: URL: https://github.com/apache/arrow/pull/6993#issuecomment-616667220 https://issues.apache.org/jira/browse/ARROW-8477 This is an automated message from the Apache Git Service.

[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616759329 Thank you for starting the discussion. I will watch the thread. Yeah, `LETypedBufferBuilder` makes sense. It looks better than adding `AppendLE`. Regarding `Serialize`, it

[GitHub] [arrow] vertexclique commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-20 Thread GitBox
vertexclique commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r411501620 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder {

[GitHub] [arrow] kiszk commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-20 Thread GitBox
kiszk commented on a change in pull request #6991: URL: https://github.com/apache/arrow/pull/6991#discussion_r411585778 ## File path: cpp/src/arrow/util/rle_encoding.h ## @@ -414,6 +414,8 @@ static inline bool IndexInRange(int32_t idx, int32_t dictionary_length) { template

[GitHub] [arrow] tpboudreau commented on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox
tpboudreau commented on issue #6993: URL: https://github.com/apache/arrow/pull/6993#issuecomment-616714142 Your changes look good. Thanks!

[GitHub] [arrow] BryanCutler commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-20 Thread GitBox
BryanCutler commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r411513297 ## File path: java/memory/src/main/java/org/apache/arrow/memory/NettyAllocationManager.java ## @@ -34,31 +33,24 @@ static final

[GitHub] [arrow] pitrou commented on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox
pitrou commented on issue #6993: URL: https://github.com/apache/arrow/pull/6993#issuecomment-616703869 Looks like Windows long paths are enabled by default on Github Actions. Cool!

[GitHub] [arrow] pitrou commented on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox
pitrou commented on issue #6993: URL: https://github.com/apache/arrow/pull/6993#issuecomment-616748578 The remaining CI failure is unrelated.

[GitHub] [arrow] tpboudreau commented on a change in pull request #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox
tpboudreau commented on a change in pull request #6993: URL: https://github.com/apache/arrow/pull/6993#discussion_r411569025 ## File path: cpp/src/arrow/util/io_util_test.cc ## @@ -446,6 +446,56 @@ TEST(CreateDirTree, Basics) { ASSERT_OK_AND_ASSIGN(fn,

[GitHub] [arrow] tpboudreau opened a new pull request #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox
tpboudreau opened a new pull request #6993: URL: https://github.com/apache/arrow/pull/6993 This patch enables reading/writing of files with long (>260 characters) pathnames in Windows. In order for the new test to run under Windows, both (1) the test host must have long paths

[GitHub] [arrow] pitrou commented on a change in pull request #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox
pitrou commented on a change in pull request #6993: URL: https://github.com/apache/arrow/pull/6993#discussion_r411547763 ## File path: cpp/src/arrow/util/io_util_test.cc ## @@ -446,6 +446,56 @@ TEST(CreateDirTree, Basics) { ASSERT_OK_AND_ASSIGN(fn,

[GitHub] [arrow] pitrou commented on a change in pull request #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox
pitrou commented on a change in pull request #6993: URL: https://github.com/apache/arrow/pull/6993#discussion_r411570280 ## File path: cpp/src/arrow/util/io_util_test.cc ## @@ -446,6 +446,56 @@ TEST(CreateDirTree, Basics) { ASSERT_OK_AND_ASSIGN(fn,

[GitHub] [arrow] tpboudreau edited a comment on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox
tpboudreau edited a comment on issue #6993: URL: https://github.com/apache/arrow/pull/6993#issuecomment-616714142 Your fixups look good. Thanks!

[GitHub] [arrow] emkornfield commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-20 Thread GitBox
emkornfield commented on a change in pull request #6991: URL: https://github.com/apache/arrow/pull/6991#discussion_r411577540 ## File path: cpp/src/arrow/util/rle_encoding.h ## @@ -414,6 +414,8 @@ static inline bool IndexInRange(int32_t idx, int32_t dictionary_length) {

[GitHub] [arrow] ursabot commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build

2020-04-20 Thread GitBox
ursabot commented on issue #6990: URL: https://github.com/apache/arrow/pull/6990#issuecomment-616716254 [AMD64 Conda Crossbow Submit (#101910)](https://ci.ursalabs.org/#builders/98/builds/641) builder has been succeeded. Revision: a051a430c8dfc9d0cea307a3d0dcb23e6efc2015

[GitHub] [arrow] pprudhvi commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build

2020-04-20 Thread GitBox
pprudhvi commented on issue #6990: URL: https://github.com/apache/arrow/pull/6990#issuecomment-616715997 @ursabot crossbow submit -g gandiva

[GitHub] [arrow] pitrou commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-20 Thread GitBox
pitrou commented on a change in pull request #6991: URL: https://github.com/apache/arrow/pull/6991#discussion_r411578282 ## File path: cpp/src/arrow/util/rle_encoding.h ## @@ -414,6 +414,8 @@ static inline bool IndexInRange(int32_t idx, int32_t dictionary_length) { template

[GitHub] [arrow] tustvold commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-20 Thread GitBox
tustvold commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r411675427 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder {

[GitHub] [arrow] bkietz commented on issue #6994: ARROW-8043: [Developer][CI] Provide better visibility for nightly builds

2020-04-20 Thread GitBox
bkietz commented on issue #6994: URL: https://github.com/apache/arrow/pull/6994#issuecomment-616819386 @github-actions crossbow submit -g nightly

[GitHub] [arrow] bkietz opened a new pull request #6994: ARROW-8043: [Developer][CI] Provide better visibility for nightly builds

2020-04-20 Thread GitBox
bkietz opened a new pull request #6994: URL: https://github.com/apache/arrow/pull/6994 Add a `status.json` to the gh-pages summary of nightly builds to get around rate limiting

[GitHub] [arrow] nealrichardson commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-20 Thread GitBox
nealrichardson commented on issue #6995: URL: https://github.com/apache/arrow/pull/6995#issuecomment-616887183 @github-actions crossbow submit test-r-linux-as-cran

[GitHub] [arrow] github-actions[bot] commented on issue #6994: ARROW-8043: [Developer][CI] Provide better visibility for nightly builds

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6994: URL: https://github.com/apache/arrow/pull/6994#issuecomment-616820980 https://issues.apache.org/jira/browse/ARROW-8043

[GitHub] [arrow] kou commented on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-20 Thread GitBox
kou commented on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-616819824 Wow! Awesome!

[GitHub] [arrow] github-actions[bot] commented on issue #6994: ARROW-8043: [Developer][CI] Provide better visibility for nightly builds

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6994: URL: https://github.com/apache/arrow/pull/6994#issuecomment-616820253 Revision: 89cf7325ab761a35b0c8a0da7096805984e18435 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] kou commented on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-20 Thread GitBox
kou commented on issue #6983: URL: https://github.com/apache/arrow/pull/6983#issuecomment-616869503 Thanks!

[GitHub] [arrow] nealrichardson commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-20 Thread GitBox
nealrichardson commented on issue #6995: URL: https://github.com/apache/arrow/pull/6995#issuecomment-616881056 @github-actions crossbow submit test-r-linux-as-cran

[GitHub] [arrow] github-actions[bot] commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6995: URL: https://github.com/apache/arrow/pull/6995#issuecomment-616881426 Revision: 1ed83aaf5dd17d4e3b31aa1cc657f1220da2c8d4 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] nealrichardson opened a new pull request #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-20 Thread GitBox
nealrichardson opened a new pull request #6995: URL: https://github.com/apache/arrow/pull/6995 Having some trouble/slowness with r-hub for testing so made this PR to use crossbow.

[GitHub] [arrow] github-actions[bot] commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6995: URL: https://github.com/apache/arrow/pull/6995#issuecomment-616887542 Revision: 88c0198d775796d5a39644a22840a45470b4253f Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-20 Thread GitBox
cyb70289 commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616921669 Opened a jira card https://issues.apache.org/jira/browse/ARROW-8537

[GitHub] [arrow] github-actions[bot] commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6996: URL: https://github.com/apache/arrow/pull/6996#issuecomment-616927551 Revision: 69081241244da5decee0bf0ea3cb2f24059d244d Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] nealrichardson commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-20 Thread GitBox
nealrichardson commented on issue #6996: URL: https://github.com/apache/arrow/pull/6996#issuecomment-616927211 @github-actions crossbow submit homebrew-cpp

[GitHub] [arrow] nealrichardson opened a new pull request #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-20 Thread GitBox
nealrichardson opened a new pull request #6996: URL: https://github.com/apache/arrow/pull/6996 One more I didn't remove in ARROW-8222.

[GitHub] [arrow] github-actions[bot] commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6996: URL: https://github.com/apache/arrow/pull/6996#issuecomment-616931067 https://issues.apache.org/jira/browse/ARROW-8538

[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-20 Thread GitBox
cyb70289 commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616915079 @pitrou @wesm Oops, I only checked case "BitmapReader" from benchmark

[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-21 Thread GitBox
cyb70289 commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616978252 This change introduces severe branch misses in certain conditions. See perf logs below. I changed benchmark code to run only the problematic test case. Without this patch

[GitHub] [arrow] pitrou commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-21 Thread GitBox
pitrou commented on a change in pull request #6991: URL: https://github.com/apache/arrow/pull/6991#discussion_r412025224 ## File path: cpp/src/arrow/util/rle_encoding.h ## @@ -414,6 +414,8 @@ static inline bool IndexInRange(int32_t idx, int32_t dictionary_length) { template

[GitHub] [arrow] pitrou commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-21 Thread GitBox
pitrou commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-617033346 To be honest, `BitmapAnd` should probably be rewritten using `Bitmap::VisitWords`. But we can revert anyway if we fear regressions may appear in other workloads.

[GitHub] [arrow] pitrou commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-21 Thread GitBox
pitrou commented on issue #6996: URL: https://github.com/apache/arrow/pull/6996#issuecomment-617061541 "The job exceeded the maximum log length, and has been terminated." -- restarting

[GitHub] [arrow] pitrou commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-21 Thread GitBox
pitrou commented on issue #6996: URL: https://github.com/apache/arrow/pull/6996#issuecomment-617067706 Wow, that is compiling OpenSSL by hand?

[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-21 Thread GitBox
liyafan82 commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r412067781 ## File path: java/memory/src/main/java/org/apache/arrow/memory/NettyAllocationManager.java ## @@ -34,31 +33,24 @@ static final

[GitHub] [arrow] pprudhvi commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build

2020-04-21 Thread GitBox
pprudhvi commented on issue #6990: URL: https://github.com/apache/arrow/pull/6990#issuecomment-617111538 Let's wait until https://github.com/Homebrew/homebrew-core/pull/53445/files is merged. See https://issues.apache.org/jira/browse/ARROW-8539

[GitHub] [arrow] kszucs opened a new pull request #6999: ARROW-8542: [Release] Fix checksum url in the website post release script

2020-04-21 Thread GitBox
kszucs opened a new pull request #6999: URL: https://github.com/apache/arrow/pull/6999

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7000: ARROW-8065: [C++][Dataset] Refactor ScanOptions and Fragment relation

2020-04-21 Thread GitBox
jorisvandenbossche commented on a change in pull request #7000: URL: https://github.com/apache/arrow/pull/7000#discussion_r412150862 ## File path: cpp/src/arrow/dataset/dataset.h ## @@ -30,12 +30,22 @@ namespace arrow { namespace dataset { -/// \brief A granular piece of a

[GitHub] [arrow] wesm commented on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-21 Thread GitBox
wesm commented on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-617164152 The copy-pasta in the .yml files is a bummer. I hope one day for a higher level specification of these tasks

[GitHub] [arrow] gramirezespinoza commented on issue #6977: Missing `take` method in pyarrow's `Table` class

2020-04-21 Thread GitBox
gramirezespinoza commented on issue #6977: URL: https://github.com/apache/arrow/issues/6977#issuecomment-617178347 Waiting for #6970 to be approved/merged

[GitHub] [arrow] bkietz commented on a change in pull request #6959: ARROW-5649: [Integration][C++] Create integration test for extension types

2020-04-21 Thread GitBox
bkietz commented on a change in pull request #6959: URL: https://github.com/apache/arrow/pull/6959#discussion_r412194818 ## File path: dev/archery/archery/integration/datagen.py ## @@ -1401,6 +1437,18 @@ def generate_nested_dictionary_case():

[GitHub] [arrow] pitrou commented on a change in pull request #6959: ARROW-5649: [Integration][C++] Create integration test for extension types

2020-04-21 Thread GitBox
pitrou commented on a change in pull request #6959: URL: https://github.com/apache/arrow/pull/6959#discussion_r412227300 ## File path: dev/archery/archery/integration/datagen.py ## @@ -1401,6 +1437,18 @@ def generate_nested_dictionary_case():

[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-21 Thread GitBox
kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-617087421 I think that the current test cases for parquet writer do not have tests to verify the bit pattern of the generated parquet file. I will also create the test case in another PR since they

[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-21 Thread GitBox
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-617126711 Perhaps. If the reader is compatible with those files, and roundtripping works, then the writer is probably compliant as well.

[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-21 Thread GitBox
liyafan82 commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r412070083 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestLargeVector.java ## @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software

[GitHub] [arrow] github-actions[bot] commented on issue #6999: ARROW-8542: [Release] Fix checksum url in the website post release script

2020-04-21 Thread GitBox
github-actions[bot] commented on issue #6999: URL: https://github.com/apache/arrow/pull/6999#issuecomment-617150142 https://issues.apache.org/jira/browse/ARROW-8542

[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-21 Thread GitBox
liyafan82 commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r412070422 ## File path: java/vector/src/test/java/org/apache/arrow/vector/TestLargeVector.java ## @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software

[GitHub] [arrow] pitrou opened a new pull request #6997: ARROW-8540: [C++] Add memory allocation benchmarks

2020-04-21 Thread GitBox
pitrou opened a new pull request #6997: URL: https://github.com/apache/arrow/pull/6997 Example output: ``` --- Benchmark

[GitHub] [arrow] github-actions[bot] commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-21 Thread GitBox
github-actions[bot] commented on issue #6995: URL: https://github.com/apache/arrow/pull/6995#issuecomment-617235575 Revision: f543317d36d39322bd339b49dd8867cbd3f2ad70 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7000: ARROW-8065: [C++][Dataset] Refactor ScanOptions and Fragment relation

2020-04-21 Thread GitBox
jorisvandenbossche commented on a change in pull request #7000: URL: https://github.com/apache/arrow/pull/7000#discussion_r412191161 ## File path: python/pyarrow/tests/test_dataset.py ## @@ -671,41 +669,29 @@ def test_fragments(tempdir): f = fragments[0] # file's

[GitHub] [arrow] wesm commented on issue #6970: ARROW-2714: [Python] Implement variable step slicing with Take

2020-04-21 Thread GitBox
wesm commented on issue #6970: URL: https://github.com/apache/arrow/pull/6970#issuecomment-617250883 +1. Appveyor build looks good https://ci.appveyor.com/project/wesm/arrow/builds/32336612

[GitHub] [arrow] wesm commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-21 Thread GitBox
wesm commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-617228774 FWIW, we have some benchmark diffing code already written in https://github.com/apache/arrow/blob/master/dev/archery/archery/benchmark I'm not sure where this is documented /

[GitHub] [arrow] nealrichardson commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-21 Thread GitBox
nealrichardson commented on issue #6996: URL: https://github.com/apache/arrow/pull/6996#issuecomment-617232353 ¯\_(ツ)_/¯ maybe it's time to port this job to GHA

[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616623385 I have been thinking about candidate places for the interface between native-endian data and Parquet's little-endian format. One good candidate is `Serialize()` in

[GitHub] [arrow] pitrou commented on a change in pull request #6959: ARROW-5649: [Integration][C++] Create integration test for extension types

2020-04-21 Thread GitBox
pitrou commented on a change in pull request #6959: URL: https://github.com/apache/arrow/pull/6959#discussion_r412310009 ## File path: cpp/src/arrow/ipc/metadata_internal.cc ## @@ -756,10 +737,35 @@ Status FieldFromFlatbuffer(const flatbuf::Field* field, DictionaryMemo*

[GitHub] [arrow] github-actions[bot] commented on issue #7001: Use lowercase ws2_32 everywhere

2020-04-21 Thread GitBox
github-actions[bot] commented on issue #7001: URL: https://github.com/apache/arrow/pull/7001#issuecomment-617271478 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could you

[GitHub] [arrow] davidanthoff opened a new pull request #7001: Use lowercase ws2_32 everywhere

2020-04-21 Thread GitBox
davidanthoff opened a new pull request #7001: URL: https://github.com/apache/arrow/pull/7001 With this patch I can cross-compile arrow from a Linux system, in particular I can compile Windows binaries on a Linux system (using https://binarybuilder.org/). I hope to eventually be able to

[GitHub] [arrow] pitrou commented on a change in pull request #6959: ARROW-5649: [Integration][C++] Create integration test for extension types

2020-04-21 Thread GitBox
pitrou commented on a change in pull request #6959: URL: https://github.com/apache/arrow/pull/6959#discussion_r412305089 ## File path: dev/archery/archery/integration/datagen.py ## @@ -1401,6 +1437,18 @@ def generate_nested_dictionary_case():

[GitHub] [arrow] github-actions[bot] commented on issue #6989: [Python] Fix non-deterministic row order failure in dataset tests

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6989: URL: https://github.com/apache/arrow/pull/6989#issuecomment-616395745 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could you

[GitHub] [arrow] jorisvandenbossche commented on issue #6970: ARROW-2714: [Python] Implement variable step slicing with Take

2020-04-20 Thread GitBox
jorisvandenbossche commented on issue #6970: URL: https://github.com/apache/arrow/pull/6970#issuecomment-616366094 Should we document this in the slice docstring that if the step is not 1, it will be a copy (take) and not a zero-copy view? (as I think people will typically assume no copy
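
The copy-versus-view distinction raised here can be illustrated with a minimal plain-Python sketch. It does not use pyarrow; `take` and `slice_with_step` are hypothetical stand-ins, assuming that a stepped slice is expanded into explicit indices and then gathered into a new container.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def take(values: Sequence[T], indices: Sequence[int]) -> List[T]:
    """Hypothetical stand-in for a take kernel: gather elements into a new list."""
    return [values[i] for i in indices]


def slice_with_step(values: Sequence[T], key: slice) -> List[T]:
    """Resolve a slice with an arbitrary step into indices, then gather.

    The result is always a fresh list, so it is a copy and never a view.
    """
    # slice.indices() normalizes None and negative bounds for the given length.
    start, stop, step = key.indices(len(values))
    return take(values, range(start, stop, step))


data = list(range(10))
every_third = slice_with_step(data, slice(1, 8, 3))
print(every_third)
```

Because the gathered result owns its elements, later changes to `data` do not show up in `every_third`, which is the behaviour the comment suggests documenting.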

[GitHub] [arrow] tustvold edited a comment on issue #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-20 Thread GitBox
tustvold edited a comment on issue #6980: URL: https://github.com/apache/arrow/pull/6980#issuecomment-616401880 I built the docker image locally and ran the same script as the CI, however, I am unable to reproduce the linker error... The ursabot issue seems to have fixed itself though,

[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616424899 Hmm, I don't think that's right. `Int96` is the physical representation of 96-bit integers in Parquet files, and it's entirely little-endian. This means it should always have the same
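
The point that a little-endian on-disk layout does not depend on the host byte order can be sketched as follows. This is only an illustration, assuming a 96-bit unsigned value stored as 12 bytes; `encode_int96_le` and `decode_int96_le` are hypothetical helpers, not Arrow or Parquet APIs.

```python
def encode_int96_le(value: int) -> bytes:
    """Hypothetical helper: encode an unsigned 96-bit integer as 12 little-endian bytes."""
    if not 0 <= value < (1 << 96):
        raise ValueError("value does not fit in 96 bits")
    # The explicit byteorder fixes the layout regardless of sys.byteorder.
    return value.to_bytes(12, byteorder="little")


def decode_int96_le(data: bytes) -> int:
    """Hypothetical helper: decode 12 little-endian bytes back into an integer."""
    if len(data) != 12:
        raise ValueError("expected exactly 12 bytes")
    return int.from_bytes(data, byteorder="little")


encoded = encode_int96_le(0x0102030405060708090A0B0C)
print(encoded.hex())
```

Since the byte order is stated explicitly, the same 12 bytes come out on little-endian and big-endian hosts, so a single set of expected bytes can serve both.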

[GitHub] [arrow] tustvold commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-20 Thread GitBox
tustvold commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r411174949 ## File path: rust/arrow/src/array/builder.rs ## @@ -301,6 +324,21 @@ impl BufferBuilderTrait for BufferBuilder { Ok(()) } +fn

[GitHub] [arrow] github-actions[bot] commented on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-616358473 https://issues.apache.org/jira/browse/ARROW-8524

[GitHub] [arrow] jorisvandenbossche commented on issue #6961: ARROW-8517: [Release] Update Crossbow release verification tasks for 0.17.0 RC0

2020-04-20 Thread GitBox
jorisvandenbossche commented on issue #6961: URL: https://github.com/apache/arrow/pull/6961#issuecomment-616389505 > wheels-linux: 3.8 has a test failure (test_construct_from_list_of_files); François says he's seen this elsewhere. @jorisvandenbossche @kszucs is this another

[GitHub] [arrow] jorisvandenbossche opened a new pull request #6989: [Python] Fix non-deterministic row order failure in dataset tests

2020-04-20 Thread GitBox
jorisvandenbossche opened a new pull request #6989: URL: https://github.com/apache/arrow/pull/6989

[GitHub] [arrow] tustvold commented on issue #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-20 Thread GitBox
tustvold commented on issue #6980: URL: https://github.com/apache/arrow/pull/6980#issuecomment-616401880 I built the docker image locally and ran the same script as the CI, however, I am unable to reproduce the linker error... The ursabot issue seems to have fixed itself though, which is

[GitHub] [arrow] kou commented on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-19 Thread GitBox
kou commented on issue #6983: URL: https://github.com/apache/arrow/pull/6983#issuecomment-616273563 @github-actions crossbow submit debian-buster-amd64

[GitHub] [arrow] cyb70289 commented on a change in pull request #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-19 Thread GitBox
cyb70289 commented on a change in pull request #6954: URL: https://github.com/apache/arrow/pull/6954#discussion_r411062264 ## File path: cpp/cmake_modules/DefineOptions.cmake ## @@ -101,7 +101,6 @@ if("${CMAKE_SOURCE_DIR}" STREQUAL "${CMAKE_CURRENT_SOURCE_DIR}")

[GitHub] [arrow] kou commented on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-19 Thread GitBox
kou commented on issue #6983: URL: https://github.com/apache/arrow/pull/6983#issuecomment-616287781 @github-actions crossbow submit -g linux -g linux-arm

[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-19 Thread GitBox
cyb70289 commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616294784 I forgot to add the JIRA number in the first commit and added it later. It looks like the JIRA status is not synced with this PR. Shall I abandon this one and push a new PR?

[GitHub] [arrow] jianxind commented on a change in pull request #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-19 Thread GitBox
jianxind commented on a change in pull request #6954: URL: https://github.com/apache/arrow/pull/6954#discussion_r411045620 ## File path: cpp/cmake_modules/DefineOptions.cmake ## @@ -101,7 +101,6 @@ if("${CMAKE_SOURCE_DIR}" STREQUAL "${CMAKE_CURRENT_SOURCE_DIR}")

[GitHub] [arrow] wesm commented on issue #6386: ARROW-7800 [Python] Create record batch reader interface on FileReader

2020-04-19 Thread GitBox
wesm commented on issue #6386: URL: https://github.com/apache/arrow/pull/6386#issuecomment-616271391 @wjones1 I'll close this in favor of your PR. You can always collaborate together there

[GitHub] [arrow] kou commented on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-19 Thread GitBox
kou commented on issue #6983: URL: https://github.com/apache/arrow/pull/6983#issuecomment-616277711 @github-actions crossbow submit debian-buster-amd64

[GitHub] [arrow] github-actions[bot] commented on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-19 Thread GitBox
github-actions[bot] commented on issue #6983: URL: https://github.com/apache/arrow/pull/6983#issuecomment-616277957 Revision: 75c7495f4df6b2c08388df1f4dc708bbc6a04ecd Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] github-actions[bot] commented on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-19 Thread GitBox
github-actions[bot] commented on issue #6983: URL: https://github.com/apache/arrow/pull/6983#issuecomment-616288123 Revision: eacd0de2a127048bc69c3926a75ea2337d1b00df Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] wesm commented on issue #6967: ARROW-8499: [C++][Dataset] In ScannerBuilder, batch_size will not wor…

2020-04-19 Thread GitBox
wesm commented on issue #6967: URL: https://github.com/apache/arrow/pull/6967#issuecomment-616288112 As a matter of principle, functional correctness needs to be validated by tests. If you don't test then something that is working, but not tested, may stop working as the result of

[GitHub] [arrow] cyb70289 opened a new pull request #6986: [C++] Optimize BitmapReader

2020-04-19 Thread GitBox
cyb70289 opened a new pull request #6986: URL: https://github.com/apache/arrow/pull/6986 Replacing the bit offset with a bit mask improves performance by about 15% with gcc-7.5. Arm64 servers show a similar performance uplift. clang-9 doesn't benefit from this change. Below are

[GitHub] [arrow] zhztheplayer commented on issue #6967: ARROW-8499: [C++][Dataset] In ScannerBuilder, batch_size will not wor…

2020-04-20 Thread GitBox
zhztheplayer commented on issue #6967: URL: https://github.com/apache/arrow/pull/6967#issuecomment-616345112 OK but as the PR is already merged, maybe a follow-up JIRA ticket is needed?

[GitHub] [arrow] github-actions[bot] commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-19 Thread GitBox
github-actions[bot] commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616295380 https://issues.apache.org/jira/browse/ARROW-8523

[GitHub] [arrow] kszucs commented on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-20 Thread GitBox
kszucs commented on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-616356603 @github-actions crossbow submit ubuntu-bionic-amd64 test-conda-cpp test-r-linux-as-cran

[GitHub] [arrow] kou commented on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-19 Thread GitBox
kou commented on issue #6983: URL: https://github.com/apache/arrow/pull/6983#issuecomment-616264904 @github-actions crossbow submit debian-buster-amd64 ubuntu-eoan-amd64 ubuntu-focal-amd64

[GitHub] [arrow] zhztheplayer commented on issue #6967: ARROW-8499: [C++][Dataset] In ScannerBuilder, batch_size will not wor…

2020-04-19 Thread GitBox
zhztheplayer commented on issue #6967: URL: https://github.com/apache/arrow/pull/6967#issuecomment-616276272 > Unit test possible? Is a unit test always required for a quick fix like this? I thought this might be the kind of change that could easily be proved right.

[GitHub] [arrow] emkornfield opened a new pull request #6987: ARROW-8515: [C++] Bitmap::ToString should group by bytes

2020-04-19 Thread GitBox
emkornfield opened a new pull request #6987: URL: https://github.com/apache/arrow/pull/6987

[GitHub] [arrow] github-actions[bot] commented on issue #6988: [CI] Try to free up space on github actions [WIP]

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-616334547 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could you

[GitHub] [arrow] kszucs opened a new pull request #6988: [CI] Try to free up space on github actions [WIP]

2020-04-20 Thread GitBox
kszucs opened a new pull request #6988: URL: https://github.com/apache/arrow/pull/6988

[GitHub] [arrow] github-actions[bot] commented on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-616357456 Revision: f19af84b7b6af216b91a56956590ebce051b69c7 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] kiszk edited a comment on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
kiszk edited a comment on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616462606 Thank you for your clarification. Based on this, the `Int96` structure in memory can be represented in native endian. When it is written to a file, we have to carefully keep it

[GitHub] [arrow] kszucs commented on a change in pull request #6961: ARROW-8517: [Release] Update Crossbow release verification tasks for 0.17.0 RC0

2020-04-20 Thread GitBox
kszucs commented on a change in pull request #6961: URL: https://github.com/apache/arrow/pull/6961#discussion_r411236556 ## File path: dev/tasks/verify-rc/github.nix.yml ## @@ -64,8 +69,9 @@ jobs: fi if [ "$TEST_RUBY" = "1" ]; then ruby

[GitHub] [arrow] pitrou commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-20 Thread GitBox
pitrou commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616468493 @cyb70289 It's ok, we can keep this PR.

[GitHub] [arrow] kszucs commented on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-20 Thread GitBox
kszucs commented on issue #6983: URL: https://github.com/apache/arrow/pull/6983#issuecomment-616439081 @kou we don't need to move to travis, I [managed to free up 23GB](https://github.com/apache/arrow/commit/b20f7091e63684804cb6ba76e4f72fcd38040cfd) of additional space on github actions,

[GitHub] [arrow] kszucs edited a comment on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-20 Thread GitBox
kszucs edited a comment on issue #6983: URL: https://github.com/apache/arrow/pull/6983#issuecomment-616439081 @kou we don't need to move to travis, I [managed to free up 23GB](https://github.com/apache/arrow/commit/b20f7091e63684804cb6ba76e4f72fcd38040cfd) of additional space on github

[GitHub] [arrow] kiszk edited a comment on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
kiszk edited a comment on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616453579 I understand your point. First, I implemented the `entirely little-endian` approach. Then, I reconsidered it. I thought that each primitive type should be represented in a
