[GitHub] [arrow] wesm commented on issue #6386: ARROW-7800 [Python] Create record batch reader interface on FileReader

2020-04-19 Thread GitBox
wesm commented on issue #6386: URL: https://github.com/apache/arrow/pull/6386#issuecomment-616271391 @wjones1 I'll close this in favor of your PR. You can always collaborate together there. This is an automated message from

[GitHub] [arrow] kou commented on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-19 Thread GitBox
kou commented on issue #6983: URL: https://github.com/apache/arrow/pull/6983#issuecomment-616277711 @github-actions crossbow submit debian-buster-amd64

[GitHub] [arrow] github-actions[bot] commented on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-19 Thread GitBox
github-actions[bot] commented on issue #6983: URL: https://github.com/apache/arrow/pull/6983#issuecomment-616277957 Revision: 75c7495f4df6b2c08388df1f4dc708bbc6a04ecd Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] github-actions[bot] commented on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-19 Thread GitBox
github-actions[bot] commented on issue #6983: URL: https://github.com/apache/arrow/pull/6983#issuecomment-616288123 Revision: eacd0de2a127048bc69c3926a75ea2337d1b00df Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] wesm commented on issue #6967: ARROW-8499: [C++][Dataset] In ScannerBuilder, batch_size will not wor…

2020-04-19 Thread GitBox
wesm commented on issue #6967: URL: https://github.com/apache/arrow/pull/6967#issuecomment-616288112 As a matter of principle, functional correctness needs to be validated by tests. If you don't test then something that is working, but not tested, may stop working as the result of

[GitHub] [arrow] cyb70289 opened a new pull request #6986: [C++] Optimize BitmapReader

2020-04-19 Thread GitBox
cyb70289 opened a new pull request #6986: URL: https://github.com/apache/arrow/pull/6986 Replacing bit offset with bit mask improves about 15% performance with gcc-7.5. Arm64 servers have similar performance uplift. clang-9 doesn't benefit from this change. Below are

[GitHub] [arrow] zhztheplayer commented on issue #6967: ARROW-8499: [C++][Dataset] In ScannerBuilder, batch_size will not wor…

2020-04-20 Thread GitBox
zhztheplayer commented on issue #6967: URL: https://github.com/apache/arrow/pull/6967#issuecomment-616345112 OK but as the PR is already merged, maybe a follow-up JIRA ticket is needed?

[GitHub] [arrow] github-actions[bot] commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-19 Thread GitBox
github-actions[bot] commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616295380 https://issues.apache.org/jira/browse/ARROW-8523

[GitHub] [arrow] kszucs commented on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-20 Thread GitBox
kszucs commented on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-616356603 @github-actions crossbow submit ubuntu-bionic-amd64 test-conda-cpp test-r-linux-as-cran

[GitHub] [arrow] kou commented on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-19 Thread GitBox
kou commented on issue #6983: URL: https://github.com/apache/arrow/pull/6983#issuecomment-616264904 @github-actions crossbow submit debian-buster-amd64 ubuntu-eoan-amd64 ubuntu-focal-amd64

[GitHub] [arrow] zhztheplayer commented on issue #6967: ARROW-8499: [C++][Dataset] In ScannerBuilder, batch_size will not wor…

2020-04-19 Thread GitBox
zhztheplayer commented on issue #6967: URL: https://github.com/apache/arrow/pull/6967#issuecomment-616276272 > Unit test possible? Is unit test always required for a quick fix like this? I thought this may belong to the kind of changes that could be easily proved right.

[GitHub] [arrow] emkornfield opened a new pull request #6987: ARROW-8515: [C++] Bitmap::ToString should group by bytes

2020-04-19 Thread GitBox
emkornfield opened a new pull request #6987: URL: https://github.com/apache/arrow/pull/6987

[GitHub] [arrow] github-actions[bot] commented on issue #6988: [CI] Try to free up space on github actions [WIP]

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-616334547 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could you

[GitHub] [arrow] kszucs opened a new pull request #6988: [CI] Try to free up space on github actions [WIP]

2020-04-20 Thread GitBox
kszucs opened a new pull request #6988: URL: https://github.com/apache/arrow/pull/6988

[GitHub] [arrow] github-actions[bot] commented on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-616357456 Revision: f19af84b7b6af216b91a56956590ebce051b69c7 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] kiszk edited a comment on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
kiszk edited a comment on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616462606 Thank you for your clarification. Based on this, `Int96` structure in memory can be represented in native endian. When it will be written into a file, we carefully have to keep it

[GitHub] [arrow] kszucs commented on a change in pull request #6961: ARROW-8517: [Release] Update Crossbow release verification tasks for 0.17.0 RC0

2020-04-20 Thread GitBox
kszucs commented on a change in pull request #6961: URL: https://github.com/apache/arrow/pull/6961#discussion_r411236556 ## File path: dev/tasks/verify-rc/github.nix.yml ## @@ -64,8 +69,9 @@ jobs: fi if [ "$TEST_RUBY" = "1" ]; then ruby

[GitHub] [arrow] pitrou commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-20 Thread GitBox
pitrou commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616468493 @cyb70289 It's ok, we can keep this PR.

[GitHub] [arrow] kszucs commented on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-20 Thread GitBox
kszucs commented on issue #6983: URL: https://github.com/apache/arrow/pull/6983#issuecomment-616439081 @kou we don't need to move to travis, I [managed to free up 23GB(https://github.com/apache/arrow/commit/b20f7091e63684804cb6ba76e4f72fcd38040cfd) of additional space on github actions,

[GitHub] [arrow] kszucs edited a comment on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-20 Thread GitBox
kszucs edited a comment on issue #6983: URL: https://github.com/apache/arrow/pull/6983#issuecomment-616439081 @kou we don't need to move to travis, I [managed to free up 23GB](https://github.com/apache/arrow/commit/b20f7091e63684804cb6ba76e4f72fcd38040cfd) of additional space on github

[GitHub] [arrow] kiszk edited a comment on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
kiszk edited a comment on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616453579 I understand your point. First, I implemented the approach `entirely little-endian`. Then, I reconsidered it. I thought that each primitive type should be represented in a

[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616456984 I think data should be kept in native endianness in memory (that is what the user would expect). What we must be careful about is that Parquet data is encoded (and decoded) as little endian.

[GitHub] [arrow] kszucs commented on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-20 Thread GitBox
kszucs commented on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-616432127 Build failures are unrelated. +1

[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616453579 I understand your point. First, I implemented the approach `entirely little-endian`. Then, I reconsidered it. Each primitive type should be represented in a little-endian as shown

[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616462606 Thank you for your clarification. Based on this, `Int96` structure in memory can be represented in native endian. When it will be written into a file, we carefully have to keep it in a

[GitHub] [arrow] pprudhvi opened a new pull request #6990: ARROW-???? : fix gandiva macos build

2020-04-20 Thread GitBox
pprudhvi opened a new pull request #6990: URL: https://github.com/apache/arrow/pull/6990

[GitHub] [arrow] pprudhvi commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build

2020-04-20 Thread GitBox
pprudhvi commented on issue #6990: URL: https://github.com/apache/arrow/pull/6990#issuecomment-616521137 @ursabot crossbow submit -g gandiva

[GitHub] [arrow] ursabot commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build

2020-04-20 Thread GitBox
ursabot commented on issue #6990: URL: https://github.com/apache/arrow/pull/6990#issuecomment-616521423 [AMD64 Conda Crossbow Submit (#101851)](https://ci.ursalabs.org/#builders/98/builds/640) builder has succeeded. Revision: a051a430c8dfc9d0cea307a3d0dcb23e6efc2015

[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616530174 Can you add the explanation you gave above (about the memory layout) somewhere in `parquet/types.h`? Thank you.

[GitHub] [arrow] wesm commented on issue #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-20 Thread GitBox
wesm commented on issue #6954: URL: https://github.com/apache/arrow/pull/6954#issuecomment-616533191 I thought we had discussed removing the `ARROW_USE_SIMD` option altogether

[GitHub] [arrow] wesm commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-20 Thread GitBox
wesm commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616536379 Cool, nice improvement

[GitHub] [arrow] wesm edited a comment on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-20 Thread GitBox
wesm edited a comment on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616536379 Cool, nice improvement (is this captured in our benchmark executables?)

[GitHub] [arrow] wesm commented on issue #6967: ARROW-8499: [C++][Dataset] In ScannerBuilder, batch_size will not wor…

2020-04-20 Thread GitBox
wesm commented on issue #6967: URL: https://github.com/apache/arrow/pull/6967#issuecomment-616500573 Yes, indeed. Thank you

[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616507292 At first, this PR address only test cases. This PR does not address the routines (like `column_writer.h`) yet. To update test cases can clarify the specification and implementation like

[GitHub] [arrow] pprudhvi commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build

2020-04-20 Thread GitBox
pprudhvi commented on issue #6990: URL: https://github.com/apache/arrow/pull/6990#issuecomment-616520741 cc @kszucs

[GitHub] [arrow] pitrou opened a new pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-20 Thread GitBox
pitrou opened a new pull request #6991: URL: https://github.com/apache/arrow/pull/6991 NextCounts() should be parameterized with the dictionary index type, not the value type. Previous code seems to have succeeded by chance on little-endian platforms. See discussion in ARROW-8486.

[GitHub] [arrow] github-actions[bot] commented on issue #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6991: URL: https://github.com/apache/arrow/pull/6991#issuecomment-616529709 https://issues.apache.org/jira/browse/ARROW-8529

[GitHub] [arrow] liyafan82 commented on a change in pull request #6912: ARROW-8020: [Java] Implement vector validate functionality

2020-04-20 Thread GitBox
liyafan82 commented on a change in pull request #6912: URL: https://github.com/apache/arrow/pull/6912#discussion_r411355185 ## File path: java/vector/src/main/java/org/apache/arrow/vector/complex/NonNullableStructVector.java ## @@ -320,6 +322,20 @@ public int hashCode(int

[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616540298 Sure, at first, I derived this layout from [this method](https://github.com/apache/arrow/blob/master/cpp/src/parquet/types.h#L576). Other resources:

[GitHub] [arrow] pitrou commented on a change in pull request #6966: ARROW-8497: [Archery] Add missing components to build options

2020-04-20 Thread GitBox
pitrou commented on a change in pull request #6966: URL: https://github.com/apache/arrow/pull/6966#discussion_r411365701 ## File path: dev/archery/archery/cli.py ## @@ -134,36 +134,72 @@ def _apply_options(cmd, options): help="CMake's CMAKE_BUILD_TYPE")

[GitHub] [arrow] github-actions[bot] commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6990: URL: https://github.com/apache/arrow/pull/6990#issuecomment-616522599 https://issues.apache.org/jira/browse/ARROW-8528

[GitHub] [arrow] kiszk edited a comment on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
kiszk edited a comment on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616507292 At first, this PR addresses only test cases. This PR does not address the routines (like `column_writer.h`) yet. To update test cases can clarify the specification and

[GitHub] [arrow] liyafan82 commented on a change in pull request #6912: ARROW-8020: [Java] Implement vector validate functionality

2020-04-20 Thread GitBox
liyafan82 commented on a change in pull request #6912: URL: https://github.com/apache/arrow/pull/6912#discussion_r411352198 ## File path: java/vector/src/main/java/org/apache/arrow/vector/ValueVector.java ## @@ -283,4 +283,10 @@ * @return the name of the vector. */

[GitHub] [arrow] jorisvandenbossche opened a new pull request #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-20 Thread GitBox
jorisvandenbossche opened a new pull request #6992: URL: https://github.com/apache/arrow/pull/6992

[GitHub] [arrow] jorisvandenbossche commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-20 Thread GitBox
jorisvandenbossche commented on issue #6992: URL: https://github.com/apache/arrow/pull/6992#issuecomment-616595450 This is still WIP (depending on which pandas version we choose, we can clean up some things in the pandas-shim.pxi), but: - I defined the minimal pandas version now as

[GitHub] [arrow] github-actions[bot] commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-20 Thread GitBox
github-actions[bot] commented on issue #6992: URL: https://github.com/apache/arrow/pull/6992#issuecomment-616601667 https://issues.apache.org/jira/browse/ARROW-7950

[GitHub] [arrow] wesm commented on issue #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-20 Thread GitBox
wesm commented on issue #6954: URL: https://github.com/apache/arrow/pull/6954#issuecomment-616604745 Maybe it was just a thought I had in my head but never expressed. Opened https://issues.apache.org/jira/browse/ARROW-8531

[GitHub] [arrow] pitrou commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-20 Thread GitBox
pitrou commented on a change in pull request #6991: URL: https://github.com/apache/arrow/pull/6991#discussion_r411416006 ## File path: .github/workflows/cpp.yml ## @@ -228,7 +228,9 @@ jobs: run: ci/scripts/util_checkout.sh - name: Build shell: bash -

[GitHub] [arrow] cyb70289 commented on issue #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-20 Thread GitBox
cyb70289 commented on issue #6954: URL: https://github.com/apache/arrow/pull/6954#issuecomment-616585651 > I thought we had discussed removing the `ARROW_USE_SIMD` option altogether @wesm remove `ARROW_USE_SIMD`? I remember a thread about default simd level(

[GitHub] [arrow] nevi-me commented on pull request #6898: ARROW-8399: [Rust] Extend memory alignments to include other architectures

2020-04-27 Thread GitBox
nevi-me commented on pull request #6898: URL: https://github.com/apache/arrow/pull/6898#issuecomment-619797480 > > I am using this code in various places, but if it is against the spec or not useful for the improvements we can drop it off. > > As @pitrou says: > > > It's a

[GitHub] [arrow] rdettai commented on a change in pull request #6935: ARROW-8455: [Rust] Parquet Arrow column read on partially compatible files

2020-04-27 Thread GitBox
rdettai commented on a change in pull request #6935: URL: https://github.com/apache/arrow/pull/6935#discussion_r415644697 ## File path: rust/parquet/src/column/reader.rs ## @@ -190,15 +190,12 @@ impl ColumnReaderImpl { (self.num_buffered_values -

[GitHub] [arrow] nevi-me commented on pull request #7043: ARROW-8598: [Rust] `simd_compare_op` creates buffer of incorrect length

2020-04-27 Thread GitBox
nevi-me commented on pull request #7043: URL: https://github.com/apache/arrow/pull/7043#issuecomment-619853175 @paddyhoran do we need to worry about this, as it'd get removed by #7037?

[GitHub] [arrow] zhztheplayer commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-04-27 Thread GitBox
zhztheplayer commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r415664616 ## File path: cpp/src/jni/dataset/proto/Types.proto ## @@ -0,0 +1,149 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] rdettai commented on pull request #6949: ARROW-7681: [Rust] Explicitly seeking a BufReader will discard the internal buffer (2)

2020-04-27 Thread GitBox
rdettai commented on pull request #6949: URL: https://github.com/apache/arrow/pull/6949#issuecomment-619834304 I completely agree ! What I am saying is that the layer that makes the handle thread safe cannot be the `FileSource` which is in charge of tracking the reading position of one

[GitHub] [arrow] zhztheplayer commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-04-27 Thread GitBox
zhztheplayer commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r415667244 ## File path: cpp/src/jni/dataset/jni_wrapper.cpp ## @@ -0,0 +1,577 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] kszucs commented on pull request #7026: ARROW-7391: [C++][Dataset] Remove Expression subclasses from bindings

2020-04-27 Thread GitBox
kszucs commented on pull request #7026: URL: https://github.com/apache/arrow/pull/7026#issuecomment-619902102 Agree that exposing `ds.field` and `ds.scalar` should be sufficient on the python side.

[GitHub] [arrow] durch commented on pull request #7042: ARROW-8597 [Rust] Lints and readability improvements for arrow crate

2020-04-27 Thread GitBox
durch commented on pull request #7042: URL: https://github.com/apache/arrow/pull/7042#issuecomment-619798711 @nevi-me I left it as is for now, did not want to actually write any new code in this PR. After this one gets merged I'll make another pass and write some code, in addition to the

[GitHub] [arrow] vertexclique commented on a change in pull request #7036: ARROW-8591: [Rust] Reverse lookup for a key in DictionaryArray

2020-04-27 Thread GitBox
vertexclique commented on a change in pull request #7036: URL: https://github.com/apache/arrow/pull/7036#discussion_r415653220 ## File path: rust/arrow/src/array/array.rs ## @@ -1786,38 +1786,34 @@ impl From<(Vec<(Field, ArrayRef)>, Buffer, usize)> for StructArray { /// This

[GitHub] [arrow] zhztheplayer commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-04-27 Thread GitBox
zhztheplayer commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r415661628 ## File path: cpp/src/jni/dataset/jni_wrapper.cpp ## @@ -0,0 +1,577 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] jianxind edited a comment on pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-27 Thread GitBox
jianxind edited a comment on pull request #7029: URL: https://github.com/apache/arrow/pull/7029#issuecomment-618855696 cc @emkornfield The AVX512 path is straightforward thanks to the mask_compress/mask_expand helper APIs provided by AVX512. For potential path-finding of SSE/AVX2, as

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7026: ARROW-7391: [C++][Dataset] Remove Expression subclasses from bindings

2020-04-27 Thread GitBox
jorisvandenbossche commented on a change in pull request #7026: URL: https://github.com/apache/arrow/pull/7026#discussion_r415703472 ## File path: python/pyarrow/_dataset.pyx ## @@ -269,20 +454,21 @@ cdef class FileSystemDataset(Dataset): cdef:

[GitHub] [arrow] lidavidm commented on pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-28 Thread GitBox
lidavidm commented on pull request #6744: URL: https://github.com/apache/arrow/pull/6744#issuecomment-620601079 Ok, this should be ready. Looks like the Rust lint is failing on master.

[GitHub] [arrow] tianchen92 commented on a change in pull request #6912: ARROW-8020: [Java] Implement vector validate functionality

2020-04-28 Thread GitBox
tianchen92 commented on a change in pull request #6912: URL: https://github.com/apache/arrow/pull/6912#discussion_r416646869 ## File path: java/vector/src/main/java/org/apache/arrow/vector/validate/ValidateVectorVisitor.java ## @@ -0,0 +1,177 @@ +/* + * Licensed to the Apache

[GitHub] [arrow] nevi-me commented on pull request #7024: ARROW-8573: [Rust] Upgrade Rust to 1.44 nightly

2020-04-28 Thread GitBox
nevi-me commented on pull request #7024: URL: https://github.com/apache/arrow/pull/7024#issuecomment-620649009 @andygrove this failure (https://git.data-engine.co.za/nevi-me/ifrs16-front-end-angular/pipelines/1830) doesn't make sense to me. On my machine, the relevant file is formatted

[GitHub] [arrow] paddyhoran commented on pull request #7049: [Rust] Avoid loading simd_load_set_invalid which doesn't exist on aarch64

2020-04-28 Thread GitBox
paddyhoran commented on pull request #7049: URL: https://github.com/apache/arrow/pull/7049#issuecomment-620694431 Hi @rtyler, I was going to suggest disabling SIMD as auto-vectorization seems to be adding SIMD by itself, see #7037. Also, on #7037 @nevi-me noticed the divide by

[GitHub] [arrow] shiro615 commented on a change in pull request #7051: ARROW-8612: [GLib] Add GArrowReadOptions and GArrowWriteOptions

2020-04-28 Thread GitBox
shiro615 commented on a change in pull request #7051: URL: https://github.com/apache/arrow/pull/7051#discussion_r416631341 ## File path: c_glib/test/test-write-options.rb ## @@ -0,0 +1,114 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor

[GitHub] [arrow] tianchen92 commented on a change in pull request #6912: ARROW-8020: [Java] Implement vector validate functionality

2020-04-28 Thread GitBox
tianchen92 commented on a change in pull request #6912: URL: https://github.com/apache/arrow/pull/6912#discussion_r416644073 ## File path: java/vector/src/main/java/org/apache/arrow/vector/util/ValueVectorUtility.java ## @@ -82,4 +89,43 @@ public static String

[GitHub] [arrow] horndog opened a new issue #7055: RedHat R Install with no Internet Access

2020-04-28 Thread GitBox
horndog opened a new issue #7055: URL: https://github.com/apache/arrow/issues/7055 Hi there! Looking to install arrow via R but running into some issues. The RedHat 7.6 server is behind a firewall with no external internet access. Think that might be relevant in this situation.

[GitHub] [arrow] tianchen92 commented on a change in pull request #6912: ARROW-8020: [Java] Implement vector validate functionality

2020-04-28 Thread GitBox
tianchen92 commented on a change in pull request #6912: URL: https://github.com/apache/arrow/pull/6912#discussion_r416643574 ## File path: java/vector/src/main/codegen/templates/DenseUnionVector.java ## @@ -34,6 +34,8 @@ import org.apache.arrow.vector.types.pojo.FieldType;

[GitHub] [arrow] tianchen92 commented on a change in pull request #6912: ARROW-8020: [Java] Implement vector validate functionality

2020-04-28 Thread GitBox
tianchen92 commented on a change in pull request #6912: URL: https://github.com/apache/arrow/pull/6912#discussion_r416643695 ## File path: java/vector/src/main/java/org/apache/arrow/vector/ExtensionTypeVector.java ## @@ -264,4 +264,5 @@ public BufferAllocator getAllocator() {

[GitHub] [arrow] markhildreth edited a comment on pull request #7024: ARROW-8573: [Rust] Upgrade Rust to 1.44 nightly

2020-04-28 Thread GitBox
markhildreth edited a comment on pull request #7024: URL: https://github.com/apache/arrow/pull/7024#issuecomment-620692002 @nevi-me I don't have access to the failure you linked to (on domain `git.data-engine.co.za`), but if you mean the failure showing up in CI, there was a change

[GitHub] [arrow] tianchen92 commented on a change in pull request #6912: ARROW-8020: [Java] Implement vector validate functionality

2020-04-28 Thread GitBox
tianchen92 commented on a change in pull request #6912: URL: https://github.com/apache/arrow/pull/6912#discussion_r416645213 ## File path: java/vector/src/main/java/org/apache/arrow/vector/validate/ValidateVectorVisitor.java ## @@ -0,0 +1,177 @@ +/* + * Licensed to the Apache

[GitHub] [arrow] tianchen92 commented on a change in pull request #6912: ARROW-8020: [Java] Implement vector validate functionality

2020-04-28 Thread GitBox
tianchen92 commented on a change in pull request #6912: URL: https://github.com/apache/arrow/pull/6912#discussion_r416645923 ## File path: java/vector/src/main/java/org/apache/arrow/vector/validate/ValidateVectorVisitor.java ## @@ -0,0 +1,177 @@ +/* + * Licensed to the Apache

[GitHub] [arrow] markhildreth commented on pull request #7024: ARROW-8573: [Rust] Upgrade Rust to 1.44 nightly

2020-04-28 Thread GitBox
markhildreth commented on pull request #7024: URL: https://github.com/apache/arrow/pull/7024#issuecomment-620692002 @nevi-me I don't have access to the failure you linked to (on domain `git.data-engine.co.za`), but if you mean the failure showing up in CI, there was a change to that file

[GitHub] [arrow] markhildreth edited a comment on pull request #7024: ARROW-8573: [Rust] Upgrade Rust to 1.44 nightly

2020-04-28 Thread GitBox
markhildreth edited a comment on pull request #7024: URL: https://github.com/apache/arrow/pull/7024#issuecomment-620692002

[GitHub] [arrow] wesm commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-04-28 Thread GitBox
wesm commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r416638207 ## File path: cpp/src/jni/dataset/proto/Types.proto ## @@ -0,0 +1,149 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] jorisvandenbossche opened a new pull request #7054: ARROW-8251, ARROW-7782: [Python] Preserve pandas index and extension dtypes in write_to_dataset roundtrip

2020-04-28 Thread GitBox
jorisvandenbossche opened a new pull request #7054: URL: https://github.com/apache/arrow/pull/7054 The same fix resolves both https://issues.apache.org/jira/browse/ARROW-8251 and https://issues.apache.org/jira/browse/ARROW-7782 (tests added for both)

[GitHub] [arrow] chrish42 commented on pull request #7025: ARROW-2260: [C++][Plasma] Use Gflags for command-line parsing

2020-04-28 Thread GitBox
chrish42 commented on pull request #7025: URL: https://github.com/apache/arrow/pull/7025#issuecomment-620679645 Okay, figured out how to run clang-format-8 on the code. (It feels like something that should definitely be easier, especially if it fails the build on the CI.) Let me know if

[GitHub] [arrow] markhildreth commented on pull request #7042: ARROW-8597 [Rust] Lints and readability improvements for arrow crate

2020-04-28 Thread GitBox
markhildreth commented on pull request #7042: URL: https://github.com/apache/arrow/pull/7042#issuecomment-620688091 @durch Re: "ARROW_TEST_DATA not defined", the [`/rust/README.md`](https://github.com/apache/arrow/tree/master/rust#prerequisites) file in the repo has some additional steps

[GitHub] [arrow] tianchen92 commented on a change in pull request #6912: ARROW-8020: [Java] Implement vector validate functionality

2020-04-28 Thread GitBox
tianchen92 commented on a change in pull request #6912: URL: https://github.com/apache/arrow/pull/6912#discussion_r416651561 ## File path: java/vector/src/main/java/org/apache/arrow/vector/util/ValueVectorUtility.java ## @@ -82,4 +89,43 @@ public static String

[GitHub] [arrow] github-actions[bot] commented on pull request #7054: ARROW-8251, ARROW-7782: [Python] Preserve pandas index and extension dtypes in write_to_dataset roundtrip

2020-04-28 Thread GitBox
github-actions[bot] commented on pull request #7054: URL: https://github.com/apache/arrow/pull/7054#issuecomment-620645814 https://issues.apache.org/jira/browse/ARROW-8251

[GitHub] [arrow] bnicholl opened a new issue #7056: AttributeError: module 'pyarrow' has no attribute 'py_buffer'

2020-04-28 Thread GitBox
bnicholl opened a new issue #7056: URL: https://github.com/apache/arrow/issues/7056 I have installed pyarrow with pip and I get this error when importing pyarrow `import pyarrow` `AttributeError: module 'pyarrow' has no attribute 'py_buffer'` I have also uninstalled

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7026: ARROW-7391: [C++][Dataset] Remove Expression subclasses from bindings

2020-04-28 Thread GitBox
jorisvandenbossche commented on a change in pull request #7026: URL: https://github.com/apache/arrow/pull/7026#discussion_r416567863 ## File path: python/pyarrow/_dataset.pyx ## @@ -41,6 +42,167 @@ def _forbid_instantiation(klass, subclasses_instead=True): raise

[GitHub] [arrow] vertexclique commented on pull request #7036: ARROW-8591: [Rust] Reverse lookup for a key in DictionaryArray

2020-04-28 Thread GitBox
vertexclique commented on pull request #7036: URL: https://github.com/apache/arrow/pull/7036#issuecomment-620638882 Do you have any ETA for merging this PR? Currently, I am using my local copy but I want to update to the git dep.

[GitHub] [arrow] paddyhoran commented on pull request #7036: ARROW-8591: [Rust] Reverse lookup for a key in DictionaryArray

2020-04-28 Thread GitBox
paddyhoran commented on pull request #7036: URL: https://github.com/apache/arrow/pull/7036#issuecomment-620683421 Was just giving other committers a chance to review. If there is no update, I'll merge later today.

[GitHub] [arrow] github-actions[bot] commented on pull request #7039: ARROW-8513: [Python] Expose Take with Table input in Python

2020-04-25 Thread GitBox
github-actions[bot] commented on pull request #7039: URL: https://github.com/apache/arrow/pull/7039#issuecomment-619367004 https://issues.apache.org/jira/browse/ARROW-8513

[GitHub] [arrow] eerhardt commented on a change in pull request #7032: ARROW-6603, ARROW-5708, ARROW-5634: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support

2020-04-25 Thread GitBox
eerhardt commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r415072524 ## File path: csharp/src/Apache.Arrow/Apache.Arrow.csproj ## @@ -4,7 +4,7 @@ netstandard1.3;netcoreapp2.1 true

[GitHub] [arrow] gramirezespinoza opened a new pull request #7039: ARROW-8513: [Python] Expose Take with Table input in Python

2020-04-25 Thread GitBox
gramirezespinoza opened a new pull request #7039: URL: https://github.com/apache/arrow/pull/7039

[GitHub] [arrow] eerhardt opened a new pull request #7040: ARROW-8505: [Release][C#] "sourcelink test" is failed by Apache.ArrowAssemblyInfo.cs

2020-04-25 Thread GitBox
eerhardt opened a new pull request #7040: URL: https://github.com/apache/arrow/pull/7040 Workaround https://github.com/dotnet/sourcelink/issues/572 by explicitly embedding the AssemblyAttributes file into the pdb.

[GitHub] [arrow] eerhardt commented on pull request #7032: ARROW-6603, ARROW-5708, ARROW-5634: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support

2020-04-25 Thread GitBox
eerhardt commented on pull request #7032: URL: https://github.com/apache/arrow/pull/7032#issuecomment-619396229 > ARROW-5634 by properly setting the readonly value for NullCount, which previously was hardcoded to -1. I don't believe this change addresses the issue correctly. Can we

[GitHub] [arrow] eerhardt commented on pull request #6121: ARROW-6603: [C#] - Nullable Array Support

2020-04-25 Thread GitBox
eerhardt commented on pull request #6121: URL: https://github.com/apache/arrow/pull/6121#issuecomment-619396952 Thank you for this contribution, @abbotware. However, my opinion is that #7032 is more in line with how null support should be designed for the builder APIs. It also more

[GitHub] [arrow] github-actions[bot] commented on pull request #7040: ARROW-8505: [Release][C#] "sourcelink test" is failed by Apache.ArrowAssemblyInfo.cs

2020-04-25 Thread GitBox
github-actions[bot] commented on pull request #7040: URL: https://github.com/apache/arrow/pull/7040#issuecomment-619408251 https://issues.apache.org/jira/browse/ARROW-8505

[GitHub] [arrow] fsaintjacques commented on pull request #6731: [WIP] ARROW-8601: [Go][Flight] Added implementation of FlightDataWriter

2020-04-27 Thread GitBox
fsaintjacques commented on pull request #6731: URL: https://github.com/apache/arrow/pull/6731#issuecomment-619959812 Created https://jira.apache.org/jira/browse/ARROW-8601 for this

[GitHub] [arrow] fsaintjacques commented on a change in pull request #6731: feat(flight): Added implementation of FlightDataWriter

2020-04-27 Thread GitBox
fsaintjacques commented on a change in pull request #6731: URL: https://github.com/apache/arrow/pull/6731#discussion_r415775496 ## File path: format/Flight.proto ## @@ -19,8 +19,17 @@ syntax = "proto3"; option java_package = "org.apache.arrow.flight.impl"; +option

[GitHub] [arrow] pitrou commented on pull request #6154: ARROW-7531: [C++] Reduce header cost

2020-04-27 Thread GitBox
pitrou commented on pull request #6154: URL: https://github.com/apache/arrow/pull/6154#issuecomment-619959114 Given the diffusion of changes across the codebase, rebasing this wholesale would probably be painful. A better strategy would probably be to retry and do some of the changes one by

[GitHub] [arrow] paddyhoran commented on pull request #7043: ARROW-8598: [Rust] `simd_compare_op` creates buffer of incorrect length

2020-04-27 Thread GitBox
paddyhoran commented on pull request #7043: URL: https://github.com/apache/arrow/pull/7043#issuecomment-619967530 @nevi-me when I added this fix I hadn't looked at your PR removing packed_simd. I introduced this bug so wanted to get a fix posted. Also, I wasn't sure what your

[GitHub] [arrow] github-actions[bot] commented on pull request #7045: ARROW-8603: [C++][Documentation] Add missing params comment

2020-04-27 Thread GitBox
github-actions[bot] commented on pull request #7045: URL: https://github.com/apache/arrow/pull/7045#issuecomment-619997013 https://issues.apache.org/jira/browse/ARROW-8603

[GitHub] [arrow] fsaintjacques commented on pull request #6154: ARROW-7531: [C++] Reduce header cost

2020-04-27 Thread GitBox
fsaintjacques commented on pull request #6154: URL: https://github.com/apache/arrow/pull/6154#issuecomment-619954274 @pitrou close or rebase?

[GitHub] [arrow] fsaintjacques commented on a change in pull request #7033: ARROW-7759: [C++][Dataset] Add CsvFileFormat

2020-04-27 Thread GitBox
fsaintjacques commented on a change in pull request #7033: URL: https://github.com/apache/arrow/pull/7033#discussion_r415798798 ## File path: cpp/src/arrow/dataset/file_csv.cc ## @@ -0,0 +1,99 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] fsaintjacques opened a new pull request #7045: ARROW-8603: [C++][Documentation] Add missing params comment

2020-04-27 Thread GitBox
fsaintjacques opened a new pull request #7045: URL: https://github.com/apache/arrow/pull/7045