[GitHub] [arrow] github-actions[bot] commented on pull request #7044: ARROW-6485: WIP: [Format][C++] Support the format of a COO sparse matrix that has separated row and column indices

2020-04-26 Thread GitBox
github-actions[bot] commented on pull request #7044: URL: https://github.com/apache/arrow/pull/7044#issuecomment-619714519 https://issues.apache.org/jira/browse/ARROW-6485 This is an automated message from the Apache Git Serv

[GitHub] [arrow] mrkn opened a new pull request #7044: ARROW-6485: WIP: [Format][C++] Support the format of a COO sparse matrix that has separated row and column indices

2020-04-26 Thread GitBox
mrkn opened a new pull request #7044: URL: https://github.com/apache/arrow/pull/7044 I'd like to add a matrix-specialized COO sparse tensor format with split indices. It improves interoperability with SciPy by reducing copies of the index arrays.

[GitHub] [arrow] zhztheplayer commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-04-26 Thread GitBox
zhztheplayer commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r415483479 ## File path: cpp/src/jni/dataset/jni_wrapper.cpp ## @@ -0,0 +1,577 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contr

[GitHub] [arrow] paddyhoran commented on a change in pull request #7042: ARROW-8597 [Rust] Lints and readability improvements for arrow crate

2020-04-26 Thread GitBox
paddyhoran commented on a change in pull request #7042: URL: https://github.com/apache/arrow/pull/7042#discussion_r415478926 ## File path: rust/arrow/src/util/bit_util.rs ## @@ -75,7 +75,8 @@ pub fn set_bit(data: &mut [u8], i: usize) { /// responsible to guarantee that `i` is

[GitHub] [arrow] paddyhoran commented on pull request #7037: ARROW-6718: [Rust] Remove packed_simd

2020-04-26 Thread GitBox
paddyhoran commented on pull request #7037: URL: https://github.com/apache/arrow/pull/7037#issuecomment-619678035 Thanks @nevi-me. I re-created everything locally and I don't see a reason to keep `packed_simd` in light of these results. Also the future of packed_simd is unclear and

[GitHub] [arrow] paddyhoran edited a comment on pull request #7043: ARROW-8598: [Rust] `simd_compare_op` creates buffer of incorrect length

2020-04-26 Thread GitBox
paddyhoran edited a comment on pull request #7043: URL: https://github.com/apache/arrow/pull/7043#issuecomment-619676852 cc @yordan-pavlov

[GitHub] [arrow] paddyhoran commented on pull request #7043: ARROW-8598: [Rust] `simd_compare_op` creates buffer of incorrect length

2020-04-26 Thread GitBox
paddyhoran commented on pull request #7043: URL: https://github.com/apache/arrow/pull/7043#issuecomment-619676852 cc yordan-pavlov

[GitHub] [arrow] github-actions[bot] commented on pull request #7043: ARROW-8598: [Rust] `simd_compare_op` creates buffer of incorrect length

2020-04-26 Thread GitBox
github-actions[bot] commented on pull request #7043: URL: https://github.com/apache/arrow/pull/7043#issuecomment-619671461 https://issues.apache.org/jira/browse/ARROW-8598

[GitHub] [arrow] cyb70289 edited a comment on pull request #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-26 Thread GitBox
cyb70289 edited a comment on pull request #6954: URL: https://github.com/apache/arrow/pull/6954#issuecomment-619670515 > Sorry I missed this, I think we can probably remove these for now (we can always reinstate from git history if needed). Parquet CRC uses standard CRC32 and at least for

[GitHub] [arrow] cyb70289 commented on pull request #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-26 Thread GitBox
cyb70289 commented on pull request #6954: URL: https://github.com/apache/arrow/pull/6954#issuecomment-619670515 > Sorry I missed this, I think we can probably remove these for now (we can always reinstate from git history if needed). Parquet CRC uses standard CRC32 and at least for intel,

[GitHub] [arrow] paddyhoran opened a new pull request #7043: ARROW-8598: [Rust] `simd_compare_op` creates buffer of incorrect length

2020-04-26 Thread GitBox
paddyhoran opened a new pull request #7043: URL: https://github.com/apache/arrow/pull/7043

[GitHub] [arrow] wesm edited a comment on pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-26 Thread GitBox
wesm edited a comment on pull request #7030: URL: https://github.com/apache/arrow/pull/7030#issuecomment-619660367 As a high level advisory: the Datasets C++ API should still be regarded as alpha-stage, so I recommend making the JNI bindings as minimal as possible (to satisfy required func

[GitHub] [arrow] wesm commented on pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-26 Thread GitBox
wesm commented on pull request #7030: URL: https://github.com/apache/arrow/pull/7030#issuecomment-619660367 As a high level advisory: the Datasets C++ API should still be regarded as alpha-stage, so I recommend making the JNI bindings as minimal as possible so that refactoring is not too p

[GitHub] [arrow] wesm commented on pull request #7038: ARROW-8593: [C++][Parquet] Fix build with musl libc

2020-04-26 Thread GitBox
wesm commented on pull request #7038: URL: https://github.com/apache/arrow/pull/7038#issuecomment-619658252 @kszucs would this be caught in a rehabilitated Alpine nightly build?

[GitHub] [arrow] wesm commented on pull request #7032: ARROW-6603, ARROW-5708, ARROW-5634: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support

2020-04-26 Thread GitBox
wesm commented on pull request #7032: URL: https://github.com/apache/arrow/pull/7032#issuecomment-619657805 It's OK to put just a single issue in the PR title and mention the other resolved issues in the PR description.

[GitHub] [arrow] wesm commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-26 Thread GitBox
wesm commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-619653776 I'll try to have a closer look tomorrow or Tuesday

[GitHub] [arrow] emkornfield commented on pull request #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-26 Thread GitBox
emkornfield commented on pull request #6954: URL: https://github.com/apache/arrow/pull/6954#issuecomment-619639158 > > Is the function `Armv8CrcHashParallel` used somewhere? Sorry if I overlooked it. > > It's not used. Actually the whole file hash_util.h is not used per [this comment]

[GitHub] [arrow] emkornfield commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-26 Thread GitBox
emkornfield commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r415417277 ## File path: cpp/src/jni/dataset/concurrent_map.h ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more co

[GitHub] [arrow] yordan-pavlov commented on pull request #7037: ARROW-6718: [Rust] Remove packed_simd

2020-04-26 Thread GitBox
yordan-pavlov commented on pull request #7037: URL: https://github.com/apache/arrow/pull/7037#issuecomment-619629922 removing packed_simd would also make this bug obsolete: https://issues.apache.org/jira/browse/ARROW-8598

[GitHub] [arrow] github-actions[bot] commented on pull request #7042: ARROW-8597 [Rust] Lints and readability improvements for arrow crate

2020-04-26 Thread GitBox
github-actions[bot] commented on pull request #7042: URL: https://github.com/apache/arrow/pull/7042#issuecomment-619599024 https://issues.apache.org/jira/browse/ARROW-8597

[GitHub] [arrow] durch opened a new pull request #7042: ARROW-8597 [Rust] Lints and readability improvements for arrow crate

2020-04-26 Thread GitBox
durch opened a new pull request #7042: URL: https://github.com/apache/arrow/pull/7042 + Pedantic fixes to `unsafe` + Changes to function arguments to pass in references or values as appropriate + Refactor pointer arithmetic to use `usize` instead of `isize` casting + Ignore generat

[GitHub] [arrow] liurenjie1024 commented on pull request #4140: ARROW-5123: [Rust] Parquet derive for simple structs

2020-04-26 Thread GitBox
liurenjie1024 commented on pull request #4140: URL: https://github.com/apache/arrow/pull/4140#issuecomment-619512496 Really looking forward to seeing this PR merged, since it's quite helpful when writing tests.

[GitHub] [arrow] zhztheplayer commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-26 Thread GitBox
zhztheplayer commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r415256413 ## File path: cpp/src/jni/dataset/concurrent_map.h ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more c

[GitHub] [arrow] kou commented on pull request #7041: ARROW-8584: [C++] Fix ORC link order

2020-04-26 Thread GitBox
kou commented on pull request #7041: URL: https://github.com/apache/arrow/pull/7041#issuecomment-619505351 +1 CI failures are unrelated.

[GitHub] [arrow] jianxind commented on a change in pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-26 Thread GitBox
jianxind commented on a change in pull request #7029: URL: https://github.com/apache/arrow/pull/7029#discussion_r415246739 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

[GitHub] [arrow] jianxind commented on pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-26 Thread GitBox
jianxind commented on pull request #7029: URL: https://github.com/apache/arrow/pull/7029#issuecomment-619501029 > Just curious if you see any impact on parquet-arrow-reader-writer benchmarks? That is the ultimate goal of the speedup. No impact, I checked all items for parquet-arrow-r

[GitHub] [arrow] tianchen92 opened a new pull request #6912: ARROW-8020: [Java] Implement vector validate functionality

2020-04-26 Thread GitBox
tianchen92 opened a new pull request #6912: URL: https://github.com/apache/arrow/pull/6912 Related to [ARROW-8020](https://issues.apache.org/jira/browse/ARROW-8020). On the C++ side, we already have array validation functionality, but there is no similar functionality on the Java side. This issue is