[GitHub] [arrow] kou commented on pull request #7041: ARROW-8584: [C++] Fix ORC link order

2020-04-25 Thread GitBox
kou commented on pull request #7041: URL: https://github.com/apache/arrow/pull/7041#issuecomment-619456003 @github-actions crossbow submit -g linux This is an automated message from the Apache Git Service. To respond to the

[GitHub] [arrow] mayuropensource commented on pull request #7022: ARROW-8562: [C++] IO: Parameterize I/O Coalescing using S3 metrics

2020-04-25 Thread GitBox
mayuropensource commented on pull request #7022: URL: https://github.com/apache/arrow/pull/7022#issuecomment-619456016 A better calculation for bandwidth (by removing TTFB from total time) is done using following script: curl --negotiate -u: -o /dev/null -w

[GitHub] [arrow] mayuropensource edited a comment on pull request #7022: ARROW-8562: [C++] IO: Parameterize I/O Coalescing using S3 metrics

2020-04-25 Thread GitBox
mayuropensource edited a comment on pull request #7022: URL: https://github.com/apache/arrow/pull/7022#issuecomment-619456016 A better calculation for bandwidth (by removing TTFB from total time) is done using following script: `curl --negotiate -u: -o /dev/null -w

[GitHub] [arrow] wjones1 commented on pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2020-04-25 Thread GitBox
wjones1 commented on pull request #6979: URL: https://github.com/apache/arrow/pull/6979#issuecomment-619463693 I found the cause of the test failure: If the `batch_size` isn't aligned with the `chunk_size`, categorical columns will fail with the error: ```

[GitHub] [arrow] wjones1 commented on a change in pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2020-04-25 Thread GitBox
wjones1 commented on a change in pull request #6979: URL: https://github.com/apache/arrow/pull/6979#discussion_r415130068 ## File path: python/pyarrow/tests/test_parquet.py ## @@ -179,6 +179,99 @@ def alltypes_sample(size=1, seed=0, categorical=False):

[GitHub] [arrow] wjones1 commented on pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2020-04-25 Thread GitBox
wjones1 commented on pull request #6979: URL: https://github.com/apache/arrow/pull/6979#issuecomment-619437378 Two failing checks right now. For the linting one, it seems to be alarmed by some Rust code that I didn't touch. Am I missing something in that output? For the

[GitHub] [arrow] github-actions[bot] commented on pull request #7041: ARROW-8584: [C++] Fix ORC link order

2020-04-25 Thread GitBox
github-actions[bot] commented on pull request #7041: URL: https://github.com/apache/arrow/pull/7041#issuecomment-619457325 https://issues.apache.org/jira/browse/ARROW-8584

[GitHub] [arrow] rdettai commented on a change in pull request #6935: ARROW-8455: [Rust] Parquet Arrow column read on partially compatible files

2020-04-24 Thread GitBox
rdettai commented on a change in pull request #6935: URL: https://github.com/apache/arrow/pull/6935#discussion_r414417688 ## File path: rust/parquet/src/column/reader.rs ## @@ -190,15 +190,12 @@ impl ColumnReaderImpl { (self.num_buffered_values -

[GitHub] [arrow] zhztheplayer opened a new pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-24 Thread GitBox
zhztheplayer opened a new pull request #7030: URL: https://github.com/apache/arrow/pull/7030 Add following Datasets APIs to Java: - DatasetFactory - Dataset - Scanner - ScanTask Add a native dataset path to bridge c++ Datasets components to Java: -

[GitHub] [arrow] jianxind commented on pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-24 Thread GitBox
jianxind commented on pull request #7029: URL: https://github.com/apache/arrow/pull/7029#issuecomment-618855696 cc @emkornfield The AVX512 path is straightforward thanks to the mask_compress/mask_expand helper APIs provided by AVX512. For potential path-finding of SSE/AVX2, as you

[GitHub] [arrow] kiszk commented on a change in pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-24 Thread GitBox
kiszk commented on a change in pull request #7029: URL: https://github.com/apache/arrow/pull/7029#discussion_r414434809 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

[GitHub] [arrow] kiszk commented on a change in pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-24 Thread GitBox
kiszk commented on a change in pull request #7029: URL: https://github.com/apache/arrow/pull/7029#discussion_r414434434 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

[GitHub] [arrow] nevi-me commented on pull request #7018: ARROW-8536: [Rust] [Flight] Check in proto file, conditional build if file exists

2020-04-24 Thread GitBox
nevi-me commented on pull request #7018: URL: https://github.com/apache/arrow/pull/7018#issuecomment-618942929 > @nevi-me This is looking good, but the generated source file needs the ASF header. CI is failing with ` apache-rat license violation:

[GitHub] [arrow] zgramana edited a comment on pull request #6121: ARROW-6603: [C#] - Nullable Array Support

2020-04-23 Thread GitBox
zgramana edited a comment on pull request #6121: URL: https://github.com/apache/arrow/pull/6121#issuecomment-618756547 @eerhardt I'd like to chime in to say that I have just come across this conversation late--and *after* implementing an alternative approach which is much more in line with

[GitHub] [arrow] sunchao commented on a change in pull request #6935: ARROW-8455: [Rust] Parquet Arrow column read on partially compatible files

2020-04-23 Thread GitBox
sunchao commented on a change in pull request #6935: URL: https://github.com/apache/arrow/pull/6935#discussion_r414235616 ## File path: rust/parquet/src/column/reader.rs ## @@ -190,15 +190,12 @@ impl ColumnReaderImpl { (self.num_buffered_values -

[GitHub] [arrow] github-actions[bot] commented on pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-24 Thread GitBox
github-actions[bot] commented on pull request #7030: URL: https://github.com/apache/arrow/pull/7030#issuecomment-618919224 https://issues.apache.org/jira/browse/ARROW-7808

[GitHub] [arrow] jianxind opened a new pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-24 Thread GitBox
jianxind opened a new pull request #7029: URL: https://github.com/apache/arrow/pull/7029 1. Create the spaced encoding/decoding benchmark items. 2. Create unit tests for the spaced API SIMD implementation. 3. Move spaced scalar/SIMD to a new header file. 4. Add the path of AVX512 epi32 and

[GitHub] [arrow] kszucs edited a comment on pull request #7028: ARROW-8575: [Developer] Add issue_comment workflow to rebase a PR

2020-04-24 Thread GitBox
kszucs edited a comment on pull request #7028: URL: https://github.com/apache/arrow/pull/7028#issuecomment-618944665 There is another security constraint about this approach: anyone can trigger a rebase on the PR not just the participants / committers. To resolve that you need to check

[GitHub] [arrow] kszucs commented on pull request #7028: ARROW-8575: [Developer] Add issue_comment workflow to rebase a PR

2020-04-24 Thread GitBox
kszucs commented on pull request #7028: URL: https://github.com/apache/arrow/pull/7028#issuecomment-618944665 There is another security constraint about this approach: anyone can trigger a rebase on the PR not just the participants. To resolve that you need to check `author_association`

[GitHub] [arrow] github-actions[bot] commented on pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-24 Thread GitBox
github-actions[bot] commented on pull request #7029: URL: https://github.com/apache/arrow/pull/7029#issuecomment-618853399 https://issues.apache.org/jira/browse/ARROW-8579

[GitHub] [arrow] pitrou commented on pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-24 Thread GitBox
pitrou commented on pull request #7029: URL: https://github.com/apache/arrow/pull/7029#issuecomment-618917554 I'd gladly see an AVX2 or SSE version indeed, as many CPUs don't have AVX512.

[GitHub] [arrow] sunchao edited a comment on pull request #6949: ARROW-7681: [Rust] Explicitly seeking a BufReader will discard the internal buffer (2)

2020-04-23 Thread GitBox
sunchao edited a comment on pull request #6949: URL: https://github.com/apache/arrow/pull/6949#issuecomment-618740530 Yes I think it is beneficial to avoid dropping buffers with `seek`, although it will be nice if the `seek_relative` will be stabilized soon so we can just use that.

[GitHub] [arrow] sunchao commented on pull request #6949: ARROW-7681: [Rust] Explicitly seeking a BufReader will discard the internal buffer (2)

2020-04-23 Thread GitBox
sunchao commented on pull request #6949: URL: https://github.com/apache/arrow/pull/6949#issuecomment-618740530 Yes I think it is beneficial to avoid dropping buffers with `seek`, although it will be nice if the `seek_relative` will be stabilized soon so we can just use that. >

[GitHub] [arrow] zgramana commented on pull request #6121: ARROW-6603: [C#] - Nullable Array Support

2020-04-23 Thread GitBox
zgramana commented on pull request #6121: URL: https://github.com/apache/arrow/pull/6121#issuecomment-618756547 @eerhardt I'd like to chime in that I have just come across this conversation after implementing an alternative approach much more in line with other Arrow language

[GitHub] [arrow] cyb70289 commented on pull request #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-23 Thread GitBox
cyb70289 commented on pull request #6954: URL: https://github.com/apache/arrow/pull/6954#issuecomment-618782973 > Is the function `Armv8CrcHashParallel` used somewhere? Sorry if I overlook it. It's not used. Actually the whole file hash_util.h is not used per [this

[GitHub] [arrow] github-actions[bot] commented on pull request #7028: ARROW-8575: [Developer] Add issue_comment workflow to rebase a PR

2020-04-23 Thread GitBox
github-actions[bot] commented on pull request #7028: URL: https://github.com/apache/arrow/pull/7028#issuecomment-618731413 https://issues.apache.org/jira/browse/ARROW-8575

[GitHub] [arrow] paddyhoran commented on a change in pull request #6306: ARROW-7705: [Rust] Initial sort implementation

2020-04-23 Thread GitBox
paddyhoran commented on a change in pull request #6306: URL: https://github.com/apache/arrow/pull/6306#discussion_r414234200 ## File path: rust/arrow/src/compute/kernels/sort.rs ## @@ -0,0 +1,671 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] nealrichardson opened a new pull request #7028: ARROW-8575: [Developer] Add issue_comment workflow to rebase a PR

2020-04-23 Thread GitBox
nealrichardson opened a new pull request #7028: URL: https://github.com/apache/arrow/pull/7028 Instead of adding a PR comment of "This needs rebase" and waiting for the author to get around to it, with this workflow you can just type "rebase" and GHA will do it for you. If it rebases

[GitHub] [arrow] mcassels commented on a change in pull request #6770: ARROW-7842: [Rust] [Parquet] implement array_reader for list type columns

2020-04-23 Thread GitBox
mcassels commented on a change in pull request #6770: URL: https://github.com/apache/arrow/pull/6770#discussion_r414276380 ## File path: rust/datafusion/src/utils.rs ## @@ -120,6 +143,7 @@ pub fn array_value_to_string(column: array::ArrayRef, row: usize) -> Result {

[GitHub] [arrow] mcassels commented on a change in pull request #6770: ARROW-7842: [Rust] [Parquet] implement array_reader for list type columns

2020-04-23 Thread GitBox
mcassels commented on a change in pull request #6770: URL: https://github.com/apache/arrow/pull/6770#discussion_r414276861 ## File path: rust/datafusion/src/logicalplan.rs ## @@ -828,8 +828,8 @@ mod tests { .build()?; let expected = "Projection: #id\ -

[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-24 Thread GitBox
liyafan82 commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r414332379 ## File path: java/memory/src/test/java/org/apache/arrow/memory/TestLargeArrowBuf.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software

[GitHub] [arrow] paddyhoran commented on a change in pull request #7004: ARROW-3827: [Rust] Implement UnionArray Updated

2020-04-23 Thread GitBox
paddyhoran commented on a change in pull request #7004: URL: https://github.com/apache/arrow/pull/7004#discussion_r414240892 ## File path: rust/arrow/src/array/equal.rs ## @@ -1046,6 +1062,30 @@ impl PartialEq for Value { } } +impl JsonEqual for UnionArray { +fn

[GitHub] [arrow] paddyhoran commented on pull request #7004: ARROW-3827: [Rust] Implement UnionArray Updated

2020-04-23 Thread GitBox
paddyhoran commented on pull request #7004: URL: https://github.com/apache/arrow/pull/7004#issuecomment-618761556 @andygrove just going to leave a general comment as it's all related. Overall, I felt this PR was getting big, I was trying to avoid getting into the IPC stuff in this

[GitHub] [arrow] paddyhoran commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-23 Thread GitBox
paddyhoran commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r414230710 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder {

[GitHub] [arrow] paddyhoran commented on pull request #7024: ARROW-8573: [Rust] Upgrade Rust to 1.44 nightly

2020-04-23 Thread GitBox
paddyhoran commented on pull request #7024: URL: https://github.com/apache/arrow/pull/7024#issuecomment-618749586 CI is failing again, I thought this was fixed by #8558

[GitHub] [arrow] paddyhoran edited a comment on pull request #7024: ARROW-8573: [Rust] Upgrade Rust to 1.44 nightly

2020-04-23 Thread GitBox
paddyhoran edited a comment on pull request #7024: URL: https://github.com/apache/arrow/pull/7024#issuecomment-618749586 CI is failing again, I thought this was fixed by #7010

[GitHub] [arrow] paddyhoran commented on a change in pull request #7004: ARROW-3827: [Rust] Implement UnionArray Updated

2020-04-23 Thread GitBox
paddyhoran commented on a change in pull request #7004: URL: https://github.com/apache/arrow/pull/7004#discussion_r414241189 ## File path: rust/arrow/src/array/mod.rs ## @@ -85,6 +85,7 @@ mod array; mod builder; mod data; mod equal; +mod union; Review comment: Yea,

[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-24 Thread GitBox
liyafan82 commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r414330656 ## File path: java/memory/src/main/java/org/apache/arrow/memory/NettyAllocationManager.java ## @@ -34,31 +33,34 @@ static final

[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-24 Thread GitBox
liyafan82 commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r414330471 ## File path: java/memory/src/main/java/org/apache/arrow/memory/NettyAllocationManager.java ## @@ -34,31 +33,34 @@ static final

[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-24 Thread GitBox
liyafan82 commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r414330317 ## File path: java/memory/src/main/java/org/apache/arrow/memory/NettyAllocationManager.java ## @@ -34,31 +33,34 @@ static final

[GitHub] [arrow] jianxind commented on a change in pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-24 Thread GitBox
jianxind commented on a change in pull request #7029: URL: https://github.com/apache/arrow/pull/7029#discussion_r414505645 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] fsaintjacques commented on pull request #7022: ARROW-8562: [C++] IO: Parameterize I/O Coalescing using S3 metrics

2020-04-24 Thread GitBox
fsaintjacques commented on pull request #7022: URL: https://github.com/apache/arrow/pull/7022#issuecomment-618985822 Could you accompany a script/utility to compute both metrics? Paired with toxiproxy, we could replicate S3 regions behavior with localhost.

[GitHub] [arrow] rdettai commented on pull request #6949: ARROW-7681: [Rust] Explicitly seeking a BufReader will discard the internal buffer (2)

2020-04-24 Thread GitBox
rdettai commented on pull request #6949: URL: https://github.com/apache/arrow/pull/6949#issuecomment-618997029 > Originally we designed it this way so that we can concurrently read multiple column chunks after obtaining file handle from a single row group. Since the file handle is shared

[GitHub] [arrow] rdettai edited a comment on pull request #6949: ARROW-7681: [Rust] Explicitly seeking a BufReader will discard the internal buffer (2)

2020-04-24 Thread GitBox
rdettai edited a comment on pull request #6949: URL: https://github.com/apache/arrow/pull/6949#issuecomment-618997029 > Originally we designed it this way so that we can concurrently read multiple column chunks after obtaining file handle from a single row group. Since the file handle is

[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-25 Thread GitBox
liyafan82 commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r415203879 ## File path: java/memory/src/test/java/org/apache/arrow/memory/TestNettyAllocationManager.java ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache

[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-25 Thread GitBox
liyafan82 commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r415203808 ## File path: java/memory/src/main/java/org/apache/arrow/memory/NettyAllocationManager.java ## @@ -17,48 +17,97 @@ package org.apache.arrow.memory;

[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-25 Thread GitBox
liyafan82 commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r415203665 ## File path: java/memory/src/main/java/org/apache/arrow/memory/NettyAllocationManager.java ## @@ -17,48 +17,97 @@ package org.apache.arrow.memory;

[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-25 Thread GitBox
liyafan82 commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r415203851 ## File path: java/memory/src/test/java/org/apache/arrow/memory/TestNettyAllocationManager.java ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache

[GitHub] [arrow] emkornfield commented on a change in pull request #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-26 Thread GitBox
emkornfield commented on a change in pull request #6954: URL: https://github.com/apache/arrow/pull/6954#discussion_r415226495 ## File path: cpp/src/arrow/util/hash_util.h ## @@ -27,39 +27,27 @@ #include "arrow/util/logging.h" #include "arrow/util/macros.h" -#include

[GitHub] [arrow] emkornfield commented on a change in pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-26 Thread GitBox
emkornfield commented on a change in pull request #7029: URL: https://github.com/apache/arrow/pull/7029#discussion_r415230291 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] emkornfield commented on a change in pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-26 Thread GitBox
emkornfield commented on a change in pull request #7029: URL: https://github.com/apache/arrow/pull/7029#discussion_r415233940 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] liurenjie1024 commented on pull request #4140: ARROW-5123: [Rust] Parquet derive for simple structs

2020-04-26 Thread GitBox
liurenjie1024 commented on pull request #4140: URL: https://github.com/apache/arrow/pull/4140#issuecomment-619512496 Really looking forward to seeing this PR merged since it's quite helpful when writing tests.

[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-25 Thread GitBox
liyafan82 commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r415203349 ## File path: java/memory/src/test/java/org/apache/arrow/memory/TestLargeArrowBuf.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software

[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-25 Thread GitBox
liyafan82 commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r415202903 ## File path: java/memory/src/main/java/org/apache/arrow/memory/NettyAllocationManager.java ## @@ -17,48 +17,97 @@ package org.apache.arrow.memory;

[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-25 Thread GitBox
liyafan82 commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r415202953 ## File path: java/memory/src/main/java/org/apache/arrow/memory/NettyAllocationManager.java ## @@ -17,48 +17,97 @@ package org.apache.arrow.memory;

[GitHub] [arrow] emkornfield commented on a change in pull request #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-26 Thread GitBox
emkornfield commented on a change in pull request #6954: URL: https://github.com/apache/arrow/pull/6954#discussion_r415226340 ## File path: cpp/src/arrow/util/hash_util.h ## @@ -27,39 +27,27 @@ #include "arrow/util/logging.h" #include "arrow/util/macros.h" -#include

[GitHub] [arrow] emkornfield commented on pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-26 Thread GitBox
emkornfield commented on pull request #7029: URL: https://github.com/apache/arrow/pull/7029#issuecomment-619493366 Just curious if you see any impact on parquet-arrow-reader-writer benchmarks? That is the ultimate goal of the speedup.

[GitHub] [arrow] emkornfield commented on a change in pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-26 Thread GitBox
emkornfield commented on a change in pull request #7029: URL: https://github.com/apache/arrow/pull/7029#discussion_r415231929 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] emkornfield commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-25 Thread GitBox
emkornfield commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r415221009 ## File path: cpp/src/parquet/level_conversion_test.cc ## @@ -0,0 +1,162 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] emkornfield commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-25 Thread GitBox
emkornfield commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r415221052 ## File path: cpp/cmake_modules/SetupCxxFlags.cmake ## @@ -40,12 +40,13 @@ if(ARROW_CPU_FLAG STREQUAL "x86") set(CXX_SUPPORTS_SSE4_2 TRUE)

[GitHub] [arrow] emkornfield commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-25 Thread GitBox
emkornfield commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-619488093 @pitrou I think I addressed your comments. One of them that went stale was the complexity for "AppendWord", I tried to remove parts that did not seem to affect performance

[GitHub] [arrow] tianchen92 commented on a change in pull request #6912: ARROW-8020: [Java] Implement vector validate functionality

2020-04-26 Thread GitBox
tianchen92 commented on a change in pull request #6912: URL: https://github.com/apache/arrow/pull/6912#discussion_r415229312 ## File path: java/vector/src/main/java/org/apache/arrow/vector/complex/NonNullableStructVector.java ## @@ -320,6 +322,20 @@ public int hashCode(int

[GitHub] [arrow] tianchen92 commented on a change in pull request #6912: ARROW-8020: [Java] Implement vector validate functionality

2020-04-26 Thread GitBox
tianchen92 commented on a change in pull request #6912: URL: https://github.com/apache/arrow/pull/6912#discussion_r415229163 ## File path: java/vector/src/main/java/org/apache/arrow/vector/ValueVector.java ## @@ -283,4 +283,10 @@ * @return the name of the vector. */

[GitHub] [arrow] emkornfield commented on a change in pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-26 Thread GitBox
emkornfield commented on a change in pull request #7029: URL: https://github.com/apache/arrow/pull/7029#discussion_r415229227 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] jianxind commented on pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-26 Thread GitBox
jianxind commented on pull request #7029: URL: https://github.com/apache/arrow/pull/7029#issuecomment-619501029 > Just curious if you see any impact on parquet-arrow-reader-writer benchmarks? That is the ultimate goal of the speedup. No impact, I checked all items for

[GitHub] [arrow] tianchen92 opened a new pull request #6912: ARROW-8020: [Java] Implement vector validate functionality

2020-04-26 Thread GitBox
tianchen92 opened a new pull request #6912: URL: https://github.com/apache/arrow/pull/6912 Related to [ARROW-8020](https://issues.apache.org/jira/browse/ARROW-8020). On the C++ side, we already have array validate functionality, but there is no similar functionality on the Java side. This issue is

[GitHub] [arrow] kou commented on pull request #7041: ARROW-8584: [C++] Fix ORC link order

2020-04-26 Thread GitBox
kou commented on pull request #7041: URL: https://github.com/apache/arrow/pull/7041#issuecomment-619505351 +1 CI failures are unrelated.

[GitHub] [arrow] emkornfield commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-25 Thread GitBox
emkornfield commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r414295562 ## File path: cpp/cmake_modules/SetupCxxFlags.cmake ## @@ -40,12 +40,13 @@ if(ARROW_CPU_FLAG STREQUAL "x86") set(CXX_SUPPORTS_SSE4_2 TRUE)

[GitHub] [arrow] emkornfield commented on a change in pull request #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-26 Thread GitBox
emkornfield commented on a change in pull request #6954: URL: https://github.com/apache/arrow/pull/6954#discussion_r415228325 ## File path: cpp/src/arrow/util/simd.h ## @@ -17,6 +17,24 @@ #pragma once +#ifdef _MSC_VER +// MSVC x86_64/arm64 + +#if defined(_M_AMD64) ||

[GitHub] [arrow] jianxind commented on a change in pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-26 Thread GitBox
jianxind commented on a change in pull request #7029: URL: https://github.com/apache/arrow/pull/7029#discussion_r415246739 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] zhztheplayer commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-26 Thread GitBox
zhztheplayer commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r415256413 ## File path: cpp/src/jni/dataset/concurrent_map.h ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more

[GitHub] [arrow] emkornfield commented on a change in pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-26 Thread GitBox
emkornfield commented on a change in pull request #7029: URL: https://github.com/apache/arrow/pull/7029#discussion_r415230959 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] emkornfield commented on a change in pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-26 Thread GitBox
emkornfield commented on a change in pull request #7029: URL: https://github.com/apache/arrow/pull/7029#discussion_r415231122 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] emkornfield commented on a change in pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-26 Thread GitBox
emkornfield commented on a change in pull request #7029: URL: https://github.com/apache/arrow/pull/7029#discussion_r415231200 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] emkornfield commented on a change in pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-26 Thread GitBox
emkornfield commented on a change in pull request #7029: URL: https://github.com/apache/arrow/pull/7029#discussion_r415229405 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] sunchao commented on a change in pull request #6935: ARROW-8455: [Rust] Parquet Arrow column read on partially compatible files

2020-04-24 Thread GitBox
sunchao commented on a change in pull request #6935: URL: https://github.com/apache/arrow/pull/6935#discussion_r414738189 ## File path: rust/parquet/src/column/reader.rs ## @@ -190,15 +190,12 @@ impl ColumnReaderImpl { (self.num_buffered_values -

[GitHub] [arrow] BryanCutler commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-24 Thread GitBox
BryanCutler commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r414695756 ## File path: java/memory/src/main/java/org/apache/arrow/memory/NettyAllocationManager.java ## @@ -17,48 +17,97 @@ package org.apache.arrow.memory;

[GitHub] [arrow] nealrichardson commented on pull request #7028: ARROW-8575: [Developer] Add issue_comment workflow to rebase a PR

2020-04-24 Thread GitBox
nealrichardson commented on pull request #7028: URL: https://github.com/apache/arrow/pull/7028#issuecomment-619074972 I'm not worried about security risks in this particular case. If some random person wants to rebase my PR on apache/arrow@master, great! Now I don't have to! While I

[GitHub] [arrow] nevi-me commented on a change in pull request #6306: ARROW-7705: [Rust] Initial sort implementation

2020-04-24 Thread GitBox
nevi-me commented on a change in pull request #6306: URL: https://github.com/apache/arrow/pull/6306#discussion_r414662522 ## File path: rust/arrow/src/compute/kernels/sort.rs ## @@ -0,0 +1,671 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] nevi-me commented on a change in pull request #6306: ARROW-7705: [Rust] Initial sort implementation

2020-04-24 Thread GitBox
nevi-me commented on a change in pull request #6306: URL: https://github.com/apache/arrow/pull/6306#discussion_r414663303 ## File path: rust/arrow/src/compute/kernels/sort.rs ## @@ -0,0 +1,671 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] nealrichardson commented on a change in pull request #7026: ARROW-7391: [C++][Dataset] Remove Expression subclasses from bindings

2020-04-24 Thread GitBox
nealrichardson commented on a change in pull request #7026: URL: https://github.com/apache/arrow/pull/7026#discussion_r414680234 ## File path: r/src/expression.cpp ## @@ -21,99 +21,97 @@ // [[arrow::export]] std::shared_ptr dataset___expr__field_ref(std::string name) { -

[GitHub] [arrow] kiszk commented on a change in pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-24 Thread GitBox
kiszk commented on a change in pull request #7029: URL: https://github.com/apache/arrow/pull/7029#discussion_r414739332 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

[GitHub] [arrow] sunchao commented on pull request #6949: ARROW-7681: [Rust] Explicitly seeking a BufReader will discard the internal buffer (2)

2020-04-24 Thread GitBox
sunchao commented on pull request #6949: URL: https://github.com/apache/arrow/pull/6949#issuecomment-619140531 > It's the reader (file handle) that is passed to it that should be thread safe Is [file](https://doc.rust-lang.org/std/fs/struct.File.html) thread-safe? It's not obvious

[GitHub] [arrow] pitrou opened a new pull request #7031: ARROW-8587: [C++] Fix linking Flight benchmarks

2020-04-24 Thread GitBox
pitrou opened a new pull request #7031: URL: https://github.com/apache/arrow/pull/7031

[GitHub] [arrow] fsaintjacques commented on pull request #7022: ARROW-8562: [C++] IO: Parameterize I/O Coalescing using S3 metrics

2020-04-24 Thread GitBox
fsaintjacques commented on pull request #7022: URL: https://github.com/apache/arrow/pull/7022#issuecomment-619186142 I'd say just plain HTTP; as @lidavidm pointed out in his comment, this is a network attribute.

[GitHub] [arrow] andygrove commented on pull request #7018: ARROW-8536: [Rust] [Flight] Check in proto file, conditional build if file exists

2020-04-24 Thread GitBox
andygrove commented on pull request #7018: URL: https://github.com/apache/arrow/pull/7018#issuecomment-619202633 Let's see what others say on this. Personally, I think it would be better for build.rs to automatically prepend the ASF license header because there is the risk of someone

[GitHub] [arrow] github-actions[bot] commented on pull request #7033: ARROW-7759: [C++][Dataset] Add CsvFileFormat

2020-04-24 Thread GitBox
github-actions[bot] commented on pull request #7033: URL: https://github.com/apache/arrow/pull/7033#issuecomment-619195315 https://issues.apache.org/jira/browse/ARROW-7759

[GitHub] [arrow] Zhen-hao opened a new issue #7034: R arrow package can't see arrow-cpp installation

2020-04-24 Thread GitBox
Zhen-hao opened a new issue #7034: URL: https://github.com/apache/arrow/issues/7034 Hi there, this is more a question than a bug request. I am using NixOS 20.03 and couldn't get the arrow library in R to see the arrow C++ library. Even when I install the library from R

[GitHub] [arrow] nealrichardson commented on pull request #7033: ARROW-7759: [C++][Dataset] Add CsvFileFormat

2020-04-24 Thread GitBox
nealrichardson commented on pull request #7033: URL: https://github.com/apache/arrow/pull/7033#issuecomment-619213038 @github-actions rebase

[GitHub] [arrow] nealrichardson commented on pull request #6879: ARROW-8377: [CI][C++][R] Build and run C++ tests on Rtools build

2020-04-24 Thread GitBox
nealrichardson commented on pull request #6879: URL: https://github.com/apache/arrow/pull/6879#issuecomment-619212781 @github-actions rebase

[GitHub] [arrow] markhildreth opened a new pull request #7035: ARROW-8590: [Rust] Use arrow crate pretty util in DataFusion

2020-04-24 Thread GitBox
markhildreth opened a new pull request #7035: URL: https://github.com/apache/arrow/pull/7035 Fixes [ARROW-8590](https://issues.apache.org/jira/browse/ARROW-8590) This builds on #6972, and thus should be merged after that PR is merged.
