[GitHub] [arrow] rdettai commented on a change in pull request #6935: ARROW-8455: [Rust] Parquet Arrow column read on partially compatible files

2020-04-24 Thread GitBox
rdettai commented on a change in pull request #6935: URL: https://github.com/apache/arrow/pull/6935#discussion_r414417688 ## File path: rust/parquet/src/column/reader.rs ## @@ -190,15 +190,12 @@ impl ColumnReaderImpl { (self.num_buffered_values -

[GitHub] [arrow] zhztheplayer opened a new pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-24 Thread GitBox
zhztheplayer opened a new pull request #7030: URL: https://github.com/apache/arrow/pull/7030 Add the following Datasets APIs to Java: DatasetFactory, Dataset, Scanner, ScanTask. Add a native dataset path to bridge C++ Datasets components to Java: -

[GitHub] [arrow] jianxind commented on pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-24 Thread GitBox
jianxind commented on pull request #7029: URL: https://github.com/apache/arrow/pull/7029#issuecomment-618855696 cc @emkornfield The AVX512 path is straightforward thanks to the mask_compress/mask_expand helper APIs provided by AVX512. For potential path-finding of SSE/AVX2, as you
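For readers who have not met the "spaced" layout, the following is a minimal scalar sketch, not the code from the pull request. It assumes that "spaced" means values are laid out with gaps at null positions according to a validity mask, which is what a mask-expand or mask-compress operation does in one step for a whole vector register. The names `expand_spaced` and `compress_spaced` are hypothetical and exist only for this illustration.

```python
def expand_spaced(values, valid_bits, fill=None):
    """Scatter densely packed values into slots whose validity bit is set.

    Slots with a cleared bit receive `fill`. This is a scalar stand-in for
    what a mask-expand instruction does across a whole register.
    Assumes len(values) equals the number of set bits in valid_bits.
    """
    dense = iter(values)
    return [next(dense) if bit else fill for bit in valid_bits]


def compress_spaced(slots, valid_bits):
    """Gather the slots whose validity bit is set into a dense list.

    This is the inverse direction, analogous to a mask-compress.
    """
    return [slot for slot, bit in zip(slots, valid_bits) if bit]


if __name__ == "__main__":
    mask = [1, 0, 1, 1, 0]
    spaced = expand_spaced([10, 20, 30], mask, fill=0)
    print(spaced)
    print(compress_spaced(spaced, mask))
```

A SIMD version would process many slots per instruction, but it should produce the same result as this element-by-element loop, which makes the scalar form a convenient reference for unit tests.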

[GitHub] [arrow] kiszk commented on a change in pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-24 Thread GitBox
kiszk commented on a change in pull request #7029: URL: https://github.com/apache/arrow/pull/7029#discussion_r414434809 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

[GitHub] [arrow] kiszk commented on a change in pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-24 Thread GitBox
kiszk commented on a change in pull request #7029: URL: https://github.com/apache/arrow/pull/7029#discussion_r414434434 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

[GitHub] [arrow] nevi-me commented on pull request #7018: ARROW-8536: [Rust] [Flight] Check in proto file, conditional build if file exists

2020-04-24 Thread GitBox
nevi-me commented on pull request #7018: URL: https://github.com/apache/arrow/pull/7018#issuecomment-618942929 > @nevi-me This is looking good, but the generated source file needs the ASF header. CI is failing with ` apache-rat license violation:

[GitHub] [arrow] github-actions[bot] commented on pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-24 Thread GitBox
github-actions[bot] commented on pull request #7030: URL: https://github.com/apache/arrow/pull/7030#issuecomment-618919224 https://issues.apache.org/jira/browse/ARROW-7808 This is an automated message from the Apache Git

[GitHub] [arrow] jianxind opened a new pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-24 Thread GitBox
jianxind opened a new pull request #7029: URL: https://github.com/apache/arrow/pull/7029 1. Create the spaced encoding/decoding benchmark items. 2. Create unit tests for the spaced API SIMD implementation. 3. Move spaced scalar/SIMD to a new header file. 4. Add the path of AVX512 epi32 and

[GitHub] [arrow] kszucs edited a comment on pull request #7028: ARROW-8575: [Developer] Add issue_comment workflow to rebase a PR

2020-04-24 Thread GitBox
kszucs edited a comment on pull request #7028: URL: https://github.com/apache/arrow/pull/7028#issuecomment-618944665 There is another security constraint about this approach: anyone can trigger a rebase on the PR not just the participants / committers. To resolve that you need to check

[GitHub] [arrow] kszucs commented on pull request #7028: ARROW-8575: [Developer] Add issue_comment workflow to rebase a PR

2020-04-24 Thread GitBox
kszucs commented on pull request #7028: URL: https://github.com/apache/arrow/pull/7028#issuecomment-618944665 There is another security constraint about this approach: anyone can trigger a rebase on the PR not just the participants. To resolve that you need to check `author_association`
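The check mentioned in this comment could look roughly like the sketch below. It assumes the workflow has access to the `issue_comment` event payload as a dictionary, and that `OWNER`, `MEMBER`, and `COLLABORATOR` are the associations worth trusting; that choice of set, and the names `TRUSTED_ASSOCIATIONS` and `may_trigger_rebase`, are assumptions made for illustration and are not taken from the pull request.

```python
# Associations assumed (for this sketch only) to be allowed to trigger a rebase.
TRUSTED_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}


def may_trigger_rebase(event):
    """Return True if the comment author is in the trusted set.

    `event` is expected to be the parsed JSON payload of a GitHub
    issue_comment event, where the comment object carries an
    `author_association` field.
    """
    comment = event.get("comment", {})
    return comment.get("author_association") in TRUSTED_ASSOCIATIONS


if __name__ == "__main__":
    committer = {"comment": {"author_association": "MEMBER"}}
    stranger = {"comment": {"author_association": "NONE"}}
    print(may_trigger_rebase(committer))
    print(may_trigger_rebase(stranger))
```

Treating a missing field as untrusted keeps the default safe: a payload without `author_association` is rejected rather than allowed through.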

[GitHub] [arrow] github-actions[bot] commented on pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-24 Thread GitBox
github-actions[bot] commented on pull request #7029: URL: https://github.com/apache/arrow/pull/7029#issuecomment-618853399 https://issues.apache.org/jira/browse/ARROW-8579

[GitHub] [arrow] pitrou commented on pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-24 Thread GitBox
pitrou commented on pull request #7029: URL: https://github.com/apache/arrow/pull/7029#issuecomment-618917554 I'd gladly see an AVX2 or SSE version indeed, as many CPUs don't have AVX512.

[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-24 Thread GitBox
liyafan82 commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r414332379 ## File path: java/memory/src/test/java/org/apache/arrow/memory/TestLargeArrowBuf.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software

[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-24 Thread GitBox
liyafan82 commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r414330656 ## File path: java/memory/src/main/java/org/apache/arrow/memory/NettyAllocationManager.java ## @@ -34,31 +33,34 @@ static final

[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-24 Thread GitBox
liyafan82 commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r414330471 ## File path: java/memory/src/main/java/org/apache/arrow/memory/NettyAllocationManager.java ## @@ -34,31 +33,34 @@ static final

[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-24 Thread GitBox
liyafan82 commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r414330317 ## File path: java/memory/src/main/java/org/apache/arrow/memory/NettyAllocationManager.java ## @@ -34,31 +33,34 @@ static final

[GitHub] [arrow] jianxind commented on a change in pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-24 Thread GitBox
jianxind commented on a change in pull request #7029: URL: https://github.com/apache/arrow/pull/7029#discussion_r414505645 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] fsaintjacques commented on pull request #7022: ARROW-8562: [C++] IO: Parameterize I/O Coalescing using S3 metrics

2020-04-24 Thread GitBox
fsaintjacques commented on pull request #7022: URL: https://github.com/apache/arrow/pull/7022#issuecomment-618985822 Could you include a script/utility to compute both metrics? Paired with toxiproxy, we could replicate the behavior of S3 regions on localhost.

[GitHub] [arrow] rdettai commented on pull request #6949: ARROW-7681: [Rust] Explicitly seeking a BufReader will discard the internal buffer (2)

2020-04-24 Thread GitBox
rdettai commented on pull request #6949: URL: https://github.com/apache/arrow/pull/6949#issuecomment-618997029 > Originally we designed it this way so that we can concurrently read multiple column chunks after obtaining file handle from a single row group. Since the file handle is shared

[GitHub] [arrow] rdettai edited a comment on pull request #6949: ARROW-7681: [Rust] Explicitly seeking a BufReader will discard the internal buffer (2)

2020-04-24 Thread GitBox
rdettai edited a comment on pull request #6949: URL: https://github.com/apache/arrow/pull/6949#issuecomment-618997029 > Originally we designed it this way so that we can concurrently read multiple column chunks after obtaining file handle from a single row group. Since the file handle is

[GitHub] [arrow] sunchao commented on a change in pull request #6935: ARROW-8455: [Rust] Parquet Arrow column read on partially compatible files

2020-04-24 Thread GitBox
sunchao commented on a change in pull request #6935: URL: https://github.com/apache/arrow/pull/6935#discussion_r414738189 ## File path: rust/parquet/src/column/reader.rs ## @@ -190,15 +190,12 @@ impl ColumnReaderImpl { (self.num_buffered_values -

[GitHub] [arrow] BryanCutler commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-24 Thread GitBox
BryanCutler commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r414695756 ## File path: java/memory/src/main/java/org/apache/arrow/memory/NettyAllocationManager.java ## @@ -17,48 +17,97 @@ package org.apache.arrow.memory;

[GitHub] [arrow] nealrichardson commented on pull request #7028: ARROW-8575: [Developer] Add issue_comment workflow to rebase a PR

2020-04-24 Thread GitBox
nealrichardson commented on pull request #7028: URL: https://github.com/apache/arrow/pull/7028#issuecomment-619074972 I'm not worried about security risks in this particular case. If some random person wants to rebase my PR on apache/arrow@master, great! Now I don't have to! While I

[GitHub] [arrow] nevi-me commented on a change in pull request #6306: ARROW-7705: [Rust] Initial sort implementation

2020-04-24 Thread GitBox
nevi-me commented on a change in pull request #6306: URL: https://github.com/apache/arrow/pull/6306#discussion_r414662522 ## File path: rust/arrow/src/compute/kernels/sort.rs ## @@ -0,0 +1,671 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] nevi-me commented on a change in pull request #6306: ARROW-7705: [Rust] Initial sort implementation

2020-04-24 Thread GitBox
nevi-me commented on a change in pull request #6306: URL: https://github.com/apache/arrow/pull/6306#discussion_r414663303 ## File path: rust/arrow/src/compute/kernels/sort.rs ## @@ -0,0 +1,671 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] nealrichardson commented on a change in pull request #7026: ARROW-7391: [C++][Dataset] Remove Expression subclasses from bindings

2020-04-24 Thread GitBox
nealrichardson commented on a change in pull request #7026: URL: https://github.com/apache/arrow/pull/7026#discussion_r414680234 ## File path: r/src/expression.cpp ## @@ -21,99 +21,97 @@ // [[arrow::export]] std::shared_ptr dataset___expr__field_ref(std::string name) { -

[GitHub] [arrow] kiszk commented on a change in pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-04-24 Thread GitBox
kiszk commented on a change in pull request #7029: URL: https://github.com/apache/arrow/pull/7029#discussion_r414739332 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

[GitHub] [arrow] sunchao commented on pull request #6949: ARROW-7681: [Rust] Explicitly seeking a BufReader will discard the internal buffer (2)

2020-04-24 Thread GitBox
sunchao commented on pull request #6949: URL: https://github.com/apache/arrow/pull/6949#issuecomment-619140531 > It's the reader (file handle) that is passed to it that should be thread safe Is [file](https://doc.rust-lang.org/std/fs/struct.File.html) thread-safe? It's not obvious

[GitHub] [arrow] pitrou opened a new pull request #7031: ARROW-8587: [C++] Fix linking Flight benchmarks

2020-04-24 Thread GitBox
pitrou opened a new pull request #7031: URL: https://github.com/apache/arrow/pull/7031

[GitHub] [arrow] fsaintjacques commented on pull request #7022: ARROW-8562: [C++] IO: Parameterize I/O Coalescing using S3 metrics

2020-04-24 Thread GitBox
fsaintjacques commented on pull request #7022: URL: https://github.com/apache/arrow/pull/7022#issuecomment-619186142 I'd say just plain HTTP; as @lidavidm pointed out in his comment, this is a network attribute.

[GitHub] [arrow] andygrove commented on pull request #7018: ARROW-8536: [Rust] [Flight] Check in proto file, conditional build if file exists

2020-04-24 Thread GitBox
andygrove commented on pull request #7018: URL: https://github.com/apache/arrow/pull/7018#issuecomment-619202633 Let's see what others say on this. Personally, I think it would be better for build.rs to automatically prepend the ASF license header because there is the risk of someone

[GitHub] [arrow] github-actions[bot] commented on pull request #7033: ARROW-7759: [C++][Dataset] Add CsvFileFormat

2020-04-24 Thread GitBox
github-actions[bot] commented on pull request #7033: URL: https://github.com/apache/arrow/pull/7033#issuecomment-619195315 https://issues.apache.org/jira/browse/ARROW-7759

[GitHub] [arrow] Zhen-hao opened a new issue #7034: R arrow package can't see arrow-cpp installation

2020-04-24 Thread GitBox
Zhen-hao opened a new issue #7034: URL: https://github.com/apache/arrow/issues/7034 Hi there, this is more of a question than a bug report. I am using NixOS 20.03 and couldn't get the arrow library in R to see the arrow C++ library. Even when I install the library from R

[GitHub] [arrow] nealrichardson commented on pull request #7033: ARROW-7759: [C++][Dataset] Add CsvFileFormat

2020-04-24 Thread GitBox
nealrichardson commented on pull request #7033: URL: https://github.com/apache/arrow/pull/7033#issuecomment-619213038 @github-actions rebase

[GitHub] [arrow] nealrichardson commented on pull request #6879: ARROW-8377: [CI][C++][R] Build and run C++ tests on Rtools build

2020-04-24 Thread GitBox
nealrichardson commented on pull request #6879: URL: https://github.com/apache/arrow/pull/6879#issuecomment-619212781 @github-actions rebase

[GitHub] [arrow] markhildreth opened a new pull request #7035: ARROW-8590: [Rust] Use arrow crate pretty util in DataFusion

2020-04-24 Thread GitBox
markhildreth opened a new pull request #7035: URL: https://github.com/apache/arrow/pull/7035 Fixes [ARROW-8590](https://issues.apache.org/jira/browse/ARROW-8590) This builds on #6972, and thus should be merged after that PR is merged.

[GitHub] [arrow] markhildreth commented on pull request #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-24 Thread GitBox
markhildreth commented on pull request #6972: URL: https://github.com/apache/arrow/pull/6972#issuecomment-619215502 Created [follow-up JIRA task](https://issues.apache.org/jira/browse/ARROW-8590).

[GitHub] [arrow] vertexclique opened a new pull request #7036: ARROW-8591: [Rust] Reverse lookup for a key in DictionaryArray

2020-04-24 Thread GitBox
vertexclique opened a new pull request #7036: URL: https://github.com/apache/arrow/pull/7036 This PR enables reverse lookup for an already-built dictionary.

[GitHub] [arrow] zgramana opened a new pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values

2020-04-24 Thread GitBox
zgramana opened a new pull request #7032: URL: https://github.com/apache/arrow/pull/7032 Takes an alternative approach to completing [ARROW-6603](https://issues.apache.org/jira/browse/ARROW-6603) that is in-line with the current API and with other Arrow implementations. More

[GitHub] [arrow] mayuropensource commented on pull request #7022: ARROW-8562: [C++] IO: Parameterize I/O Coalescing using S3 metrics

2020-04-24 Thread GitBox
mayuropensource commented on pull request #7022: URL: https://github.com/apache/arrow/pull/7022#issuecomment-619184138 @fsaintjacques, I can try to put together a Python script using boto to determine the S3 metrics. Will that work for you?

[GitHub] [arrow] nealrichardson commented on issue #7034: R arrow package can't see arrow-cpp installation

2020-04-24 Thread GitBox
nealrichardson commented on issue #7034: URL: https://github.com/apache/arrow/issues/7034#issuecomment-619210756 We don't do any testing on NixOS, so it's not surprising that it doesn't just work. http://arrow.apache.org/docs/r/articles/install.html describes how dependencies are

[GitHub] [arrow] github-actions[bot] commented on pull request #7031: ARROW-8587: [C++] Fix linking Flight benchmarks

2020-04-24 Thread GitBox
github-actions[bot] commented on pull request #7031: URL: https://github.com/apache/arrow/pull/7031#issuecomment-619156566 https://issues.apache.org/jira/browse/ARROW-8587

[GitHub] [arrow] github-actions[bot] commented on pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values

2020-04-24 Thread GitBox
github-actions[bot] commented on pull request #7032: URL: https://github.com/apache/arrow/pull/7032#issuecomment-619164076 https://issues.apache.org/jira/browse/ARROW-6603

[GitHub] [arrow] nevi-me commented on pull request #7024: ARROW-8573: [Rust] Upgrade Rust to 1.44 nightly

2020-04-24 Thread GitBox
nevi-me commented on pull request #7024: URL: https://github.com/apache/arrow/pull/7024#issuecomment-619189091 @paddyhoran we might have to try a different nightly, as sometimes a day's version might have no rustfmt. The change I made in that PR installs a nightly version, I don't know

[GitHub] [arrow] bkietz opened a new pull request #7033: ARROW-7759: [C++][Dataset] Add CsvFileFormat

2020-04-24 Thread GitBox
bkietz opened a new pull request #7033: URL: https://github.com/apache/arrow/pull/7033

[GitHub] [arrow] github-actions[bot] commented on pull request #7035: ARROW-8590: [Rust] Use arrow crate pretty util in DataFusion

2020-04-24 Thread GitBox
github-actions[bot] commented on pull request #7035: URL: https://github.com/apache/arrow/pull/7035#issuecomment-619219686 https://issues.apache.org/jira/browse/ARROW-8590

[GitHub] [arrow] github-actions[bot] commented on pull request #7036: ARROW-8591: [Rust] Reverse lookup for a key in DictionaryArray

2020-04-24 Thread GitBox
github-actions[bot] commented on pull request #7036: URL: https://github.com/apache/arrow/pull/7036#issuecomment-619219685 https://issues.apache.org/jira/browse/ARROW-8591

[GitHub] [arrow] zgramana commented on pull request #6121: ARROW-6603: [C#] - Nullable Array Support

2020-04-24 Thread GitBox
zgramana commented on pull request #6121: URL: https://github.com/apache/arrow/pull/6121#issuecomment-619162558 @eerhardt I've just submitted https://github.com/apache/arrow/pull/7032 for review/discussion

[GitHub] [arrow] bkietz commented on a change in pull request #7033: ARROW-7759: [C++][Dataset] Add CsvFileFormat

2020-04-24 Thread GitBox
bkietz commented on a change in pull request #7033: URL: https://github.com/apache/arrow/pull/7033#discussion_r414856596 ## File path: cpp/src/arrow/dataset/file_csv.h ## @@ -0,0 +1,52 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] markhildreth edited a comment on pull request #7024: ARROW-8573: [Rust] Upgrade Rust to 1.44 nightly

2020-04-24 Thread GitBox
markhildreth edited a comment on pull request #7024: URL: https://github.com/apache/arrow/pull/7024#issuecomment-619273713 @andygrove I think there is going to be more to this than this PR. The "nightly-2019-11-14" string [can be found in a few

[GitHub] [arrow] velvia commented on a change in pull request #4815: [DISCUSS] Add strawman proposal for sparseness and data integrity

2020-04-24 Thread GitBox
velvia commented on a change in pull request #4815: URL: https://github.com/apache/arrow/pull/4815#discussion_r414877852 ## File path: format/Message.fbs ## @@ -21,10 +21,69 @@ include "Tensor.fbs"; namespace org.apache.arrow.flatbuf; +///

[GitHub] [arrow] markhildreth commented on pull request #7024: ARROW-8573: [Rust] Upgrade Rust to 1.44 nightly

2020-04-24 Thread GitBox
markhildreth commented on pull request #7024: URL: https://github.com/apache/arrow/pull/7024#issuecomment-619273713 @andygrove I think there is going to be more to this than this PR. The "nightly-2019-11-14" string [can be found in a few

[GitHub] [arrow] nevi-me commented on a change in pull request #7036: ARROW-8591: [Rust] Reverse lookup for a key in DictionaryArray

2020-04-24 Thread GitBox
nevi-me commented on a change in pull request #7036: URL: https://github.com/apache/arrow/pull/7036#discussion_r414875192 ## File path: rust/arrow/src/array/array.rs ## @@ -1786,38 +1786,34 @@ impl From<(Vec<(Field, ArrayRef)>, Buffer, usize)> for StructArray { /// This is

[GitHub] [arrow] nealrichardson commented on a change in pull request #7033: ARROW-7759: [C++][Dataset] Add CsvFileFormat

2020-04-24 Thread GitBox
nealrichardson commented on a change in pull request #7033: URL: https://github.com/apache/arrow/pull/7033#discussion_r414852277 ## File path: cpp/src/arrow/dataset/file_csv.h ## @@ -0,0 +1,52 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] nevi-me opened a new pull request #7037: ARROW-6718: [Rust] Remove packed_simd

2020-04-24 Thread GitBox
nevi-me opened a new pull request #7037: URL: https://github.com/apache/arrow/pull/7037 This removes the dependency on packed_simd. I initially thought that boolean kernels were slower than with explicit SIMD, but this was a false alarm as the benchmarks weren't comparing SIMD vs

[GitHub] [arrow] github-actions[bot] commented on pull request #7037: ARROW-6718: [Rust] Remove packed_simd

2020-04-24 Thread GitBox
github-actions[bot] commented on pull request #7037: URL: https://github.com/apache/arrow/pull/7037#issuecomment-619242490 https://issues.apache.org/jira/browse/ARROW-6718

[GitHub] [arrow] mayuropensource edited a comment on pull request #7022: ARROW-8562: [C++] IO: Parameterize I/O Coalescing using S3 metrics

2020-04-24 Thread GitBox
mayuropensource edited a comment on pull request #7022: URL: https://github.com/apache/arrow/pull/7022#issuecomment-619276182 // SOME_S3_DATA_URI should point to a file (over http) that is ~500 MiB. // TTFB_sec is the time-to-first-byte in seconds as measured by curl //

[GitHub] [arrow] mayuropensource commented on pull request #7022: ARROW-8562: [C++] IO: Parameterize I/O Coalescing using S3 metrics

2020-04-24 Thread GitBox
mayuropensource commented on pull request #7022: URL: https://github.com/apache/arrow/pull/7022#issuecomment-619276182 // SOME_S3_DATA_URI should point to a file (over http) that is ~500 MiB. // TTFB_sec is the time-to-first-byte in seconds as measured by curl //

[GitHub] [arrow] emkornfield commented on a change in pull request #6912: ARROW-8020: [Java] Implement vector validate functionality

2020-04-24 Thread GitBox
emkornfield commented on a change in pull request #6912: URL: https://github.com/apache/arrow/pull/6912#discussion_r414972130 ## File path: java/vector/src/main/java/org/apache/arrow/vector/ValueVector.java ## @@ -283,4 +283,10 @@ * @return the name of the vector. */

[GitHub] [arrow] emkornfield commented on pull request #7025: ARROW-2260: [C++][Plasma] Use Gflags for command-line parsing

2020-04-24 Thread GitBox
emkornfield commented on pull request #7025: URL: https://github.com/apache/arrow/pull/7025#issuecomment-619317374 @chrish42 Thank you for the PR, I'll take a look now. Note it looks like lint is failing due to formatting issues. You need to run "make format" or "ninja format" to run

[GitHub] [arrow] wjones1 commented on a change in pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2020-04-24 Thread GitBox
wjones1 commented on a change in pull request #6979: URL: https://github.com/apache/arrow/pull/6979#discussion_r414951095 ## File path: python/pyarrow/_parquet.pyx ## @@ -1083,6 +1084,50 @@ cdef class ParquetReader: def set_use_threads(self, bint use_threads):

[GitHub] [arrow] emkornfield commented on a change in pull request #7025: ARROW-2260: [C++][Plasma] Use Gflags for command-line parsing

2020-04-24 Thread GitBox
emkornfield commented on a change in pull request #7025: URL: https://github.com/apache/arrow/pull/7025#discussion_r414974208 ## File path: cpp/src/plasma/store.cc ## @@ -1207,65 +1211,77 @@ void StartServer(char* socket_name, std::string plasma_directory, bool hugepages

[GitHub] [arrow] zgramana commented on pull request #7032: ARROW-6603, ARROW-5708, ARROW-5634: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support

2020-04-24 Thread GitBox
zgramana commented on pull request #7032: URL: https://github.com/apache/arrow/pull/7032#issuecomment-619296269 @eerhardt apologies for loading up three issues in the title, but I kept finding older issues in the Apache Jira backlog that were addressed here as well, and so erred on the

[GitHub] [arrow] wjones1 commented on a change in pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2020-04-24 Thread GitBox
wjones1 commented on a change in pull request #6979: URL: https://github.com/apache/arrow/pull/6979#discussion_r414948745 ## File path: cpp/src/parquet/arrow/reader.cc ## @@ -260,12 +260,28 @@ class FileReaderImpl : public FileReader { Status GetRecordBatchReader(const

[GitHub] [arrow] wjones1 commented on a change in pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2020-04-24 Thread GitBox
wjones1 commented on a change in pull request #6979: URL: https://github.com/apache/arrow/pull/6979#discussion_r414948789 ## File path: python/pyarrow/_parquet.pxd ## @@ -334,7 +334,7 @@ cdef extern from "parquet/api/reader.h" namespace "parquet" nogil:

[GitHub] [arrow] emkornfield commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-24 Thread GitBox
emkornfield commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r414987154 ## File path: java/dataset/src/main/java/org/apache/arrow/dataset/jni/JniWrapper.java ## @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software

[GitHub] [arrow] emkornfield commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-24 Thread GitBox
emkornfield commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r414987685 ## File path: java/dataset/src/main/java/org/apache/arrow/dataset/scanner/ScanTask.java ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software

[GitHub] [arrow] emkornfield commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-24 Thread GitBox
emkornfield commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r414987799 ## File path: java/dataset/src/main/java/org/apache/arrow/dataset/scanner/ScanTask.java ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software

[GitHub] [arrow] emkornfield commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-24 Thread GitBox
emkornfield commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r414988755 ## File path: java/dataset/src/test/java/org/apache/arrow/util/SchemaUtilsTest.java ## @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software

[GitHub] [arrow] emkornfield commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-24 Thread GitBox
emkornfield commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r414988831 ## File path: java/pom.xml ## @@ -369,24 +369,24 @@ org.apache.maven.plugins maven-compiler-plugin 3.6.2 - -

[GitHub] [arrow] emkornfield commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-24 Thread GitBox
emkornfield commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r414988474 ## File path: java/dataset/src/main/java/org/apache/arrow/util/SchemaUtils.java ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [arrow] emkornfield commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-24 Thread GitBox
emkornfield commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r414989937 ## File path: cpp/src/jni/dataset/jni_wrapper.cpp ## @@ -0,0 +1,577 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] emkornfield commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-24 Thread GitBox
emkornfield commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r414990082 ## File path: cpp/src/jni/dataset/jni_wrapper.cpp ## @@ -0,0 +1,577 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] emkornfield commented on a change in pull request #7025: ARROW-2260: [C++][Plasma] Use Gflags for command-line parsing

2020-04-24 Thread GitBox
emkornfield commented on a change in pull request #7025: URL: https://github.com/apache/arrow/pull/7025#discussion_r414975521 ## File path: cpp/src/plasma/store.cc ## @@ -1207,65 +1211,77 @@ void StartServer(char* socket_name, std::string plasma_directory, bool hugepages

[GitHub] [arrow] emkornfield commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-24 Thread GitBox
emkornfield commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r414983499 ## File path: cpp/src/jni/dataset/concurrent_map.h ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more

[GitHub] [arrow] emkornfield commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-24 Thread GitBox
emkornfield commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r414984380 ## File path: cpp/src/jni/dataset/jni_wrapper.cpp ## @@ -0,0 +1,577 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] emkornfield commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-24 Thread GitBox
emkornfield commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r414984900 ## File path: cpp/src/jni/dataset/jni_wrapper.cpp ## @@ -0,0 +1,577 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] emkornfield commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-24 Thread GitBox
emkornfield commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r414985230 ## File path: cpp/src/jni/dataset/jni_wrapper.cpp ## @@ -0,0 +1,577 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] emkornfield commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-24 Thread GitBox
emkornfield commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r414986335 ## File path: cpp/src/jni/dataset/proto/Types.proto ## @@ -0,0 +1,149 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] emkornfield commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-24 Thread GitBox
emkornfield commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r414986841 ## File path: java/dataset/src/main/java/org/apache/arrow/dataset/file/JniWrapper.java ## @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software

[GitHub] [arrow] emkornfield commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-24 Thread GitBox
emkornfield commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r414987860 ## File path: java/dataset/src/main/java/org/apache/arrow/dataset/source/DatasetFactory.java ## @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache

[GitHub] [arrow] emkornfield commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-24 Thread GitBox
emkornfield commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r414988086 ## File path: java/dataset/src/main/java/org/apache/arrow/memory/NativeUnderlingMemory.java ## @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache

[GitHub] [arrow] emkornfield commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-24 Thread GitBox
emkornfield commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r414988443 ## File path: java/dataset/src/main/java/org/apache/arrow/util/SchemaUtils.java ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [arrow] emkornfield commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API

2020-04-24 Thread GitBox
emkornfield commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r414989258 ## File path: java/dataset/src/test/java/org/apache/arrow/dataset/jni/NativeDatasetTest.java ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache