[GitHub] [arrow] zgramana edited a comment on pull request #6121: ARROW-6603: [C#] - Nullable Array Support

2020-04-23 Thread GitBox
zgramana edited a comment on pull request #6121: URL: https://github.com/apache/arrow/pull/6121#issuecomment-618756547 @eerhardt I'd like to chime in to say that I have just come across this conversation late--and *after* implementing an alternative approach which is much more in-line with

[GitHub] [arrow] sunchao commented on a change in pull request #6935: ARROW-8455: [Rust] Parquet Arrow column read on partially compatible files

2020-04-23 Thread GitBox
sunchao commented on a change in pull request #6935: URL: https://github.com/apache/arrow/pull/6935#discussion_r414235616 ## File path: rust/parquet/src/column/reader.rs ## @@ -190,15 +190,12 @@ impl ColumnReaderImpl { (self.num_buffered_values -

[GitHub] [arrow] sunchao edited a comment on pull request #6949: ARROW-7681: [Rust] Explicitly seeking a BufReader will discard the internal buffer (2)

2020-04-23 Thread GitBox
sunchao edited a comment on pull request #6949: URL: https://github.com/apache/arrow/pull/6949#issuecomment-618740530 Yes, I think it is beneficial to avoid dropping buffers with `seek`, although it would be nice if `seek_relative` were stabilized soon so we can just use that.
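
The buffer-dropping behaviour mentioned in the comment above can be illustrated with a minimal sketch. It is not code from the pull request: the helper name `skip_and_inspect`, the 16-byte capacity, and the 64-byte input are assumptions chosen for illustration. It relies only on `std::io::BufReader`, and `seek_relative` requires a Rust toolchain in which that method is stable.

```rust
use std::io::{BufRead, BufReader, Cursor, Read, Seek, SeekFrom};

/// Hypothetical helper: skips `n` bytes forward, then reports how many bytes
/// are still held in the internal buffer and which byte is read next.
fn skip_and_inspect(use_seek_relative: bool, n: i64) -> std::io::Result<(usize, u8)> {
    // 64 bytes of input whose values equal their offsets (0, 1, 2, ...).
    let data: Vec<u8> = (0..64u8).collect();
    let mut reader = BufReader::with_capacity(16, Cursor::new(data));

    // Load the first 16 bytes into the internal buffer.
    reader.fill_buf()?;

    if use_seek_relative {
        // Stays inside the buffer when the target offset is already buffered.
        reader.seek_relative(n)?;
    } else {
        // Always discards the internal buffer, even for a small forward skip.
        reader.seek(SeekFrom::Current(n))?;
    }
    let still_buffered = reader.buffer().len();

    // Both paths must yield the same logical position in the stream.
    let mut next = [0u8; 1];
    reader.read_exact(&mut next)?;
    Ok((still_buffered, next[0]))
}
```

Calling the helper with `false` and then `true` for the same offset shows the trade-off: both calls read the same next byte, but only the `seek_relative` path keeps the already-buffered data.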

[GitHub] [arrow] sunchao commented on pull request #6949: ARROW-7681: [Rust] Explicitly seeking a BufReader will discard the internal buffer (2)

2020-04-23 Thread GitBox
sunchao commented on pull request #6949: URL: https://github.com/apache/arrow/pull/6949#issuecomment-618740530 Yes, I think it is beneficial to avoid dropping buffers with `seek`, although it would be nice if `seek_relative` were stabilized soon so we can just use that. >

[GitHub] [arrow] zgramana edited a comment on pull request #6121: ARROW-6603: [C#] - Nullable Array Support

2020-04-23 Thread GitBox
zgramana edited a comment on pull request #6121: URL: https://github.com/apache/arrow/pull/6121#issuecomment-618756547 @eerhardt I'd like to chime in to say that I have just come across this conversation late--and *after* implementing an alternative approach which is much more in-line with

[GitHub] [arrow] zgramana commented on pull request #6121: ARROW-6603: [C#] - Nullable Array Support

2020-04-23 Thread GitBox
zgramana commented on pull request #6121: URL: https://github.com/apache/arrow/pull/6121#issuecomment-618756547 @eerhardt I'd like to chime in that I have just come across this conversation after implementing an alternative approach much more in line with other Arrow language

[GitHub] [arrow] cyb70289 commented on pull request #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-23 Thread GitBox
cyb70289 commented on pull request #6954: URL: https://github.com/apache/arrow/pull/6954#issuecomment-618782973 > Is the function `Armv8CrcHashParallel` used somewhere? Sorry if I overlooked it. It's not used. Actually the whole file hash_util.h is not used per [this

[GitHub] [arrow] github-actions[bot] commented on pull request #7028: ARROW-8575: [Developer] Add issue_comment workflow to rebase a PR

2020-04-23 Thread GitBox
github-actions[bot] commented on pull request #7028: URL: https://github.com/apache/arrow/pull/7028#issuecomment-618731413 https://issues.apache.org/jira/browse/ARROW-8575 This is an automated message from the Apache Git

[GitHub] [arrow] paddyhoran commented on a change in pull request #6306: ARROW-7705: [Rust] Initial sort implementation

2020-04-23 Thread GitBox
paddyhoran commented on a change in pull request #6306: URL: https://github.com/apache/arrow/pull/6306#discussion_r414234200 ## File path: rust/arrow/src/compute/kernels/sort.rs ## @@ -0,0 +1,671 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] nealrichardson opened a new pull request #7028: ARROW-8575: [Developer] Add issue_comment workflow to rebase a PR

2020-04-23 Thread GitBox
nealrichardson opened a new pull request #7028: URL: https://github.com/apache/arrow/pull/7028 Instead of adding a PR comment of "This needs rebase" and waiting for the author to get around to it, with this workflow you can just type "rebase" and GHA will do it for you. If it rebases

[GitHub] [arrow] zgramana edited a comment on pull request #6121: ARROW-6603: [C#] - Nullable Array Support

2020-04-23 Thread GitBox
zgramana edited a comment on pull request #6121: URL: https://github.com/apache/arrow/pull/6121#issuecomment-618756547 @eerhardt I'd like to chime in to say that I have just come across this conversation late--and *after* implementing an alternative approach which is much more in-line with

[GitHub] [arrow] mcassels commented on a change in pull request #6770: ARROW-7842: [Rust] [Parquet] implement array_reader for list type columns

2020-04-23 Thread GitBox
mcassels commented on a change in pull request #6770: URL: https://github.com/apache/arrow/pull/6770#discussion_r414276380 ## File path: rust/datafusion/src/utils.rs ## @@ -120,6 +143,7 @@ pub fn array_value_to_string(column: array::ArrayRef, row: usize) -> Result {

[GitHub] [arrow] mcassels commented on a change in pull request #6770: ARROW-7842: [Rust] [Parquet] implement array_reader for list type columns

2020-04-23 Thread GitBox
mcassels commented on a change in pull request #6770: URL: https://github.com/apache/arrow/pull/6770#discussion_r414276861 ## File path: rust/datafusion/src/logicalplan.rs ## @@ -828,8 +828,8 @@ mod tests { .build()?; let expected = "Projection: #id\ -

[GitHub] [arrow] paddyhoran commented on a change in pull request #7004: ARROW-3827: [Rust] Implement UnionArray Updated

2020-04-23 Thread GitBox
paddyhoran commented on a change in pull request #7004: URL: https://github.com/apache/arrow/pull/7004#discussion_r414240892 ## File path: rust/arrow/src/array/equal.rs ## @@ -1046,6 +1062,30 @@ impl PartialEq for Value { } } +impl JsonEqual for UnionArray { +fn

[GitHub] [arrow] paddyhoran commented on pull request #7004: ARROW-3827: [Rust] Implement UnionArray Updated

2020-04-23 Thread GitBox
paddyhoran commented on pull request #7004: URL: https://github.com/apache/arrow/pull/7004#issuecomment-618761556 @andygrove just going to leave a general comment as it's all related. Overall, I felt this PR was getting big, I was trying to avoid getting into the IPC stuff in this

[GitHub] [arrow] paddyhoran commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-23 Thread GitBox
paddyhoran commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r414230710 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder {

[GitHub] [arrow] paddyhoran commented on pull request #7024: ARROW-8573: [Rust] Upgrade Rust to 1.44 nightly

2020-04-23 Thread GitBox
paddyhoran commented on pull request #7024: URL: https://github.com/apache/arrow/pull/7024#issuecomment-618749586 CI is failing again, I thought this was fixed by #8558

[GitHub] [arrow] paddyhoran edited a comment on pull request #7024: ARROW-8573: [Rust] Upgrade Rust to 1.44 nightly

2020-04-23 Thread GitBox
paddyhoran edited a comment on pull request #7024: URL: https://github.com/apache/arrow/pull/7024#issuecomment-618749586 CI is failing again, I thought this was fixed by #7010

[GitHub] [arrow] paddyhoran commented on a change in pull request #7004: ARROW-3827: [Rust] Implement UnionArray Updated

2020-04-23 Thread GitBox
paddyhoran commented on a change in pull request #7004: URL: https://github.com/apache/arrow/pull/7004#discussion_r414241189 ## File path: rust/arrow/src/array/mod.rs ## @@ -85,6 +85,7 @@ mod array; mod builder; mod data; mod equal; +mod union; Review comment: Yea,

[GitHub] [arrow] pitrou commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413696312 ## File path: cpp/cmake_modules/SetupCxxFlags.cmake ## @@ -40,12 +40,13 @@ if(ARROW_CPU_FLAG STREQUAL "x86") set(CXX_SUPPORTS_SSE4_2 TRUE) else()

[GitHub] [arrow] nevi-me opened a new pull request #7018: ARROW-8536: [Rust] [Flight] Check in proto file, conditional build if file exists

2020-04-23 Thread GitBox
nevi-me opened a new pull request #7018: URL: https://github.com/apache/arrow/pull/7018 When a user compiles the `flight` crate, a `build.rs` script is invoked. This script recursively looks for the `format/Flight.proto` path. A user might not have that path, as they would not have cloned
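
The "conditional build if file exists" idea in the title above can be sketched as follows. This is an assumption-laden illustration, not the `build.rs` from the pull request: the helper name `find_proto` and the single-directory lookup are hypothetical, and the actual code generation step is deliberately left out.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: returns the path to `format/Flight.proto` under
/// `repo_root` if the file is present, or `None` if it is missing.
fn find_proto(repo_root: &Path) -> Option<PathBuf> {
    let proto = repo_root.join("format").join("Flight.proto");
    if proto.is_file() {
        Some(proto)
    } else {
        None
    }
}
```

A build script could call such a helper and regenerate sources only on `Some(path)`, falling back to checked-in generated code on `None`, so that users who do not have the `format` directory can still compile the crate.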

[GitHub] [arrow] pitrou commented on a change in pull request #6959: ARROW-5649: [Integration][C++] Create integration test for extension types

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #6959: URL: https://github.com/apache/arrow/pull/6959#discussion_r413762177 ## File path: python/pyarrow/tests/test_extension_type.py ## @@ -445,22 +445,28 @@ def test_parquet(tmpdir, registered_period_type): import base64

[GitHub] [arrow] pitrou commented on issue #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-23 Thread GitBox
pitrou commented on issue #6954: URL: https://github.com/apache/arrow/pull/6954#issuecomment-618329962 cc @emkornfield

[GitHub] [arrow] pitrou commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413715323 ## File path: cpp/src/arrow/filesystem/s3fs_benchmark.cc ## @@ -331,10 +358,64 @@ BENCHMARK_DEFINE_F(MinioFixture, ReadCoalesced500Mib)(benchmark::State&

[GitHub] [arrow] lidavidm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413744080 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ =

[GitHub] [arrow] lidavidm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413757274 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ =

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #6959: ARROW-5649: [Integration][C++] Create integration test for extension types

2020-04-23 Thread GitBox
jorisvandenbossche commented on a change in pull request #6959: URL: https://github.com/apache/arrow/pull/6959#discussion_r413750283 ## File path: python/pyarrow/tests/test_extension_type.py ## @@ -445,22 +445,28 @@ def test_parquet(tmpdir, registered_period_type): import

[GitHub] [arrow] pitrou commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413760028 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ =

[GitHub] [arrow] pitrou commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413760214 ## File path: cpp/src/parquet/file_reader.cc ## @@ -536,6 +577,14 @@ std::shared_ptr ParquetFileReader::RowGroup(int i) { return

[GitHub] [arrow] github-actions[bot] commented on issue #7018: ARROW-8536: [Rust] [Flight] Check in proto file, conditional build if file exists

2020-04-23 Thread GitBox
github-actions[bot] commented on issue #7018: URL: https://github.com/apache/arrow/pull/7018#issuecomment-618349230 https://issues.apache.org/jira/browse/ARROW-8536

[GitHub] [arrow] lidavidm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413763701 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ =

[GitHub] [arrow] emkornfield commented on issue #6985: ARROW-8413: [C++][Parquet][WIP] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
emkornfield commented on issue #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-618231752 CC @wesm @pitrou I think this is ready for review now.

[GitHub] [arrow] kszucs commented on issue #6998: ARROW-8541: [Release] Don't remove previous source releases automatically

2020-04-23 Thread GitBox
kszucs commented on issue #6998: URL: https://github.com/apache/arrow/pull/6998#issuecomment-618328429 @kou updated

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7000: ARROW-8065: [C++][Dataset] Refactor ScanOptions and Fragment relation

2020-04-23 Thread GitBox
jorisvandenbossche commented on a change in pull request #7000: URL: https://github.com/apache/arrow/pull/7000#discussion_r413622598 ## File path: cpp/src/arrow/dataset/file_parquet.cc ## @@ -402,23 +401,21 @@ Result ParquetFileFormat::ScanFile( } Result>

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-23 Thread GitBox
jorisvandenbossche commented on a change in pull request #6992: URL: https://github.com/apache/arrow/pull/6992#discussion_r413740270 ## File path: python/pyarrow/pandas-shim.pxi ## @@ -55,6 +55,16 @@ cdef class _PandasAPIShim(object): from distutils.version import

[GitHub] [arrow] wesm commented on issue #7017: suggestion: why not serialize complex numbers in a Python list/dict/set

2020-04-23 Thread GitBox
wesm commented on issue #7017: URL: https://github.com/apache/arrow/issues/7017#issuecomment-618352283 Can you send an email to one of the mailing lists or open a JIRA if you want to propose a development project?

[GitHub] [arrow] pitrou commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413747770 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ =

[GitHub] [arrow] lidavidm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413759177 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ =

[GitHub] [arrow] jorisvandenbossche commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-23 Thread GitBox
jorisvandenbossche commented on issue #6992: URL: https://github.com/apache/arrow/pull/6992#issuecomment-618366876 cc @wesm @xhochy @BryanCutler are you fine with 1) a hard required minimal pandas version? (meaning: we don't use the pandas integration if an older version is

[GitHub] [arrow] emkornfield commented on a change in pull request #6985: ARROW-8413: [C++][Parquet][WIP] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
emkornfield commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413533870 ## File path: cpp/src/parquet/column_reader.cc ## @@ -50,6 +51,141 @@ using arrow::internal::checked_cast; namespace parquet { +namespace { +

[GitHub] [arrow] emkornfield commented on a change in pull request #6985: ARROW-8413: [C++][Parquet][WIP] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
emkornfield commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413576031 ## File path: cpp/src/parquet/column_reader.cc ## @@ -50,6 +51,140 @@ using arrow::internal::checked_cast; namespace parquet { +namespace { +

[GitHub] [arrow] emkornfield commented on a change in pull request #6985: ARROW-8413: [C++][Parquet][WIP] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
emkornfield commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413576031 ## File path: cpp/src/parquet/column_reader.cc ## @@ -50,6 +51,140 @@ using arrow::internal::checked_cast; namespace parquet { +namespace { +

[GitHub] [arrow] rok commented on issue #6667: ARROW-8162: [Format][Python] Add serialization for CSF sparse tensors to Python

2020-04-23 Thread GitBox
rok commented on issue #6667: URL: https://github.com/apache/arrow/pull/6667#issuecomment-618295149 Thanks @mrkn! :)

[GitHub] [arrow] emkornfield edited a comment on issue #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
emkornfield edited a comment on issue #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-618231752 CC @wesm @pitrou I think this is ready for review now. Will take a closer look at CI failures tomorrow.

[GitHub] [arrow] emkornfield commented on a change in pull request #6985: ARROW-8413: [C++][Parquet][WIP] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
emkornfield commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413575153 ## File path: cpp/cmake_modules/SetupCxxFlags.cmake ## @@ -40,12 +40,13 @@ if(ARROW_CPU_FLAG STREQUAL "x86") set(CXX_SUPPORTS_SSE4_2 TRUE)

[GitHub] [arrow] nevi-me commented on issue #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-23 Thread GitBox
nevi-me commented on issue #6972: URL: https://github.com/apache/arrow/pull/6972#issuecomment-618293330 > I also think it is fine for the moment to ignore the use case where the schema varies between record batches and file a separate issue for that. Just on this point, there

[GitHub] [arrow] emkornfield commented on a change in pull request #6985: ARROW-8413: [C++][Parquet][WIP] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
emkornfield commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413576741 ## File path: cpp/src/parquet/column_reader.cc ## @@ -50,6 +51,140 @@ using arrow::internal::checked_cast; namespace parquet { +namespace { +

[GitHub] [arrow] rdettai commented on issue #6949: ARROW-7681: [Rust] Explicitly seeking a BufReader will discard the internal buffer (2)

2020-04-23 Thread GitBox
rdettai commented on issue #6949: URL: https://github.com/apache/arrow/pull/6949#issuecomment-618235600 I'll work on your comments today. What about the problem we are trying to fix here? Do you agree with the benefits of this fix? Also, I'm not sure why a `Mutex` was used

[GitHub] [arrow] rdettai edited a comment on issue #6949: ARROW-7681: [Rust] Explicitly seeking a BufReader will discard the internal buffer (2)

2020-04-23 Thread GitBox
rdettai edited a comment on issue #6949: URL: https://github.com/apache/arrow/pull/6949#issuecomment-618235600 I'll work on your comments today. What about the problem we are trying to fix here? Do you agree with the benefits of this fix? Also, I'm not sure why a `Mutex` was

[GitHub] [arrow] wesm commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-23 Thread GitBox
wesm commented on issue #6992: URL: https://github.com/apache/arrow/pull/6992#issuecomment-618459883 I'm OK with this. The maintenance burden of supporting several years' worth of pandas releases seems like a lot to bear. If there are parties which are affected by this they should

[GitHub] [arrow] pitrou commented on issue #6846: ARROW-3329: [Python] Python tests for decimal to int and decimal to decimal casts

2020-04-23 Thread GitBox
pitrou commented on issue #6846: URL: https://github.com/apache/arrow/pull/6846#issuecomment-618464806 The CI failure looks unrelated, will merge.

[GitHub] [arrow] mayuropensource opened a new pull request #7020: Arrow-8562: [C++] Parameterize I/O coalescing using s3 storage metrics

2020-04-23 Thread GitBox
mayuropensource opened a new pull request #7020: URL: https://github.com/apache/arrow/pull/7020 JIRA: https://issues.apache.org/jira/browse/ARROW-8562 This change is not actually used until https://github.com/apache/arrow/pull/6744 (@lidavidm) is pushed, however, it doesn't need to

[GitHub] [arrow] kszucs opened a new pull request #7021: Wrap docker-compose commands with archery [WIP]

2020-04-23 Thread GitBox
kszucs opened a new pull request #7021: URL: https://github.com/apache/arrow/pull/7021

[GitHub] [arrow] vertexclique commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-23 Thread GitBox
vertexclique commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r413951534 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder {

[GitHub] [arrow] vertexclique commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-23 Thread GitBox
vertexclique commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r413951534 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder {

[GitHub] [arrow] vertexclique commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-23 Thread GitBox
vertexclique commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r413951534 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder {

[GitHub] [arrow] markhildreth commented on a change in pull request #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-23 Thread GitBox
markhildreth commented on a change in pull request #6972: URL: https://github.com/apache/arrow/pull/6972#discussion_r413951619 ## File path: rust/parquet/src/encodings/rle.rs ## @@ -830,7 +826,7 @@ mod tests { values.clear(); let mut rng =

[GitHub] [arrow] kiszk commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
kiszk commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413970301 ## File path: cpp/src/parquet/level_conversion.h ## @@ -0,0 +1,191 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] github-actions[bot] commented on issue #7019: ARROW-8569: [CI] Upgrade xcode version for testing homebrew formulae

2020-04-23 Thread GitBox
github-actions[bot] commented on issue #7019: URL: https://github.com/apache/arrow/pull/7019#issuecomment-618528458 Revision: 5bcfeab4c9bacc0b3a262a7522bfaf985025d3ec Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] paddyhoran commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-23 Thread GitBox
paddyhoran commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r413903768 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder {

[GitHub] [arrow] kiszk commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
kiszk commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413908372 ## File path: cpp/src/parquet/column_reader.cc ## @@ -50,6 +51,140 @@ using arrow::internal::checked_cast; namespace parquet { +namespace { + +inline

[GitHub] [arrow] lidavidm commented on a change in pull request #7020: ARROW-8562: [C++] Parameterize I/O coalescing using s3 storage metrics

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #7020: URL: https://github.com/apache/arrow/pull/7020#discussion_r413923773 ## File path: cpp/src/arrow/io/caching.h ## @@ -27,6 +27,44 @@ namespace arrow { namespace io { + +struct ARROW_EXPORT CacheOptions { + static

[GitHub] [arrow] mayuropensource opened a new pull request #7022: ARROW-8562: [C++] IO: Parameterize I/O Coalescing using S3 metrics

2020-04-23 Thread GitBox
mayuropensource opened a new pull request #7022: URL: https://github.com/apache/arrow/pull/7022 _(Recreating the PR from a clean repo, sorry about the earlier PR, which was not cleanly merged)._ **JIRA:** https://issues.apache.org/jira/browse/ARROW-8562 This change is not actually

[GitHub] [arrow] markhildreth commented on issue #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-23 Thread GitBox
markhildreth commented on issue #6972: URL: https://github.com/apache/arrow/pull/6972#issuecomment-618501806 @andygrove Thanks for the feedback. I have updated the PR with a less leaky API. I also fixed the type inference problem that was caused by the new dependency. @nevi-me True

[GitHub] [arrow] markhildreth edited a comment on issue #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-23 Thread GitBox
markhildreth edited a comment on issue #6972: URL: https://github.com/apache/arrow/pull/6972#issuecomment-618501806 @andygrove Thanks for the feedback. I have updated the PR with a less leaky API. I also fixed the type inference problem that was caused by the new dependency.

[GitHub] [arrow] markhildreth edited a comment on issue #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-23 Thread GitBox
markhildreth edited a comment on issue #6972: URL: https://github.com/apache/arrow/pull/6972#issuecomment-618501806 @andygrove Thanks for the feedback. I have updated the PR with a less leaky API. I also tweaked the parquet test to work around the new type inference changes.

[GitHub] [arrow] markhildreth commented on a change in pull request #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-23 Thread GitBox
markhildreth commented on a change in pull request #6972: URL: https://github.com/apache/arrow/pull/6972#discussion_r413951619 ## File path: rust/parquet/src/encodings/rle.rs ## @@ -830,7 +826,7 @@ mod tests { values.clear(); let mut rng =

[GitHub] [arrow] kiszk commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
kiszk commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413954827 ## File path: cpp/src/parquet/level_conversion_test.cc ## @@ -0,0 +1,162 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] wesm commented on issue #7020: ARROW-8562: [C++] Parameterize I/O coalescing using s3 storage metrics

2020-04-23 Thread GitBox
wesm commented on issue #7020: URL: https://github.com/apache/arrow/pull/7020#issuecomment-618508772 @mayuropensource it's not necessary to open a new PR when you want to redo your commits, you can just force push your branch

[GitHub] [arrow] kiszk commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
kiszk commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413953833 ## File path: cpp/src/parquet/column_reader.cc ## @@ -50,6 +51,140 @@ using arrow::internal::checked_cast; namespace parquet { +namespace { + +inline

[GitHub] [arrow] kiszk commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
kiszk commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413960907 ## File path: cpp/src/parquet/level_conversion.h ## @@ -0,0 +1,191 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] markhildreth commented on a change in pull request #7006: ARROW-8508: [Rust] FixedSizeListArray improper offset for value

2020-04-23 Thread GitBox
markhildreth commented on a change in pull request #7006: URL: https://github.com/apache/arrow/pull/7006#discussion_r413958397 ## File path: rust/arrow/src/array/array.rs ## @@ -2592,6 +2619,15 @@ mod tests { assert_eq!(DataType::Int32, list_array.value_type());

[GitHub] [arrow] nealrichardson commented on issue #7019: ARROW-8569: [CI] Upgrade xcode version for testing homebrew formulae

2020-04-23 Thread GitBox
nealrichardson commented on issue #7019: URL: https://github.com/apache/arrow/pull/7019#issuecomment-618527651 @github-actions crossbow submit homebrew-cpp

[GitHub] [arrow] github-actions[bot] commented on issue #7021: Wrap docker-compose commands with archery [WIP]

2020-04-23 Thread GitBox
github-actions[bot] commented on issue #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-618494095 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could you

[GitHub] [arrow] kiszk commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
kiszk commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413954827 ## File path: cpp/src/parquet/level_conversion_test.cc ## @@ -0,0 +1,162 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] github-actions[bot] commented on issue #7022: ARROW-8562: [C++] IO: Parameterize I/O Coalescing using S3 metrics

2020-04-23 Thread GitBox
github-actions[bot] commented on issue #7022: URL: https://github.com/apache/arrow/pull/7022#issuecomment-618510578 https://issues.apache.org/jira/browse/ARROW-8562

[GitHub] [arrow] vertexclique commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-23 Thread GitBox
vertexclique commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r413972358 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder {

[GitHub] [arrow] github-actions[bot] commented on issue #7020: Arrow-8562: [C++] Parameterize I/O coalescing using s3 storage metrics

2020-04-23 Thread GitBox
github-actions[bot] commented on issue #7020: URL: https://github.com/apache/arrow/pull/7020#issuecomment-618472360 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could you

[GitHub] [arrow] kiszk commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
kiszk commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413905959 ## File path: cpp/src/parquet/column_reader.cc ## @@ -50,6 +51,140 @@ using arrow::internal::checked_cast; namespace parquet { +namespace { + +inline

[GitHub] [arrow] mayuropensource commented on issue #7020: ARROW-8562: [C++] Parameterize I/O coalescing using s3 storage metrics

2020-04-23 Thread GitBox
mayuropensource commented on issue #7020: URL: https://github.com/apache/arrow/pull/7020#issuecomment-618486378 I messed up some commits. Will create a new one.

[GitHub] [arrow] markhildreth commented on issue #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-23 Thread GitBox
markhildreth commented on issue #6972: URL: https://github.com/apache/arrow/pull/6972#issuecomment-618506903 From a purely practical standpoint, this PR is ready for further review and merging. If approved, I would probably add some minor issue for the following: * Trying to avoid the

[GitHub] [arrow] bryantbiggs commented on issue #4140: ARROW-5123: [Rust] Parquet derive for simple structs

2020-04-23 Thread GitBox
bryantbiggs commented on issue #4140: URL: https://github.com/apache/arrow/pull/4140#issuecomment-618512883 thanks @andygrove !

[GitHub] [arrow] BryanCutler commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-23 Thread GitBox
BryanCutler commented on issue #6992: URL: https://github.com/apache/arrow/pull/6992#issuecomment-618517973 Sounds good to me. FWIW, Spark also has a minimum Pandas version set at 0.23.2.

[GitHub] [arrow] tustvold commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-23 Thread GitBox
tustvold commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r413969298 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder {

[GitHub] [arrow] xhochy opened a new pull request #7023: ARROW-8571: [C++] Switch AppVeyor image to VS 2017

2020-04-23 Thread GitBox
xhochy opened a new pull request #7023: URL: https://github.com/apache/arrow/pull/7023

[GitHub] [arrow] pitrou commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r41381 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ =

[GitHub] [arrow] pitrou commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413800208 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ =

[GitHub] [arrow] fsaintjacques commented on a change in pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2020-04-23 Thread GitBox
fsaintjacques commented on a change in pull request #6979: URL: https://github.com/apache/arrow/pull/6979#discussion_r413814578 ## File path: python/pyarrow/_parquet.pyx ## @@ -1083,6 +1084,50 @@ cdef class ParquetReader: def set_use_threads(self, bint use_threads):

[GitHub] [arrow] lidavidm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413797661 ## File path: cpp/src/parquet/file_reader.cc ## @@ -536,6 +577,14 @@ std::shared_ptr ParquetFileReader::RowGroup(int i) { return

[GitHub] [arrow] lidavidm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413798078 ## File path: cpp/src/arrow/filesystem/s3fs_benchmark.cc ## @@ -331,10 +358,64 @@ BENCHMARK_DEFINE_F(MinioFixture, ReadCoalesced500Mib)(benchmark::State&

[GitHub] [arrow] pitrou commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413810237 ## File path: cpp/src/parquet/file_reader.cc ## @@ -536,6 +577,14 @@ std::shared_ptr ParquetFileReader::RowGroup(int i) { return

[GitHub] [arrow] fsaintjacques commented on a change in pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2020-04-23 Thread GitBox
fsaintjacques commented on a change in pull request #6979: URL: https://github.com/apache/arrow/pull/6979#discussion_r413811549 ## File path: cpp/src/parquet/arrow/reader.cc ## @@ -260,12 +260,28 @@ class FileReaderImpl : public FileReader { Status

[GitHub] [arrow] fsaintjacques commented on a change in pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2020-04-23 Thread GitBox
fsaintjacques commented on a change in pull request #6979: URL: https://github.com/apache/arrow/pull/6979#discussion_r413812220 ## File path: python/pyarrow/_parquet.pxd ## @@ -334,7 +334,7 @@ cdef extern from "parquet/api/reader.h" namespace "parquet" nogil:

[GitHub] [arrow] andygrove commented on a change in pull request #7009: ARROW-8552: [Rust] support iterate parquet row columns

2020-04-23 Thread GitBox
andygrove commented on a change in pull request #7009: URL: https://github.com/apache/arrow/pull/7009#discussion_r413857080 ## File path: rust/parquet/src/record/api.rs ## @@ -50,6 +50,33 @@ impl Row { pub fn len() -> usize { self.fields.len() } + +pub

[GitHub] [arrow] github-actions[bot] commented on issue #7019: ARROW-8569: [CI] Upgrade xcode version for testing homebrew formulae

2020-04-23 Thread GitBox
github-actions[bot] commented on issue #7019: URL: https://github.com/apache/arrow/pull/7019#issuecomment-618456824 https://issues.apache.org/jira/browse/ARROW-8569

[GitHub] [arrow] lidavidm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413792469 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ =

[GitHub] [arrow] nealrichardson opened a new pull request #7019: ARROW-8569: [CI] Upgrade xcode version for testing homebrew formulae

2020-04-23 Thread GitBox
nealrichardson opened a new pull request #7019: URL: https://github.com/apache/arrow/pull/7019 See https://github.com/apache/arrow/pull/6996#issuecomment-618053499
