[GitHub] [arrow] emkornfield commented on a change in pull request #6985: ARROW-8413: [C++][Parquet][WIP] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
emkornfield commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413575153 ## File path: cpp/cmake_modules/SetupCxxFlags.cmake ## @@ -40,12 +40,13 @@ if(ARROW_CPU_FLAG STREQUAL "x86") set(CXX_SUPPORTS_SSE4_2 TRUE) else(

[GitHub] [arrow] emkornfield commented on a change in pull request #6985: ARROW-8413: [C++][Parquet][WIP] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
emkornfield commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413576031 ## File path: cpp/src/parquet/column_reader.cc ## @@ -50,6 +51,140 @@ using arrow::internal::checked_cast; namespace parquet { +namespace { + +inli

[GitHub] [arrow] emkornfield commented on a change in pull request #6985: ARROW-8413: [C++][Parquet][WIP] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
emkornfield commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413576741 ## File path: cpp/src/parquet/column_reader.cc ## @@ -50,6 +51,140 @@ using arrow::internal::checked_cast; namespace parquet { +namespace { + +inli

[GitHub] [arrow] emkornfield commented on issue #6985: ARROW-8413: [C++][Parquet][WIP] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
emkornfield commented on issue #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-618231752 CC @wesm @pitrou I think this is ready for review now. This is an automated message from the Apache Git Service. To r

[GitHub] [arrow] emkornfield edited a comment on issue #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
emkornfield edited a comment on issue #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-618231752 CC @wesm @pitrou I think this is ready for review now. Will take a closer look at CI failures tomorrow.

[GitHub] [arrow] rdettai commented on issue #6949: ARROW-7681: [Rust] Explicitly seeking a BufReader will discard the internal buffer (2)

2020-04-23 Thread GitBox
rdettai commented on issue #6949: URL: https://github.com/apache/arrow/pull/6949#issuecomment-618235600 I'll work on your comments today. What about the problem we are trying to fix here? Do you agree with the benefits of this fix? Also, I'm not sure why a `Mutex` was used he

[GitHub] [arrow] rdettai edited a comment on issue #6949: ARROW-7681: [Rust] Explicitly seeking a BufReader will discard the internal buffer (2)

2020-04-23 Thread GitBox
rdettai edited a comment on issue #6949: URL: https://github.com/apache/arrow/pull/6949#issuecomment-618235600 I'll work on your comments today. What about the problem we are trying to fix here? Do you agree with the benefits of this fix? Also, I'm not sure why a `Mutex` was
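
The behaviour named in the subject of ARROW-7681 can be reproduced with the standard library alone. The sketch below is not the code from the pull request (the snippet above is truncated and does not show it); it only illustrates that `std::io::BufReader::seek` drops the internal buffer while `seek_relative` keeps it when the target is still buffered. `CountingReader` and `underlying_reads` are hypothetical names introduced for this illustration.

```rust
use std::io::{self, BufReader, Cursor, Read, Seek, SeekFrom};

/// Hypothetical helper: wraps a reader and counts calls to the inner `read`.
struct CountingReader<R> {
    inner: R,
    reads: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.reads += 1;
        self.inner.read(buf)
    }
}

impl<R: Seek> Seek for CountingReader<R> {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        self.inner.seek(pos)
    }
}

/// Reads one byte, skips one byte, reads one more byte, and reports how many
/// reads reached the underlying source.
fn underlying_reads(use_seek_relative: bool) -> io::Result<usize> {
    let source = CountingReader {
        inner: Cursor::new(vec![0u8; 64]),
        reads: 0,
    };
    let mut reader = BufReader::with_capacity(16, source);
    let mut byte = [0u8; 1];

    // The first read fills the 16-byte internal buffer from the source.
    reader.read_exact(&mut byte)?;

    if use_seek_relative {
        // The target is still inside the buffer, so the buffer is kept.
        reader.seek_relative(1)?;
    } else {
        // An explicit seek always discards the buffer.
        reader.seek(SeekFrom::Current(1))?;
    }

    // Served from the buffer if it survived, otherwise from the source.
    reader.read_exact(&mut byte)?;
    Ok(reader.get_ref().reads)
}
```

Under these assumptions, `underlying_reads(false)` reaches the source twice and `underlying_reads(true)` only once, which is the cost the issue title refers to.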

[GitHub] [arrow] nevi-me commented on issue #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-23 Thread GitBox
nevi-me commented on issue #6972: URL: https://github.com/apache/arrow/pull/6972#issuecomment-618293330 > I also think it is fine for the moment to ignore the use case where the schema varies between record batches and file a separate issue for that. Just on this point, there shouldn

[GitHub] [arrow] rok commented on issue #6667: ARROW-8162: [Format][Python] Add serialization for CSF sparse tensors to Python

2020-04-23 Thread GitBox
rok commented on issue #6667: URL: https://github.com/apache/arrow/pull/6667#issuecomment-618295149 Thanks @mrkn! :)

[GitHub] [arrow] kszucs commented on issue #6998: ARROW-8541: [Release] Don't remove previous source releases automatically

2020-04-23 Thread GitBox
kszucs commented on issue #6998: URL: https://github.com/apache/arrow/pull/6998#issuecomment-618328429 @kou updated

[GitHub] [arrow] pitrou commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413696312 ## File path: cpp/cmake_modules/SetupCxxFlags.cmake ## @@ -40,12 +40,13 @@ if(ARROW_CPU_FLAG STREQUAL "x86") set(CXX_SUPPORTS_SSE4_2 TRUE) else()

[GitHub] [arrow] pitrou commented on issue #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-23 Thread GitBox
pitrou commented on issue #6954: URL: https://github.com/apache/arrow/pull/6954#issuecomment-618329962 cc @emkornfield

[GitHub] [arrow] pitrou commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413715323 ## File path: cpp/src/arrow/filesystem/s3fs_benchmark.cc ## @@ -331,10 +358,64 @@ BENCHMARK_DEFINE_F(MinioFixture, ReadCoalesced500Mib)(benchmark::State& st

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7000: ARROW-8065: [C++][Dataset] Refactor ScanOptions and Fragment relation

2020-04-23 Thread GitBox
jorisvandenbossche commented on a change in pull request #7000: URL: https://github.com/apache/arrow/pull/7000#discussion_r413622598 ## File path: cpp/src/arrow/dataset/file_parquet.cc ## @@ -402,23 +401,21 @@ Result ParquetFileFormat::ScanFile( } Result> ParquetFileFormat:

[GitHub] [arrow] nevi-me opened a new pull request #7018: ARROW-8536: [Rust] [Flight] Check in proto file, conditional build if file exists

2020-04-23 Thread GitBox
nevi-me opened a new pull request #7018: URL: https://github.com/apache/arrow/pull/7018 When a user compiles the `flight` crate, a `build.rs` script is invoked. This script recursively looks for the `format/Flight.proto` path. A user might not have that path, as they would not have cloned
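
A minimal sketch of the "conditional build if file exists" idea from this pull request title, assuming the build script walks upward from the crate directory to find `format/Flight.proto` and falls back to a checked-in generated file when it is absent. The actual `build.rs` is not shown in the truncated snippet, so the search direction and the names `find_flight_proto`, `BuildAction`, and `plan_build` are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: walks from `start` up through its parent directories
/// and returns the first `format/Flight.proto` that exists as a file.
fn find_flight_proto(start: &Path) -> Option<PathBuf> {
    start
        .ancestors()
        .map(|dir| dir.join("format").join("Flight.proto"))
        .find(|candidate| candidate.is_file())
}

/// What the build script would do next (illustrative only).
#[derive(Debug, PartialEq)]
enum BuildAction {
    /// The proto file was found, so the generated code can be rebuilt from it.
    Regenerate(PathBuf),
    /// No proto file is available, so the checked-in generated code is used.
    UseCheckedIn,
}

/// Decides between regenerating and using the checked-in file.
fn plan_build(crate_dir: &Path) -> BuildAction {
    match find_flight_proto(crate_dir) {
        Some(proto) => BuildAction::Regenerate(proto),
        None => BuildAction::UseCheckedIn,
    }
}
```

In a real build script, `crate_dir` would typically come from the `CARGO_MANIFEST_DIR` environment variable that Cargo sets, and the `Regenerate` branch would hand the path to whichever protobuf code generator the crate uses.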

[GitHub] [arrow] github-actions[bot] commented on issue #7018: ARROW-8536: [Rust] [Flight] Check in proto file, conditional build if file exists

2020-04-23 Thread GitBox
github-actions[bot] commented on issue #7018: URL: https://github.com/apache/arrow/pull/7018#issuecomment-618349230 https://issues.apache.org/jira/browse/ARROW-8536

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-23 Thread GitBox
jorisvandenbossche commented on a change in pull request #6992: URL: https://github.com/apache/arrow/pull/6992#discussion_r413740270 ## File path: python/pyarrow/pandas-shim.pxi ## @@ -55,6 +55,16 @@ cdef class _PandasAPIShim(object): from distutils.version import Loos

[GitHub] [arrow] wesm commented on issue #7017: suggestion: why not serialize complex numbers in a Python list/dict/set

2020-04-23 Thread GitBox
wesm commented on issue #7017: URL: https://github.com/apache/arrow/issues/7017#issuecomment-618352283 Can you send an email to one of the mailing lists or open a JIRA if you want to propose a development project?

[GitHub] [arrow] lidavidm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413744080 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ = std:

[GitHub] [arrow] pitrou commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413747770 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ = std::m

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #6959: ARROW-5649: [Integration][C++] Create integration test for extension types

2020-04-23 Thread GitBox
jorisvandenbossche commented on a change in pull request #6959: URL: https://github.com/apache/arrow/pull/6959#discussion_r413750283 ## File path: python/pyarrow/tests/test_extension_type.py ## @@ -445,22 +445,28 @@ def test_parquet(tmpdir, registered_period_type): import

[GitHub] [arrow] lidavidm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413757274 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ = std:

[GitHub] [arrow] lidavidm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413759177 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ = std:

[GitHub] [arrow] pitrou commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413760028 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ = std::m

[GitHub] [arrow] pitrou commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413760214 ## File path: cpp/src/parquet/file_reader.cc ## @@ -536,6 +577,14 @@ std::shared_ptr ParquetFileReader::RowGroup(int i) { return contents_->GetRowGroup(i

[GitHub] [arrow] pitrou commented on a change in pull request #6959: ARROW-5649: [Integration][C++] Create integration test for extension types

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #6959: URL: https://github.com/apache/arrow/pull/6959#discussion_r413762177 ## File path: python/pyarrow/tests/test_extension_type.py ## @@ -445,22 +445,28 @@ def test_parquet(tmpdir, registered_period_type): import base64

[GitHub] [arrow] jorisvandenbossche commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-23 Thread GitBox
jorisvandenbossche commented on issue #6992: URL: https://github.com/apache/arrow/pull/6992#issuecomment-618366876 cc @wesm @xhochy @BryanCutler are you fine with 1) a hard required minimal pandas version? (meaning: we don't use the pandas integration if an older version is installed

[GitHub] [arrow] lidavidm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413763701 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ = std:

[GitHub] [arrow] pitrou commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413783259 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ = std::m

[GitHub] [arrow] lidavidm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413792469 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ = std:

[GitHub] [arrow] tustvold commented on issue #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-23 Thread GitBox
tustvold commented on issue #6980: URL: https://github.com/apache/arrow/pull/6980#issuecomment-618389453 Rebased on current master and the CI now builds :tada:

[GitHub] [arrow] lidavidm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413797661 ## File path: cpp/src/parquet/file_reader.cc ## @@ -536,6 +577,14 @@ std::shared_ptr ParquetFileReader::RowGroup(int i) { return contents_->GetRowGroup

[GitHub] [arrow] lidavidm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413798078 ## File path: cpp/src/arrow/filesystem/s3fs_benchmark.cc ## @@ -331,10 +358,64 @@ BENCHMARK_DEFINE_F(MinioFixture, ReadCoalesced500Mib)(benchmark::State&

[GitHub] [arrow] pitrou commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r41381 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ = std::m

[GitHub] [arrow] pitrou commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413800208 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ = std::m

[GitHub] [arrow] pitrou commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413810237 ## File path: cpp/src/parquet/file_reader.cc ## @@ -536,6 +577,14 @@ std::shared_ptr ParquetFileReader::RowGroup(int i) { return contents_->GetRowGroup(i

[GitHub] [arrow] fsaintjacques commented on a change in pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2020-04-23 Thread GitBox
fsaintjacques commented on a change in pull request #6979: URL: https://github.com/apache/arrow/pull/6979#discussion_r413814578 ## File path: python/pyarrow/_parquet.pyx ## @@ -1083,6 +1084,50 @@ cdef class ParquetReader: def set_use_threads(self, bint use_threads):

[GitHub] [arrow] pitrou commented on issue #6846: ARROW-3329: [Python] Python tests for decimal to int and decimal to decimal casts

2020-04-23 Thread GitBox
pitrou commented on issue #6846: URL: https://github.com/apache/arrow/pull/6846#issuecomment-618404946 I rebased and improved the tests slightly. Also opened some issues for some oddities.

[GitHub] [arrow] fsaintjacques commented on a change in pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2020-04-23 Thread GitBox
fsaintjacques commented on a change in pull request #6979: URL: https://github.com/apache/arrow/pull/6979#discussion_r413811549 ## File path: cpp/src/parquet/arrow/reader.cc ## @@ -260,12 +260,28 @@ class FileReaderImpl : public FileReader { Status GetRecordBatchReader(con

[GitHub] [arrow] fsaintjacques commented on a change in pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2020-04-23 Thread GitBox
fsaintjacques commented on a change in pull request #6979: URL: https://github.com/apache/arrow/pull/6979#discussion_r413812220 ## File path: python/pyarrow/_parquet.pxd ## @@ -334,7 +334,7 @@ cdef extern from "parquet/api/reader.h" namespace "parquet" nogil: ArrowRea

[GitHub] [arrow] lidavidm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413819271 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ = std:

[GitHub] [arrow] andygrove commented on a change in pull request #7009: ARROW-8552: [Rust] support iterate parquet row columns

2020-04-23 Thread GitBox
andygrove commented on a change in pull request #7009: URL: https://github.com/apache/arrow/pull/7009#discussion_r413857080 ## File path: rust/parquet/src/record/api.rs ## @@ -50,6 +50,33 @@ impl Row { pub fn len(&self) -> usize { self.fields.len() } + +p

[GitHub] [arrow] lidavidm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413864236 ## File path: cpp/src/parquet/file_reader.cc ## @@ -536,6 +577,14 @@ std::shared_ptr ParquetFileReader::RowGroup(int i) { return contents_->GetRowGroup

[GitHub] [arrow] bkietz commented on a change in pull request #6879: ARROW-8377: [CI][C++][R] Build and run C++ tests on Rtools build

2020-04-23 Thread GitBox
bkietz commented on a change in pull request #6879: URL: https://github.com/apache/arrow/pull/6879#discussion_r413877530 ## File path: ci/scripts/PKGBUILD ## @@ -50,6 +52,12 @@ source_dir="$ARROW_HOME" cpp_build_dir=build-${CARCH}-cpp +# This should be "release" for real R

[GitHub] [arrow] nealrichardson opened a new pull request #7019: ARROW-8569: [CI] Upgrade xcode version for testing homebrew formulae

2020-04-23 Thread GitBox
nealrichardson opened a new pull request #7019: URL: https://github.com/apache/arrow/pull/7019 See https://github.com/apache/arrow/pull/6996#issuecomment-618053499

[GitHub] [arrow] nealrichardson commented on issue #7019: ARROW-8569: [CI] Upgrade xcode version for testing homebrew formulae

2020-04-23 Thread GitBox
nealrichardson commented on issue #7019: URL: https://github.com/apache/arrow/pull/7019#issuecomment-618454886 @github-actions crossbow submit homebrew-cpp

[GitHub] [arrow] github-actions[bot] commented on issue #7019: ARROW-8569: [CI] Upgrade xcode version for testing homebrew formulae

2020-04-23 Thread GitBox
github-actions[bot] commented on issue #7019: URL: https://github.com/apache/arrow/pull/7019#issuecomment-618456824 https://issues.apache.org/jira/browse/ARROW-8569

[GitHub] [arrow] wesm commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-23 Thread GitBox
wesm commented on issue #6992: URL: https://github.com/apache/arrow/pull/6992#issuecomment-618459883 I'm OK with this. The maintenance burden of supporting several years' worth of pandas releases seems like a lot to bear. If there are parties which are affected by this they should contribu

[GitHub] [arrow] mayuropensource opened a new pull request #7020: Arrow-8562: [C++] Parameterize I/O coalescing using s3 storage metrics

2020-04-23 Thread GitBox
mayuropensource opened a new pull request #7020: URL: https://github.com/apache/arrow/pull/7020 JIRA: https://issues.apache.org/jira/browse/ARROW-8562 This change is not actually used until https://github.com/apache/arrow/pull/6744 (@lidavidm) is pushed, however, it doesn't need to

[GitHub] [arrow] pitrou commented on issue #6846: ARROW-3329: [Python] Python tests for decimal to int and decimal to decimal casts

2020-04-23 Thread GitBox
pitrou commented on issue #6846: URL: https://github.com/apache/arrow/pull/6846#issuecomment-618464806 The CI failure looks unrelated, will merge.

[GitHub] [arrow] paddyhoran commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-23 Thread GitBox
paddyhoran commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r413903768 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder { self.write_bytes(v.to_byte_sl

[GitHub] [arrow] kiszk commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
kiszk commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413905959 ## File path: cpp/src/parquet/column_reader.cc ## @@ -50,6 +51,140 @@ using arrow::internal::checked_cast; namespace parquet { +namespace { + +inline voi

[GitHub] [arrow] github-actions[bot] commented on issue #7020: Arrow-8562: [C++] Parameterize I/O coalescing using s3 storage metrics

2020-04-23 Thread GitBox
github-actions[bot] commented on issue #7020: URL: https://github.com/apache/arrow/pull/7020#issuecomment-618472360 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could you al

[GitHub] [arrow] kiszk commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
kiszk commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413908372 ## File path: cpp/src/parquet/column_reader.cc ## @@ -50,6 +51,140 @@ using arrow::internal::checked_cast; namespace parquet { +namespace { + +inline voi

[GitHub] [arrow] kszucs opened a new pull request #7021: Wrap docker-compose commands with archery [WIP]

2020-04-23 Thread GitBox
kszucs opened a new pull request #7021: URL: https://github.com/apache/arrow/pull/7021

[GitHub] [arrow] mayuropensource commented on issue #7020: ARROW-8562: [C++] Parameterize I/O coalescing using s3 storage metrics

2020-04-23 Thread GitBox
mayuropensource commented on issue #7020: URL: https://github.com/apache/arrow/pull/7020#issuecomment-618486378 I messed up some commits. Will create a new one.

[GitHub] [arrow] lidavidm commented on a change in pull request #7020: ARROW-8562: [C++] Parameterize I/O coalescing using s3 storage metrics

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #7020: URL: https://github.com/apache/arrow/pull/7020#discussion_r413923773 ## File path: cpp/src/arrow/io/caching.h ## @@ -27,6 +27,44 @@ namespace arrow { namespace io { + +struct ARROW_EXPORT CacheOptions { + static constex

[GitHub] [arrow] github-actions[bot] commented on issue #7021: Wrap docker-compose commands with archery [WIP]

2020-04-23 Thread GitBox
github-actions[bot] commented on issue #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-618494095 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could you al

[GitHub] [arrow] mayuropensource opened a new pull request #7022: ARROW-8562: [C++] IO: Parameterize I/O Coalescing using S3 metrics

2020-04-23 Thread GitBox
mayuropensource opened a new pull request #7022: URL: https://github.com/apache/arrow/pull/7022 _(Recreating the PR from a clean repo, sorry about earlier PR which was not cleanly merged)._ **JIRA:** https://issues.apache.org/jira/browse/ARROW-8562 This change is not actually

[GitHub] [arrow] markhildreth commented on issue #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-23 Thread GitBox
markhildreth commented on issue #6972: URL: https://github.com/apache/arrow/pull/6972#issuecomment-618501806 @andygrove Thanks for the feedback. I have updated the PR with a less leaky API. I also fixed the type inference problem that was caused by the new dependency. @nevi-me True

[GitHub] [arrow] markhildreth edited a comment on issue #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-23 Thread GitBox
markhildreth edited a comment on issue #6972: URL: https://github.com/apache/arrow/pull/6972#issuecomment-618501806 @andygrove Thanks for the feedback. I have updated the PR with a less leaky API. I also fixed the type inference problem that was caused by the new dependency. @nevi-m

[GitHub] [arrow] vertexclique commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-23 Thread GitBox
vertexclique commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r413951534 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder { self.write_bytes(v.to_byte_

[GitHub] [arrow] markhildreth commented on a change in pull request #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-23 Thread GitBox
markhildreth commented on a change in pull request #6972: URL: https://github.com/apache/arrow/pull/6972#discussion_r413951619 ## File path: rust/parquet/src/encodings/rle.rs ## @@ -830,7 +826,7 @@ mod tests { values.clear(); let mut rng = thread_rng()

[GitHub] [arrow] markhildreth commented on issue #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-23 Thread GitBox
markhildreth commented on issue #6972: URL: https://github.com/apache/arrow/pull/6972#issuecomment-618506903 From a purely practical standpoint, this PR is ready for further review and merging. If approved, I would probably add some minor issue for the following: * Trying to avoid the ty

[GitHub] [arrow] markhildreth edited a comment on issue #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-23 Thread GitBox
markhildreth edited a comment on issue #6972: URL: https://github.com/apache/arrow/pull/6972#issuecomment-618501806 @andygrove Thanks for the feedback. I have updated the PR with a less leaky API. I also tweaked the parquet test to work around the new type inference changes. @nevi-me

[GitHub] [arrow] kiszk commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
kiszk commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413953833 ## File path: cpp/src/parquet/column_reader.cc ## @@ -50,6 +51,140 @@ using arrow::internal::checked_cast; namespace parquet { +namespace { + +inline voi

[GitHub] [arrow] kiszk commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
kiszk commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413954827 ## File path: cpp/src/parquet/level_conversion_test.cc ## @@ -0,0 +1,162 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

[GitHub] [arrow] wesm commented on issue #7020: ARROW-8562: [C++] Parameterize I/O coalescing using s3 storage metrics

2020-04-23 Thread GitBox
wesm commented on issue #7020: URL: https://github.com/apache/arrow/pull/7020#issuecomment-618508772 @mayuropensource it's not necessary to open a new PR when you want to redo your commits, you can just force push your branch

[GitHub] [arrow] github-actions[bot] commented on issue #7022: ARROW-8562: [C++] IO: Parameterize I/O Coalescing using S3 metrics

2020-04-23 Thread GitBox
github-actions[bot] commented on issue #7022: URL: https://github.com/apache/arrow/pull/7022#issuecomment-618510578 https://issues.apache.org/jira/browse/ARROW-8562

[GitHub] [arrow] markhildreth commented on a change in pull request #7006: ARROW-8508: [Rust] FixedSizeListArray improper offset for value

2020-04-23 Thread GitBox
markhildreth commented on a change in pull request #7006: URL: https://github.com/apache/arrow/pull/7006#discussion_r413958397 ## File path: rust/arrow/src/array/array.rs ## @@ -2592,6 +2619,15 @@ mod tests { assert_eq!(DataType::Int32, list_array.value_type());

[GitHub] [arrow] bryantbiggs commented on issue #4140: ARROW-5123: [Rust] Parquet derive for simple structs

2020-04-23 Thread GitBox
bryantbiggs commented on issue #4140: URL: https://github.com/apache/arrow/pull/4140#issuecomment-618512883 thanks @andygrove ! This is an automated message from the Apache Git Service. To respond to the message, please log o

[GitHub] [arrow] kiszk commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
kiszk commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413960907 ## File path: cpp/src/parquet/level_conversion.h ## @@ -0,0 +1,191 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

[GitHub] [arrow] BryanCutler commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-23 Thread GitBox
BryanCutler commented on issue #6992: URL: https://github.com/apache/arrow/pull/6992#issuecomment-618517973 Sounds good to me. FWIW, Spark also has a minimum Pandas version set at 0.23.2. This is an automated message from th

[GitHub] [arrow] tustvold commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-23 Thread GitBox
tustvold commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r413969298 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder { self.write_bytes(v.to_byte_slic

[GitHub] [arrow] kiszk commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-23 Thread GitBox
kiszk commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r413970301 ## File path: cpp/src/parquet/level_conversion.h ## @@ -0,0 +1,191 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

[GitHub] [arrow] vertexclique commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-23 Thread GitBox
vertexclique commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r413972358 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder { self.write_bytes(v.to_byte_

[GitHub] [arrow] nealrichardson commented on issue #7019: ARROW-8569: [CI] Upgrade xcode version for testing homebrew formulae

2020-04-23 Thread GitBox
nealrichardson commented on issue #7019: URL: https://github.com/apache/arrow/pull/7019#issuecomment-618527651 @github-actions crossbow submit homebrew-cpp This is an automated message from the Apache Git Service. To respond

[GitHub] [arrow] github-actions[bot] commented on issue #7019: ARROW-8569: [CI] Upgrade xcode version for testing homebrew formulae

2020-04-23 Thread GitBox
github-actions[bot] commented on issue #7019: URL: https://github.com/apache/arrow/pull/7019#issuecomment-618528458 Revision: 5bcfeab4c9bacc0b3a262a7522bfaf985025d3ec Submitted crossbow builds: [ursa-labs/crossbow @ actions-165](https://github.com/ursa-labs/crossbow/branches/all?quer

[GitHub] [arrow] xhochy opened a new pull request #7023: ARROW-8571: [C++] Switch AppVeyor image to VS 2017

2020-04-23 Thread GitBox
xhochy opened a new pull request #7023: URL: https://github.com/apache/arrow/pull/7023 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [arrow] markhildreth edited a comment on issue #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-23 Thread GitBox
markhildreth edited a comment on issue #6972: URL: https://github.com/apache/arrow/pull/6972#issuecomment-618506903 From a purely practical standpoint, this PR is ready for further review and merging. If approved, I would probably add some minor JIRA issue for the following: * Trying to

[GitHub] [arrow] tustvold commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-23 Thread GitBox
tustvold commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r413987657 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder { self.write_bytes(v.to_byte_slic

[GitHub] [arrow] pitrou commented on a change in pull request #7023: ARROW-8571: [C++] Switch AppVeyor image to VS 2017

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #7023: URL: https://github.com/apache/arrow/pull/7023#discussion_r413987573 ## File path: appveyor.yml ## @@ -61,7 +61,7 @@ environment: - JOB: "Build" GENERATOR: Ninja CONFIGURATION: "Release" - APPVEYOR_BUIL

[GitHub] [arrow] wesm commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-23 Thread GitBox
wesm commented on issue #6992: URL: https://github.com/apache/arrow/pull/6992#issuecomment-618536764 The Appveyor failure is unrelated This is an automated message from the Apache Git Service. To respond to the message, pleas

[GitHub] [arrow] github-actions[bot] commented on issue #7023: ARROW-8571: [C++] Switch AppVeyor image to VS 2017

2020-04-23 Thread GitBox
github-actions[bot] commented on issue #7023: URL: https://github.com/apache/arrow/pull/7023#issuecomment-618536730 https://issues.apache.org/jira/browse/ARROW-8571 This is an automated message from the Apache Git Service. To

[GitHub] [arrow] wesm commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-23 Thread GitBox
wesm commented on issue #6992: URL: https://github.com/apache/arrow/pull/6992#issuecomment-618537844 Actually I'll hold off on merging this to confirm that @jorisvandenbossche has done everything that he planned This is an a

[GitHub] [arrow] lidavidm commented on a change in pull request #7022: ARROW-8562: [C++] IO: Parameterize I/O Coalescing using S3 metrics

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #7022: URL: https://github.com/apache/arrow/pull/7022#discussion_r413990293 ## File path: cpp/src/arrow/io/caching.h ## @@ -27,6 +27,44 @@ namespace arrow { namespace io { + +struct ARROW_EXPORT CacheOptions { + static constex

[GitHub] [arrow] xhochy commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-23 Thread GitBox
xhochy commented on issue #6992: URL: https://github.com/apache/arrow/pull/6992#issuecomment-618538122 👍 2 years ago released `pandas` version still sounds very generous. People who cannot upgrade from that to a newer version will probably have the same problems with `pyarrow` upda

[GitHub] [arrow] houqp commented on a change in pull request #7009: ARROW-8552: [Rust] support iterate parquet row columns

2020-04-23 Thread GitBox
houqp commented on a change in pull request #7009: URL: https://github.com/apache/arrow/pull/7009#discussion_r414007919 ## File path: rust/parquet/src/record/api.rs ## @@ -50,6 +50,33 @@ impl Row { pub fn len(&self) -> usize { self.fields.len() } + +pub f

[GitHub] [arrow] mayuropensource commented on a change in pull request #7022: ARROW-8562: [C++] IO: Parameterize I/O Coalescing using S3 metrics

2020-04-23 Thread GitBox
mayuropensource commented on a change in pull request #7022: URL: https://github.com/apache/arrow/pull/7022#discussion_r414031141 ## File path: cpp/src/arrow/io/caching.h ## @@ -27,6 +27,44 @@ namespace arrow { namespace io { + +struct ARROW_EXPORT CacheOptions { + static

[GitHub] [arrow] mayuropensource commented on issue #7020: ARROW-8562: [C++] Parameterize I/O coalescing using s3 storage metrics

2020-04-23 Thread GitBox
mayuropensource commented on issue #7020: URL: https://github.com/apache/arrow/pull/7020#issuecomment-618577963 @wesm sure thing, I'll keep that in mind in the future. This is an automated message from the Apache Git Service.

[GitHub] [arrow] andygrove commented on pull request #6972: ARROW-8287: [Rust] Add "pretty" util to help with printing tabular output of RecordBatches

2020-04-23 Thread GitBox
andygrove commented on pull request #6972: URL: https://github.com/apache/arrow/pull/6972#issuecomment-618597619 @markhildreth This looks great, but is now duplicating the code between arrow and datafusion. Can we remove the datafusion utils copy and have datafusion use the arrow utils ins
