[GitHub] [arrow] lidavidm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413798078 ## File path: cpp/src/arrow/filesystem/s3fs_benchmark.cc ## @@ -331,10 +358,64 @@ BENCHMARK_DEFINE_F(MinioFixture, ReadCoalesced500Mib)(benchmark::State&

[GitHub] [arrow] pitrou commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413810237 ## File path: cpp/src/parquet/file_reader.cc ## @@ -536,6 +577,14 @@ std::shared_ptr ParquetFileReader::RowGroup(int i) { return

[GitHub] [arrow] fsaintjacques commented on a change in pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2020-04-23 Thread GitBox
fsaintjacques commented on a change in pull request #6979: URL: https://github.com/apache/arrow/pull/6979#discussion_r413811549 ## File path: cpp/src/parquet/arrow/reader.cc ## @@ -260,12 +260,28 @@ class FileReaderImpl : public FileReader { Status

[GitHub] [arrow] fsaintjacques commented on a change in pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2020-04-23 Thread GitBox
fsaintjacques commented on a change in pull request #6979: URL: https://github.com/apache/arrow/pull/6979#discussion_r413812220 ## File path: python/pyarrow/_parquet.pxd ## @@ -334,7 +334,7 @@ cdef extern from "parquet/api/reader.h" namespace "parquet" nogil:

[GitHub] [arrow] andygrove commented on a change in pull request #7009: ARROW-8552: [Rust] support iterate parquet row columns

2020-04-23 Thread GitBox
andygrove commented on a change in pull request #7009: URL: https://github.com/apache/arrow/pull/7009#discussion_r413857080 ## File path: rust/parquet/src/record/api.rs ## @@ -50,6 +50,33 @@ impl Row { pub fn len() -> usize { self.fields.len() } + +pub

[GitHub] [arrow] github-actions[bot] commented on issue #7019: ARROW-8569: [CI] Upgrade xcode version for testing homebrew formulae

2020-04-23 Thread GitBox
github-actions[bot] commented on issue #7019: URL: https://github.com/apache/arrow/pull/7019#issuecomment-618456824 https://issues.apache.org/jira/browse/ARROW-8569 This is an automated message from the Apache Git Service.

[GitHub] [arrow] lidavidm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413792469 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ =

[GitHub] [arrow] nealrichardson opened a new pull request #7019: ARROW-8569: [CI] Upgrade xcode version for testing homebrew formulae

2020-04-23 Thread GitBox
nealrichardson opened a new pull request #7019: URL: https://github.com/apache/arrow/pull/7019 See https://github.com/apache/arrow/pull/6996#issuecomment-618053499

[GitHub] [arrow] nealrichardson commented on issue #7019: ARROW-8569: [CI] Upgrade xcode version for testing homebrew formulae

2020-04-23 Thread GitBox
nealrichardson commented on issue #7019: URL: https://github.com/apache/arrow/pull/7019#issuecomment-618454886 @github-actions crossbow submit homebrew-cpp

[GitHub] [arrow] pitrou commented on issue #6846: ARROW-3329: [Python] Python tests for decimal to int and decimal to decimal casts

2020-04-23 Thread GitBox
pitrou commented on issue #6846: URL: https://github.com/apache/arrow/pull/6846#issuecomment-618404946 I rebased and improved the tests slightly. Also opened some issues for some oddities.

[GitHub] [arrow] lidavidm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413864236 ## File path: cpp/src/parquet/file_reader.cc ## @@ -536,6 +577,14 @@ std::shared_ptr ParquetFileReader::RowGroup(int i) { return

[GitHub] [arrow] pitrou commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
pitrou commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413783259 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ =

[GitHub] [arrow] tustvold commented on issue #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-23 Thread GitBox
tustvold commented on issue #6980: URL: https://github.com/apache/arrow/pull/6980#issuecomment-618389453 Rebased on current master and the CI now builds :tada:

[GitHub] [arrow] lidavidm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-23 Thread GitBox
lidavidm commented on a change in pull request #6744: URL: https://github.com/apache/arrow/pull/6744#discussion_r413819271 ## File path: cpp/src/parquet/file_reader.cc ## @@ -212,6 +237,21 @@ class SerializedFile : public ParquetFileReader::Contents { file_metadata_ =

[GitHub] [arrow] bkietz commented on a change in pull request #6879: ARROW-8377: [CI][C++][R] Build and run C++ tests on Rtools build

2020-04-23 Thread GitBox
bkietz commented on a change in pull request #6879: URL: https://github.com/apache/arrow/pull/6879#discussion_r413877530 ## File path: ci/scripts/PKGBUILD ## @@ -50,6 +52,12 @@ source_dir="$ARROW_HOME" cpp_build_dir=build-${CARCH}-cpp +# This should be "release" for real

[GitHub] [arrow] bkietz commented on a change in pull request #7026: ARROW-7391: [C++][Dataset] Remove Expression subclasses from bindings

2020-04-29 Thread GitBox
bkietz commented on a change in pull request #7026: URL: https://github.com/apache/arrow/pull/7026#discussion_r417358718 ## File path: cpp/src/arrow/util/iterator.h ## @@ -138,11 +139,11 @@ class Iterator : public util::EqualityComparable> { value_ =

[GitHub] [arrow] wesm commented on pull request #6506: ARROW-7878: [C++][Compute] Draft LogicalPlan classes

2020-04-29 Thread GitBox
wesm commented on pull request #6506: URL: https://github.com/apache/arrow/pull/6506#issuecomment-621262247 I'm going to review this again in the near future. I don't think this is blocking anything at the moment?

[GitHub] [arrow] bkietz commented on a change in pull request #7033: ARROW-7759: [C++][Dataset] Add CsvFileFormat

2020-04-29 Thread GitBox
bkietz commented on a change in pull request #7033: URL: https://github.com/apache/arrow/pull/7033#discussion_r417324952 ## File path: cpp/src/arrow/dataset/file_csv.cc ## @@ -0,0 +1,144 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] wesm opened a new pull request #7060: ARROW-8619: [C++] Use distinct enum values for MonthInterval, DayTimeInterval

2020-04-29 Thread GitBox
wesm opened a new pull request #7060: URL: https://github.com/apache/arrow/pull/7060 This enables us to eliminate "special" handling of `Type::INTERVAL` since these types have a different internal data representation. The deleted code in this PR is evidence of this. This is a

[GitHub] [arrow] github-actions[bot] commented on pull request #7060: ARROW-8619: [C++] Use distinct enum values for MonthInterval, DayTimeInterval

2020-04-29 Thread GitBox
github-actions[bot] commented on pull request #7060: URL: https://github.com/apache/arrow/pull/7060#issuecomment-621239897 https://issues.apache.org/jira/browse/ARROW-8619

[GitHub] [arrow] kszucs commented on pull request #7021: Wrap docker-compose commands with archery

2020-04-29 Thread GitBox
kszucs commented on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-621285608 @github-actions crossbow submit test-conda-python-3.7-pandas-latest test-debian-10-cpp test-conda-python-3.7

[GitHub] [arrow] pitrou commented on pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-29 Thread GitBox
pitrou commented on pull request #6744: URL: https://github.com/apache/arrow/pull/6744#issuecomment-621294001 @lidavidm If you want to read three files in parallel, and process data as it arrives (i.e. not necessarily in order), I think the datasets API is what you want. I don't

[GitHub] [arrow] lidavidm commented on pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-29 Thread GitBox
lidavidm commented on pull request #6744: URL: https://github.com/apache/arrow/pull/6744#issuecomment-621304473 > However, beware that, the more remote IO requests you issue in parallel, the longer they'll take individually given fixed bandwidth limits. Yes, and this is why our

[GitHub] [arrow] bkietz commented on a change in pull request #7026: ARROW-7391: [C++][Dataset] Remove Expression subclasses from bindings

2020-04-29 Thread GitBox
bkietz commented on a change in pull request #7026: URL: https://github.com/apache/arrow/pull/7026#discussion_r417341382 ## File path: cpp/src/arrow/scalar.cc ## @@ -252,6 +270,100 @@ Result> Scalar::Parse(const std::shared_ptr& t return ScalarParseImpl{type, s}.Finish();

[GitHub] [arrow] bkietz commented on a change in pull request #7026: ARROW-7391: [C++][Dataset] Remove Expression subclasses from bindings

2020-04-29 Thread GitBox
bkietz commented on a change in pull request #7026: URL: https://github.com/apache/arrow/pull/7026#discussion_r417341608 ## File path: cpp/src/arrow/scalar.cc ## @@ -252,6 +270,100 @@ Result> Scalar::Parse(const std::shared_ptr& t return ScalarParseImpl{type, s}.Finish();

[GitHub] [arrow] bkietz commented on a change in pull request #7026: ARROW-7391: [C++][Dataset] Remove Expression subclasses from bindings

2020-04-29 Thread GitBox
bkietz commented on a change in pull request #7026: URL: https://github.com/apache/arrow/pull/7026#discussion_r417342047 ## File path: cpp/src/arrow/dataset/filter.cc ## @@ -1261,7 +1264,212 @@ Result> TreeEvaluator::Filter( return batch->Slice(0, 0); } -std::shared_ptr

[GitHub] [arrow] chrish42 commented on pull request #7025: ARROW-2260: [C++][Plasma] Use Gflags for command-line parsing

2020-04-29 Thread GitBox
chrish42 commented on pull request #7025: URL: https://github.com/apache/arrow/pull/7025#issuecomment-621243467 @emkornfield I'm chrish42 on the Apache JIRA too. Thanks!

[GitHub] [arrow] wesm commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-04-29 Thread GitBox
wesm commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r417390692 ## File path: cpp/src/jni/dataset/proto/Types.proto ## @@ -0,0 +1,149 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] wesm commented on pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-29 Thread GitBox
wesm commented on pull request #6744: URL: https://github.com/apache/arrow/pull/6744#issuecomment-621299892 @pitrou I think the problem is the global IO thread pool https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/interfaces.cc#L310 So if you read multiple files

[GitHub] [arrow] github-actions[bot] commented on pull request #7021: Wrap docker-compose commands with archery

2020-04-29 Thread GitBox
github-actions[bot] commented on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-621200635 Revision: d6f60e9a47716224cdd1b2148b9efc754b0d3ba9 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] bkietz commented on a change in pull request #7026: ARROW-7391: [C++][Dataset] Remove Expression subclasses from bindings

2020-04-29 Thread GitBox
bkietz commented on a change in pull request #7026: URL: https://github.com/apache/arrow/pull/7026#discussion_r417347149 ## File path: python/pyarrow/_dataset.pyx ## @@ -270,19 +425,20 @@ cdef class FileSystemDataset(Dataset): CFileSystemDataset* filesystem_dataset

[GitHub] [arrow] wesm commented on pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-29 Thread GitBox
wesm commented on pull request #6744: URL: https://github.com/apache/arrow/pull/6744#issuecomment-621266083 Yes, we should discuss on the mailing list. For the record, IO-related tasks should almost certainly not be using the default global thread pool, which is intended for

[GitHub] [arrow] wesm commented on pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-29 Thread GitBox
wesm commented on pull request #6744: URL: https://github.com/apache/arrow/pull/6744#issuecomment-621305031 Yeah, I think one definite thing that needs to happen at minimum is externalizing the thread pool used for asynchronous IO calls so that the user is able to set whatever concurrency

[GitHub] [arrow] fsaintjacques commented on a change in pull request #7026: ARROW-7391: [C++][Dataset] Remove Expression subclasses from bindings

2020-04-29 Thread GitBox
fsaintjacques commented on a change in pull request #7026: URL: https://github.com/apache/arrow/pull/7026#discussion_r417218364 ## File path: cpp/src/arrow/scalar.cc ## @@ -252,6 +270,100 @@ Result> Scalar::Parse(const std::shared_ptr& t return ScalarParseImpl{type,

[GitHub] [arrow] bkietz commented on a change in pull request #7026: ARROW-7391: [C++][Dataset] Remove Expression subclasses from bindings

2020-04-29 Thread GitBox
bkietz commented on a change in pull request #7026: URL: https://github.com/apache/arrow/pull/7026#discussion_r417342700 ## File path: cpp/src/arrow/dataset/filter.cc ## @@ -1261,7 +1264,212 @@ Result> TreeEvaluator::Filter( return batch->Slice(0, 0); } -std::shared_ptr

[GitHub] [arrow] github-actions[bot] commented on pull request #7021: Wrap docker-compose commands with archery

2020-04-29 Thread GitBox
github-actions[bot] commented on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-621287754 Revision: aff6c71b2c2f838489cf18a65d24dd4e68067230 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] wesm commented on pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-29 Thread GitBox
wesm commented on pull request #6744: URL: https://github.com/apache/arrow/pull/6744#issuecomment-621308415 I wrote up a ticket for round-robin task scheduling which might help with this https://issues.apache.org/jira/browse/ARROW-8626

[GitHub] [arrow] fsaintjacques commented on a change in pull request #7026: ARROW-7391: [C++][Dataset] Remove Expression subclasses from bindings

2020-04-29 Thread GitBox
fsaintjacques commented on a change in pull request #7026: URL: https://github.com/apache/arrow/pull/7026#discussion_r417348377 ## File path: python/pyarrow/_dataset.pyx ## @@ -270,19 +425,20 @@ cdef class FileSystemDataset(Dataset): CFileSystemDataset*

[GitHub] [arrow] github-actions[bot] commented on pull request #7021: Wrap docker-compose commands with archery

2020-04-29 Thread GitBox
github-actions[bot] commented on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-621251535 Revision: 33c81628d4f4f9da45911798858b85f05a3646d3 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] bkietz commented on a change in pull request #7026: ARROW-7391: [C++][Dataset] Remove Expression subclasses from bindings

2020-04-29 Thread GitBox
bkietz commented on a change in pull request #7026: URL: https://github.com/apache/arrow/pull/7026#discussion_r417364778 ## File path: cpp/src/arrow/dataset/filter.h ## @@ -191,6 +197,10 @@ class ARROW_DS_EXPORT Expression { /// returns a debug string representing this

[GitHub] [arrow] kszucs commented on pull request #7021: Wrap docker-compose commands with archery

2020-04-29 Thread GitBox
kszucs commented on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-621250583 @github-actions crossbow submit test-conda-python-3.7-pandas-latest test-debian-10-cpp test-conda-python-3.7

[GitHub] [arrow] bkietz commented on a change in pull request #7026: ARROW-7391: [C++][Dataset] Remove Expression subclasses from bindings

2020-04-29 Thread GitBox
bkietz commented on a change in pull request #7026: URL: https://github.com/apache/arrow/pull/7026#discussion_r417375378 ## File path: cpp/src/arrow/scalar.cc ## @@ -252,6 +270,100 @@ Result> Scalar::Parse(const std::shared_ptr& t return ScalarParseImpl{type, s}.Finish();

[GitHub] [arrow] wesm commented on pull request #6631: ARROW-8111: [C++][CSV] Support MM/DD/YYYY date format

2020-04-29 Thread GitBox
wesm commented on pull request #6631: URL: https://github.com/apache/arrow/pull/6631#issuecomment-621260531 Note we have `arrow/csv/converter_benchmark.cc`. Maybe you can put the benchmarks there (or otherwise in the arrow/csv directory)?

[GitHub] [arrow] wesm commented on issue #7055: RedHat R Install with no Internet Access

2020-04-29 Thread GitBox
wesm commented on issue #7055: URL: https://github.com/apache/arrow/issues/7055#issuecomment-621277483 You need to `yum install arrow-dataset-devel` also.

[GitHub] [arrow] bkietz commented on a change in pull request #7026: ARROW-7391: [C++][Dataset] Remove Expression subclasses from bindings

2020-04-29 Thread GitBox
bkietz commented on a change in pull request #7026: URL: https://github.com/apache/arrow/pull/7026#discussion_r417340926 ## File path: cpp/src/arrow/scalar.h ## @@ -78,6 +78,8 @@ struct ARROW_EXPORT Scalar : public util::EqualityComparable { // TODO(bkietz) add

[GitHub] [arrow] bkietz commented on a change in pull request #7026: ARROW-7391: [C++][Dataset] Remove Expression subclasses from bindings

2020-04-29 Thread GitBox
bkietz commented on a change in pull request #7026: URL: https://github.com/apache/arrow/pull/7026#discussion_r417340605 ## File path: cpp/src/arrow/type.cc ## @@ -43,6 +43,49 @@ #include "arrow/visitor_inline.h" namespace arrow { + +constexpr Type::type NullType::type_id;

[GitHub] [arrow] kszucs commented on pull request #7021: Wrap docker-compose commands with archery

2020-04-29 Thread GitBox
kszucs commented on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-621266712 @github-actions crossbow submit test-conda-python-3.7-pandas-latest test-debian-10-cpp test-conda-python-3.7

[GitHub] [arrow] pitrou commented on pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-29 Thread GitBox
pitrou commented on pull request #6744: URL: https://github.com/apache/arrow/pull/6744#issuecomment-621302160 Ah, I see. The number of threads is certainly enough for local files, but it could be higher for remote filesystems. Ideally, perhaps each kind of remote filesystem has its own

[GitHub] [arrow] paddyhoran commented on pull request #7059: [Rust] Allow the parquet crate to be compiled on aarch64 platforms

2020-04-29 Thread GitBox
paddyhoran commented on pull request #7059: URL: https://github.com/apache/arrow/pull/7059#issuecomment-621210799 https://issues.apache.org/jira/browse/ARROW-8622

[GitHub] [arrow] nealrichardson commented on pull request #7033: ARROW-7759: [C++][Dataset] Add CsvFileFormat

2020-04-29 Thread GitBox
nealrichardson commented on pull request #7033: URL: https://github.com/apache/arrow/pull/7033#issuecomment-621263610 I'll fix the R failures in the next couple of hours (docs need revision).

[GitHub] [arrow] github-actions[bot] commented on pull request #7021: Wrap docker-compose commands with archery

2020-04-29 Thread GitBox
github-actions[bot] commented on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-621272162 Revision: 356984236aa4f34059eb770831a49197e454a12c Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] emkornfield commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-30 Thread GitBox
emkornfield commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-621974840 @wesm wanted to make sure this is still on your radar

[GitHub] [arrow] wesm commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-30 Thread GitBox
wesm commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-621993208 Yep sorry thanks for the nudge, will look today

[GitHub] [arrow] github-actions[bot] commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds

2020-04-30 Thread GitBox
github-actions[bot] commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622023582 Revision: 5988be2a5f9b283a8bc1012714fc03ff57b453c4 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] fsaintjacques opened a new pull request #7075: ARROW-8447: [C++] Ensure row ordering in Scanner::ToTable

2020-04-30 Thread GitBox
fsaintjacques opened a new pull request #7075: URL: https://github.com/apache/arrow/pull/7075 * This fixes the issue where ScanTask would race to push to the accumulating RecordBatchVector. The new version assigns an ordered index to each ScanTask, preserving the order in which they were

[GitHub] [arrow] kszucs commented on pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-04-30 Thread GitBox
kszucs commented on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-622057148 @github-actions crossbow submit -g test

[GitHub] [arrow] yordan-pavlov commented on pull request #7037: ARROW-6718: [Rust] Remove packed_simd

2020-04-30 Thread GitBox
yordan-pavlov commented on pull request #7037: URL: https://github.com/apache/arrow/pull/7037#issuecomment-621983970 Hi, I thought I would do some profiling yesterday (before packed_simd is removed) and noticed that a lot of time in `simd_compare_op` is spent in this loop here:

[GitHub] [arrow] kszucs opened a new pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds

2020-04-30 Thread GitBox
kszucs opened a new pull request #7074: URL: https://github.com/apache/arrow/pull/7074

[GitHub] [arrow] kszucs commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds

2020-04-30 Thread GitBox
kszucs commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-621967506 @github-actions crossbow submit wheel-win-*

[GitHub] [arrow] github-actions[bot] commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds

2020-04-30 Thread GitBox
github-actions[bot] commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-621971057 https://issues.apache.org/jira/browse/ARROW-8656

[GitHub] [arrow] zgramana commented on a change in pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support

2020-04-30 Thread GitBox
zgramana commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r418162946 ## File path: csharp/src/Apache.Arrow/Arrays/BinaryArray.cs ## @@ -111,23 +130,32 @@ public TBuilder AppendRange(IEnumerable values) public

[GitHub] [arrow] github-actions[bot] commented on pull request #7075: ARROW-8447: [C++] Ensure row deterministic ordering in Scanner::ToTable

2020-04-30 Thread GitBox
github-actions[bot] commented on pull request #7075: URL: https://github.com/apache/arrow/pull/7075#issuecomment-622034299 https://issues.apache.org/jira/browse/ARROW-8447

[GitHub] [arrow] bkietz commented on a change in pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments

2020-04-30 Thread GitBox
bkietz commented on a change in pull request #7073: URL: https://github.com/apache/arrow/pull/7073#discussion_r418193373 ## File path: cpp/src/arrow/dataset/file_base.cc ## @@ -83,131 +83,67 @@ Result FileFragment::Scan(std::shared_ptr options

[GitHub] [arrow] kszucs commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds

2020-04-30 Thread GitBox
kszucs commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622045514 @github-actions crossbow submit wheel-win-cp38

[GitHub] [arrow] kszucs commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds

2020-04-30 Thread GitBox
kszucs commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622022639 @github-actions crossbow submit wheel-win-cp38

[GitHub] [arrow] github-actions[bot] commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds

2020-04-30 Thread GitBox
github-actions[bot] commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622051056 Revision: 5a0c01cc93b5d4357cab19b27f9397e977a76277 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] bkietz commented on a change in pull request #7075: ARROW-8447: [C++] Ensure row deterministic ordering in Scanner::ToTable

2020-04-30 Thread GitBox
bkietz commented on a change in pull request #7075: URL: https://github.com/apache/arrow/pull/7075#discussion_r418216405 ## File path: cpp/src/arrow/dataset/scanner.cc ## @@ -165,23 +165,47 @@ std::shared_ptr ScanContext::TaskGroup() const { return TaskGroup::MakeSerial();

[GitHub] [arrow] fsaintjacques edited a comment on pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments

2020-04-30 Thread GitBox
fsaintjacques edited a comment on pull request #7073: URL: https://github.com/apache/arrow/pull/7073#issuecomment-621966208 > Shouldn't we require that? That seems the goal of UnionDataset to combine datasets with different formats Maybe, this is still enforced if you use the

[GitHub] [arrow] fsaintjacques commented on pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments

2020-04-30 Thread GitBox
fsaintjacques commented on pull request #7073: URL: https://github.com/apache/arrow/pull/7073#issuecomment-621966208 > > Fragments are not required to use the same > > backing filesystem nor the same format. > > Shouldn't we require that? That seems the goal of UnionDataset to

[GitHub] [arrow] vertexclique commented on a change in pull request #7064: ARROW-6945: [Rust] WIP: Add initial skeleton for Rust integration tests

2020-04-30 Thread GitBox
vertexclique commented on a change in pull request #7064: URL: https://github.com/apache/arrow/pull/7064#discussion_r418159133 ## File path: rust/arrow/Cargo.toml ## @@ -50,6 +50,7 @@ chrono = "0.4" flatbuffers = "0.6" hex = "0.4" arrow-flight = { path = "../arrow-flight",

[GitHub] [arrow] yordan-pavlov edited a comment on pull request #7037: ARROW-6718: [Rust] Remove packed_simd

2020-04-30 Thread GitBox
yordan-pavlov edited a comment on pull request #7037: URL: https://github.com/apache/arrow/pull/7037#issuecomment-621983970 Hi, I thought I would do some profiling yesterday (to help make sure packed_simd is not removed prematurely) and noticed that a lot of time in `simd_compare_op`

[GitHub] [arrow] github-actions[bot] commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds

2020-04-30 Thread GitBox
github-actions[bot] commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-621968378 Revision: f64fd002135d7bb90cfb2725d01d3ccc73b809fa Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] bkietz commented on a change in pull request #7033: ARROW-7759: [C++][Dataset] Add CsvFileFormat

2020-04-29 Thread GitBox
bkietz commented on a change in pull request #7033: URL: https://github.com/apache/arrow/pull/7033#discussion_r417559116 ## File path: cpp/src/arrow/dataset/file_csv.cc ## @@ -0,0 +1,99 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] github-actions[bot] commented on pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-04-29 Thread GitBox
github-actions[bot] commented on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-621356894 Revision: 719a93477453133902369cbad5a979a9eb8c4b04 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] lidavidm commented on pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-29 Thread GitBox
lidavidm commented on pull request #6744: URL: https://github.com/apache/arrow/pull/6744#issuecomment-621396341 I replied to the ML thread with more context, hopefully that clears it up.

[GitHub] [arrow] bkietz commented on pull request #7033: ARROW-7759: [C++][Dataset] Add CsvFileFormat

2020-04-29 Thread GitBox
bkietz commented on pull request #7033: URL: https://github.com/apache/arrow/pull/7033#issuecomment-621418575 Reverted everything except parse_options. Follow up: https://issues.apache.org/jira/browse/ARROW-8631

[GitHub] [arrow] bkietz opened a new pull request #7062: ARROW-8632: [C++] Fix conversion error warning in array_union_test.cc

2020-04-29 Thread GitBox
bkietz opened a new pull request #7062: URL: https://github.com/apache/arrow/pull/7062

[GitHub] [arrow] lidavidm commented on pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-29 Thread GitBox
lidavidm commented on pull request #6744: URL: https://github.com/apache/arrow/pull/6744#issuecomment-621359056 Just as a quick example, here's what can happen if you just try to read multiple files in parallel (the third graph): https://www.lidavidm.me/arrow/coalescing/vis.html

[GitHub] [arrow] vertexclique opened a new pull request #7061: ARROW-8629: [Rust] – Eliminate indirection of zero sized allocations

2020-04-29 Thread GitBox
vertexclique opened a new pull request #7061: URL: https://github.com/apache/arrow/pull/7061 1. Solves UB passing through OS. 2. Improves performance by removing indirections at intermediate array builds. (Increases throughput and decreases latency)

[GitHub] [arrow] github-actions[bot] commented on pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-04-29 Thread GitBox
github-actions[bot] commented on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-621347949 https://issues.apache.org/jira/browse/ARROW-8628 This is an automated message from the Apache Git

[GitHub] [arrow] github-actions[bot] commented on pull request #7062: ARROW-8632: [C++] Fix conversion error warning in array_union_test.cc

2020-04-29 Thread GitBox
github-actions[bot] commented on pull request #7062: URL: https://github.com/apache/arrow/pull/7062#issuecomment-621424384 https://issues.apache.org/jira/browse/ARROW-8632 This is an automated message from the Apache Git

[GitHub] [arrow] kszucs commented on pull request #7021: Wrap docker-compose commands with archery

2020-04-29 Thread GitBox
kszucs commented on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-621199652 @github-actions crossbow submit test-conda-python-3.7-pandas-latest test-debian-10-cpp test-conda-python-3.7 This

[GitHub] [arrow] kszucs commented on pull request #7021: Wrap docker-compose commands with archery

2020-04-29 Thread GitBox
kszucs commented on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-621199563 @github-actions crossbow submit test-conda-python-3.7-pandas-latest test-debian-10-cpp test-conda-python-3.7 This

[GitHub] [arrow] kszucs removed a comment on pull request #7021: Wrap docker-compose commands with archery

2020-04-29 Thread GitBox
kszucs removed a comment on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-621199563 @github-actions crossbow submit test-conda-python-3.7-pandas-latest test-debian-10-cpp test-conda-python-3.7

[GitHub] [arrow] bkietz commented on pull request #7033: ARROW-7759: [C++][Dataset] Add CsvFileFormat

2020-04-29 Thread GitBox
bkietz commented on pull request #7033: URL: https://github.com/apache/arrow/pull/7033#issuecomment-621229069 @jorisvandenbossche > do we plan to use the CsvFileFormat also for writing? ... For Parquet there is a ReaderOptions that grouped all options related to reading in the

[GitHub] [arrow] bkietz commented on a change in pull request #7026: ARROW-7391: [C++][Dataset] Remove Expression subclasses from bindings

2020-04-29 Thread GitBox
bkietz commented on a change in pull request #7026: URL: https://github.com/apache/arrow/pull/7026#discussion_r417341608 ## File path: cpp/src/arrow/scalar.cc ## @@ -252,6 +270,100 @@ Result> Scalar::Parse(const std::shared_ptr& t return ScalarParseImpl{type, s}.Finish();

[GitHub] [arrow] wesm edited a comment on pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-29 Thread GitBox
wesm edited a comment on pull request #6744: URL: https://github.com/apache/arrow/pull/6744#issuecomment-621266083 Yes, we should discuss on the mailing list. EDIT: we do have a separate thread pool for IO, but it's limited to 8 threads. Eventually absent a path forward on sane

[GitHub] [arrow] pitrou commented on pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-29 Thread GitBox
pitrou commented on pull request #6744: URL: https://github.com/apache/arrow/pull/6744#issuecomment-621303158 However, beware that the more remote IO requests you issue in parallel, the longer each will take individually, given fixed bandwidth limits.

[GitHub] [arrow] fsaintjacques commented on pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments

2020-04-30 Thread GitBox
fsaintjacques commented on pull request #7073: URL: https://github.com/apache/arrow/pull/7073#issuecomment-622092322 Due to R failure (that I didn't catch because my installation was broken and using an old version of arrow), I'll revert the FileSystemDataset::format and make sure

[GitHub] [arrow] kou commented on a change in pull request #7067: ARROW-8639: [C++][Plasma] Require gflags

2020-04-30 Thread GitBox
kou commented on a change in pull request #7067: URL: https://github.com/apache/arrow/pull/7067#discussion_r418272894 ## File path: cpp/cmake_modules/FindgflagsAlt.cmake ## @@ -15,6 +15,8 @@ # specific language governing permissions and limitations # under the License. +#

[GitHub] [arrow] github-actions[bot] commented on pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-04-30 Thread GitBox
github-actions[bot] commented on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-622062939 Revision: 3cd96ea48a2116322b2fec06207fb1d624e0f969 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] pauldix commented on pull request #7064: ARROW-6945: [Rust] WIP: Add initial skeleton for Rust integration tests

2020-04-30 Thread GitBox
pauldix commented on pull request #7064: URL: https://github.com/apache/arrow/pull/7064#issuecomment-622136810 @nealrichardson are there individual tests for each of the rows in that sheet? Or are many covered with each integration scenario? @andygrove I think if you can PR into my

[GitHub] [arrow] nealrichardson commented on pull request #7064: ARROW-6945: [Rust] WIP: Add initial skeleton for Rust integration tests

2020-04-30 Thread GitBox
nealrichardson commented on pull request #7064: URL: https://github.com/apache/arrow/pull/7064#issuecomment-622139195 Column I in the first sheet shows which test (generated) files cover which types. So many are covered in a single "test", in that the test JSON it produces includes many

[GitHub] [arrow] nealrichardson commented on pull request #7018: ARROW-8536: [Rust] [Flight] Check in proto file, conditional build if file exists

2020-04-30 Thread GitBox
nealrichardson commented on pull request #7018: URL: https://github.com/apache/arrow/pull/7018#issuecomment-622177076 So `rust/arrow-flight/src/arrow.flight.protocol.rs` is generated from `format/Flight.proto`? There is precedent for adding generated files to `rat_excluded_files.txt`, so

[GitHub] [arrow] kszucs commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds

2020-04-30 Thread GitBox
kszucs commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622083097 @github-actions crossbow submit wheel-win-cp38 This is an automated message from the Apache Git Service. To

[GitHub] [arrow] andygrove commented on a change in pull request #7035: ARROW-8590: [Rust] Use arrow crate pretty util in DataFusion

2020-04-30 Thread GitBox
andygrove commented on a change in pull request #7035: URL: https://github.com/apache/arrow/pull/7035#discussion_r418346377 ## File path: rust/arrow/src/util/pretty.rs ## @@ -27,18 +27,18 @@ use prettytable::{Cell, Row, Table}; use crate::error::{ArrowError, Result}; ///!

[GitHub] [arrow] tustvold opened a new pull request #7076: ARROW-8659: [Rust] ListBuilder allocate with_capacity

2020-04-30 Thread GitBox
tustvold opened a new pull request #7076: URL: https://github.com/apache/arrow/pull/7076 Both ListBuilder and FixedSizeListBuilder accept a values_builder as a constructor argument and then set the capacity of their internal builders based off the length of this values_builder.

[GitHub] [arrow] andygrove commented on pull request #7018: ARROW-8536: [Rust] [Flight] Check in proto file, conditional build if file exists

2020-04-30 Thread GitBox
andygrove commented on pull request #7018: URL: https://github.com/apache/arrow/pull/7018#issuecomment-622174765 @nealrichardson Maybe you have an opinion on this? This is the issue I mentioned on the sync call. This is an

[GitHub] [arrow] github-actions[bot] commented on pull request #7076: ARROW-8659: [Rust] ListBuilder allocate with_capacity

2020-04-30 Thread GitBox
github-actions[bot] commented on pull request #7076: URL: https://github.com/apache/arrow/pull/7076#issuecomment-622178164 https://issues.apache.org/jira/browse/ARROW-8659 This is an automated message from the Apache Git
