[GitHub] [arrow] kszucs commented on pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery
kszucs commented on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-622320430 @github-actions crossbow submit -g test This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds
kszucs commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622345879 @github-actions crossbow submit wheel-win-cp38 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] zgramana commented on pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support
zgramana commented on pull request #7032: URL: https://github.com/apache/arrow/pull/7032#issuecomment-620266004 @eerhardt thanks for the excellent review and feedback. I believe that I have pushed fixes for everything. I also noticed I had missed validity buffer state management in `PrimitiveArrayBuilder`'s `Set`, `Swap` and `Clear` methods. I've added those in. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] bkietz commented on a change in pull request #7033: ARROW-7759: [C++][Dataset] Add CsvFileFormat
bkietz commented on a change in pull request #7033: URL: https://github.com/apache/arrow/pull/7033#discussion_r416235260 ## File path: cpp/src/arrow/dataset/file_csv.cc ## @@ -0,0 +1,99 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#include "arrow/dataset/file_csv.h" + +#include +#include + +#include "arrow/csv/options.h" +#include "arrow/csv/reader.h" +#include "arrow/dataset/dataset_internal.h" +#include "arrow/dataset/file_base.h" +#include "arrow/dataset/type_fwd.h" +#include "arrow/dataset/visibility.h" +#include "arrow/result.h" +#include "arrow/util/iterator.h" + +namespace arrow { +namespace dataset { + +using internal::checked_cast; +using internal::checked_pointer_cast; + +static inline csv::ReadOptions default_read_options() { + auto defaults = csv::ReadOptions::Defaults(); + defaults.use_threads = false; + return defaults; +} + +static inline Result> OpenReader( +const FileSource& source, const CsvFileFormat& format) { + ARROW_ASSIGN_OR_RAISE(auto input, source.Open()); + + auto maybe_reader = csv::StreamingReader::Make( + default_memory_pool(), std::move(input), default_read_options(), + format.parse_options, format.convert_options); + if (!maybe_reader.ok()) { +return maybe_reader.status().WithMessage("Could not open IPC input source '", + source.path(), "': ", maybe_reader.status()); + } + + return std::move(maybe_reader).ValueOrDie(); +} + +/// \brief A ScanTask backed by an Csv file. +class CsvScanTask : public ScanTask { + public: + CsvScanTask(std::shared_ptr format, FileSource source, + std::shared_ptr options, std::shared_ptr context) + : ScanTask(std::move(options), std::move(context)), +format_(std::move(format)), +source_(std::move(source)) {} + + Result Execute() override { +ARROW_ASSIGN_OR_RAISE(auto reader, OpenReader(source_, *format_)); Review comment: Reading using a predetermined schema is already supported for CSV and for all the other formats. The first chunk (as determined by `CsvFileFormat::chunk_size`) is used to sniff the schema which is assumed for the rest of the file, and this inspection can be applied to just one or to all the files in a dataset at discovery time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] bkietz commented on a change in pull request #7033: ARROW-7759: [C++][Dataset] Add CsvFileFormat
bkietz commented on a change in pull request #7033: URL: https://github.com/apache/arrow/pull/7033#discussion_r416253903 ## File path: cpp/src/arrow/dataset/file_csv.cc ## @@ -0,0 +1,99 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#include "arrow/dataset/file_csv.h" + +#include +#include + +#include "arrow/csv/options.h" +#include "arrow/csv/reader.h" +#include "arrow/dataset/dataset_internal.h" +#include "arrow/dataset/file_base.h" +#include "arrow/dataset/type_fwd.h" +#include "arrow/dataset/visibility.h" +#include "arrow/result.h" +#include "arrow/util/iterator.h" + +namespace arrow { +namespace dataset { + +using internal::checked_cast; +using internal::checked_pointer_cast; + +static inline csv::ReadOptions default_read_options() { + auto defaults = csv::ReadOptions::Defaults(); + defaults.use_threads = false; + return defaults; +} + +static inline Result> OpenReader( +const FileSource& source, const CsvFileFormat& format) { + ARROW_ASSIGN_OR_RAISE(auto input, source.Open()); + + auto maybe_reader = csv::StreamingReader::Make( + default_memory_pool(), std::move(input), default_read_options(), + format.parse_options, format.convert_options); + if (!maybe_reader.ok()) { +return maybe_reader.status().WithMessage("Could not open IPC input source '", + source.path(), "': ", maybe_reader.status()); + } + + return std::move(maybe_reader).ValueOrDie(); +} + +/// \brief A ScanTask backed by an Csv file. +class CsvScanTask : public ScanTask { + public: + CsvScanTask(std::shared_ptr format, FileSource source, + std::shared_ptr options, std::shared_ptr context) + : ScanTask(std::move(options), std::move(context)), +format_(std::move(format)), +source_(std::move(source)) {} + + Result Execute() override { +ARROW_ASSIGN_OR_RAISE(auto reader, OpenReader(source_, *format_)); Review comment: I'll add a failing test tomorrow so this can become less hand wavy This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7048: ARROW-8609: [C++] fix orc jni crash
github-actions[bot] commented on pull request #7048: URL: https://github.com/apache/arrow/pull/7048#issuecomment-620323735 https://issues.apache.org/jira/browse/ARROW-8609 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] bkietz commented on a change in pull request #7033: ARROW-7759: [C++][Dataset] Add CsvFileFormat
bkietz commented on a change in pull request #7033: URL: https://github.com/apache/arrow/pull/7033#discussion_r416239202 ## File path: cpp/src/arrow/dataset/file_csv.cc ## @@ -0,0 +1,99 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#include "arrow/dataset/file_csv.h" + +#include +#include + +#include "arrow/csv/options.h" +#include "arrow/csv/reader.h" +#include "arrow/dataset/dataset_internal.h" +#include "arrow/dataset/file_base.h" +#include "arrow/dataset/type_fwd.h" +#include "arrow/dataset/visibility.h" +#include "arrow/result.h" +#include "arrow/util/iterator.h" + +namespace arrow { +namespace dataset { + +using internal::checked_cast; +using internal::checked_pointer_cast; + +static inline csv::ReadOptions default_read_options() { + auto defaults = csv::ReadOptions::Defaults(); + defaults.use_threads = false; + return defaults; +} + +static inline Result> OpenReader( +const FileSource& source, const CsvFileFormat& format) { + ARROW_ASSIGN_OR_RAISE(auto input, source.Open()); + + auto maybe_reader = csv::StreamingReader::Make( + default_memory_pool(), std::move(input), default_read_options(), + format.parse_options, format.convert_options); + if (!maybe_reader.ok()) { +return maybe_reader.status().WithMessage("Could not open IPC input source '", + source.path(), "': ", maybe_reader.status()); + } + + return std::move(maybe_reader).ValueOrDie(); +} + +/// \brief A ScanTask backed by an Csv file. +class CsvScanTask : public ScanTask { + public: + CsvScanTask(std::shared_ptr format, FileSource source, + std::shared_ptr options, std::shared_ptr context) + : ScanTask(std::move(options), std::move(context)), +format_(std::move(format)), +source_(std::move(source)) {} + + Result Execute() override { +ARROW_ASSIGN_OR_RAISE(auto reader, OpenReader(source_, *format_)); Review comment: The edge case here is due to the way we pass only the projected schema during a scan, so maybe the solution is to always pass the full dataset schema as well so that no materialized fields will be inferring at scan time This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] zhouyuan opened a new pull request #7048: ARROW-8609: [C++] fix orc jni crash
zhouyuan opened a new pull request #7048: URL: https://github.com/apache/arrow/pull/7048 check if arrow buffer is null before passing to the constructor Signed-off-by: Yuan Zhou This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] zhouyuan commented on pull request #6953: [Java] Reproduce ORC JNI binding crash [WIP]
zhouyuan commented on pull request #6953: URL: https://github.com/apache/arrow/pull/6953#issuecomment-620321153 @kszucs ran into similar issue on parquet side, not sure if this #7048 works This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs opened a new pull request #7081: [CI] Cache docker volumes [WIP]
kszucs opened a new pull request #7081: URL: https://github.com/apache/arrow/pull/7081 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] zgramana commented on a change in pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support
zgramana commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r418424144 ## File path: csharp/src/Apache.Arrow/Arrays/ListArray.cs ## @@ -135,6 +152,11 @@ public int GetValueOffset(int index) public int GetValueLength(int index) { +if (IsNull(index)) Review comment: Fixed #Closed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] eerhardt commented on a change in pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support
eerhardt commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r418393347 ## File path: csharp/src/Apache.Arrow/Arrays/ListArray.cs ## @@ -135,6 +152,11 @@ public int GetValueOffset(int index) public int GetValueLength(int index) { +if (IsNull(index)) Review comment: This should be after the argument checking. #Closed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] zgramana commented on a change in pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support
zgramana commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r418383314 ## File path: csharp/src/Apache.Arrow/Arrays/PrimitiveArrayBuilder.cs ## @@ -162,8 +188,8 @@ public TBuilder Swap(int i, int j) public TArray Build(MemoryAllocator allocator = default) { return Build( -ValueBuffer.Build(allocator), ArrowBuffer.Empty, -ValueBuffer.Length, 0, 0); +ValueBuffer.Build(allocator), ValidityBuffer.Build(allocator).ValueBuffer, Review comment: As noted on another comment, I've pushed up this refactor. #Closed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] zgramana commented on a change in pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support
zgramana commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r418380212 ## File path: csharp/src/Apache.Arrow/Arrays/PrimitiveArrayBuilder.cs ## @@ -99,55 +105,75 @@ public abstract class PrimitiveArrayBuilder : IArrowArrayBu { protected TBuilder Instance => this as TBuilder; protected ArrowBuffer.Builder ValueBuffer { get; } +protected BooleanArray.Builder ValidityBuffer { get; } public int Length => ValueBuffer.Length; - -// TODO: Implement support for null values (null bitmaps) +protected int NullCount { get; set; } internal PrimitiveArrayBuilder() { ValueBuffer = new ArrowBuffer.Builder(); +ValidityBuffer = new BooleanArray.Builder(); } public TBuilder Resize(int length) { ValueBuffer.Resize(length); +ValidityBuffer.Resize(length + 1); Review comment: Copy/paste error from `BinaryArray`. Good catch. Fixed. #Closed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] eerhardt commented on a change in pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support
eerhardt commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r417709163 ## File path: csharp/src/Apache.Arrow/Arrays/PrimitiveArrayBuilder.cs ## @@ -99,55 +105,75 @@ public abstract class PrimitiveArrayBuilder : IArrowArrayBu { protected TBuilder Instance => this as TBuilder; protected ArrowBuffer.Builder ValueBuffer { get; } +protected BooleanArray.Builder ValidityBuffer { get; } public int Length => ValueBuffer.Length; - -// TODO: Implement support for null values (null bitmaps) +protected int NullCount { get; set; } internal PrimitiveArrayBuilder() { ValueBuffer = new ArrowBuffer.Builder(); +ValidityBuffer = new BooleanArray.Builder(); } public TBuilder Resize(int length) { ValueBuffer.Resize(length); +ValidityBuffer.Resize(length + 1); Review comment: Why the `+ 1` here? Shouldn't the ValidityBuffer and ValueBuffer be the same size? (Same question for Reserve below). #Closed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] bkietz commented on pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments
bkietz commented on pull request #7073: URL: https://github.com/apache/arrow/pull/7073#issuecomment-622364855 WRT convenient single format or single file system datasets, it would be straightforward (and possibly more useful) to provide accessories for subsets, Dataset.subsets(by_format=True): { format: subset ... } Or similar. IMHO the convenience added does not justify a whole class, at least in C++. As you noted even if we remove it in C++ it could still be a convenience subclass in Python This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column
wesm commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-622371728 What I've seen other projects do (have to dig for some examples) is to have files like ``` functionality_nosimd.cc functionality_sse42.cc functionality_avx2.cc ``` The appropriate compiler flags are set on the respective compilation unit. Then query the CPU at runtime to determine its features and choose which variant to invoke. @cyb70289 expressed an interest in this recently, I think This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] cyb70289 commented on a change in pull request #6954: ARROW-8440: [C++] Refine SIMD header files
cyb70289 commented on a change in pull request #6954: URL: https://github.com/apache/arrow/pull/6954#discussion_r418558120 ## File path: cpp/src/arrow/util/simd.h ## @@ -17,6 +17,24 @@ #pragma once +#ifdef _MSC_VER +// MSVC x86_64/arm64 + +#if defined(_M_AMD64) || defined(_M_X64) +#include +#elif defined(_M_ARM64) +#include +#endif + +#else +// gcc/clang (possibly others) + Review comment: yes, I think so This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jorisvandenbossche commented on pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments
jorisvandenbossche commented on pull request #7073: URL: https://github.com/apache/arrow/pull/7073#issuecomment-622354887 Do we need FileSystemDataset, maybe not. Is it still useful, IMO yes. As mentioned above, I personally find it convenient to know that my dataset has a single format / filesystem, and be able to easily check this format. Now of course, it might be that this convenience is not worth it given the added complexity to the code. Or that such convenience could also be given by the wrapper languages (eg in Python we could still have a Dataset subclass for single formats) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds
kszucs commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622356800 @github-actions crossbow submit wheel-win-* This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds
github-actions[bot] commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622357099 Revision: 8852e2f5f32402ca9c85877289c7948db141cca7 Submitted crossbow builds: [ursa-labs/crossbow @ actions-194](https://github.com/ursa-labs/crossbow/branches/all?query=actions-194) |Task|Status| ||--| |wheel-win-cp35m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-194-appveyor-wheel-win-cp35m.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)| |wheel-win-cp36m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-194-appveyor-wheel-win-cp36m.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)| |wheel-win-cp37m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-194-appveyor-wheel-win-cp37m.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)| |wheel-win-cp38|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-194-appveyor-wheel-win-cp38.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column
wesm commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-622373323 Take a look at how this is currently being handled in NumPy * https://numpy.org/neps/nep-0038-SIMD-optimizations.html * https://github.com/numpy/numpy/pull/13516 I assume a lot of thought was put into this, and since it was done in 2019 it should provide a good guideline for what we should do This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] eerhardt commented on a change in pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support
eerhardt commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r417708201 ## File path: csharp/src/Apache.Arrow/Arrays/BinaryArray.cs ## @@ -73,24 +76,34 @@ public TArray Build(MemoryAllocator allocator = default) { ValueOffsets.Append(Offset); -var data = new ArrayData(DataType, ValueOffsets.Length - 1, 0, 0, -new[] { ArrowBuffer.Empty, ValueOffsets.Build(allocator), ValueBuffer.Build(allocator) }); +var data = new ArrayData(DataType, ValueOffsets.Length - 1, NullCount, 0, +new[] { ValidityBuffer.Build(allocator).ValueBuffer, ValueOffsets.Build(allocator), ValueBuffer.Build(allocator) }); Review comment: Were we going to check for `NullCount == 0` and use an empty buffer for the validity buffer in that case? The same goes for `PrimitiveArrayBuilder` and `BoleanArray.Builder`. #Closed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] eerhardt commented on a change in pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support
eerhardt commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r415075754 ## File path: csharp/src/Apache.Arrow/Arrays/ArrayData.cs ## @@ -84,7 +84,7 @@ public ArrayData Slice(int offset, int length) length = Math.Min(Length - offset, length); offset += Offset; -return new ArrayData(DataType, length, -1, offset, Buffers, Children); +return new ArrayData(DataType, length, NullCount, offset, Buffers, Children); Review comment: This isn't correct because we are slicing a subset of the `ArrayData`. If the total `ArrayData` has values: `0`, `null`, `2`, `3`, `4` `NullCount` for the outer `ArrayData` will be `1`. But if we are slicing from `offset = 2` and `length = 3` and creating a new `ArrayData` from the above, the new `ArrayData` won't have any nulls in it. But with this change, it will have `NullCount = 1` from its parent. `NullCount = -1` in the original code means "we don't know yet, it needs to be computed". That was done so we don't have to count the nulls in the sliced values until it is necessary. Since there are cases where it may not be required. #Closed ## File path: csharp/src/Apache.Arrow/Arrays/ArrayData.cs ## @@ -22,6 +22,8 @@ namespace Apache.Arrow { public sealed class ArrayData : IDisposable { +public static readonly int RecalculateNullCount = -1; Review comment: Can we leave this `private` for now? We can always make it public later. #Closed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] zgramana commented on a change in pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support
zgramana commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r416049921 ## File path: csharp/src/Apache.Arrow/Arrays/ArrayData.cs ## @@ -84,7 +84,7 @@ public ArrayData Slice(int offset, int length) length = Math.Min(Length - offset, length); offset += Offset; -return new ArrayData(DataType, length, -1, offset, Buffers, Children); +return new ArrayData(DataType, length, NullCount, offset, Buffers, Children); Review comment: Appears obvious in retrospect. I reverted, though I did introduce a constant for enhanced readability. #Closed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] zgramana commented on a change in pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support
zgramana commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r418423816 ## File path: csharp/src/Apache.Arrow/Arrays/ArrayData.cs ## @@ -22,6 +22,8 @@ namespace Apache.Arrow { public sealed class ArrayData : IDisposable { +public static readonly int RecalculateNullCount = -1; Review comment: Changed #Closed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] zgramana commented on a change in pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support
zgramana commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r418385072 ## File path: csharp/src/Apache.Arrow/Arrays/StringArray.cs ## @@ -71,6 +76,15 @@ public string GetString(int index, Encoding encoding = default) var bytes = GetBytes(index); +if (bytes == Span.Empty) +{ +return null; +} +if (bytes.Length == 0) Review comment: Yes. See `ArrayBuilderTests.StringArrayBuilderHandlesNullsAndEmptyStrings`. #Closed ## File path: csharp/src/Apache.Arrow/Arrays/StringArray.cs ## @@ -71,6 +76,15 @@ public string GetString(int index, Encoding encoding = default) var bytes = GetBytes(index); +if (bytes == Span.Empty) Review comment: I chose the first option as it avoids any public API changes. In general I am not a fan of `out` contracts, but did not have a better alternative. This way there is still time to contemplate alternatives before committing to the proposed overload. #Closed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] eerhardt commented on a change in pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support
eerhardt commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r417711102 ## File path: csharp/src/Apache.Arrow/Arrays/StringArray.cs ## @@ -71,6 +76,15 @@ public string GetString(int index, Encoding encoding = default) var bytes = GetBytes(index); +if (bytes == Span.Empty) +{ +return null; +} +if (bytes.Length == 0) Review comment: Do we have unit tests for both of these cases to make sure they act appropriately? A test for `null` and a test for `string.Empty`. #Closed ## File path: csharp/src/Apache.Arrow/Arrays/StringArray.cs ## @@ -71,6 +76,15 @@ public string GetString(int index, Encoding encoding = default) var bytes = GetBytes(index); +if (bytes == Span.Empty) Review comment: We had similar conversations about this in the `DataFrame` type we made here: https://github.com/dotnet/corefxlab/pull/2710#issuecomment-521799900. I think it makes more sense to say `if (bytes == default)` here. Or another idea is to add a `ReadOnlySpan GetBytes(int index, out bool isNull)` method to `BinaryArray`, which would allow callers know explicitly if the value is `null` without having a double `IsValid` check. #Closed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds
github-actions[bot] commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622397441 Revision: b65130bd5eae0e6fe79ace9d529a57f76869f621 Submitted crossbow builds: [ursa-labs/crossbow @ actions-195](https://github.com/ursa-labs/crossbow/branches/all?query=actions-195) |Task|Status| ||--| |wheel-win-cp35m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-195-appveyor-wheel-win-cp35m.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)| |wheel-win-cp36m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-195-appveyor-wheel-win-cp36m.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)| |wheel-win-cp37m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-195-appveyor-wheel-win-cp37m.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)| |wheel-win-cp38|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-195-appveyor-wheel-win-cp38.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds
kszucs commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622396964 @github-actions crossbow submit wheel-win-* This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] fsaintjacques commented on a change in pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments
fsaintjacques commented on a change in pull request #7073: URL: https://github.com/apache/arrow/pull/7073#discussion_r418554202 ## File path: cpp/src/arrow/dataset/file_base.cc ## @@ -83,131 +83,67 @@ Result FileFragment::Scan(std::shared_ptr options FileSystemDataset::FileSystemDataset(std::shared_ptr schema, std::shared_ptr root_partition, - std::shared_ptr format, - std::shared_ptr filesystem, - fs::PathForest forest, - ExpressionVector file_partitions) + std::vector> fragments) : Dataset(std::move(schema), std::move(root_partition)), - format_(std::move(format)), - filesystem_(std::move(filesystem)), - forest_(std::move(forest)), - partitions_(std::move(file_partitions)) { - DCHECK_EQ(static_cast(forest_.size()), partitions_.size()); -} - -Result> FileSystemDataset::Make( -std::shared_ptr schema, std::shared_ptr root_partition, -std::shared_ptr format, std::shared_ptr filesystem, -std::vector infos) { - ExpressionVector partitions(infos.size(), scalar(true)); - return Make(std::move(schema), std::move(root_partition), std::move(format), - std::move(filesystem), std::move(infos), std::move(partitions)); -} + fragments_(std::move(fragments)) {} Result> FileSystemDataset::Make( std::shared_ptr schema, std::shared_ptr root_partition, -std::shared_ptr format, std::shared_ptr filesystem, -std::vector infos, ExpressionVector partitions) { - ARROW_ASSIGN_OR_RAISE(auto forest, fs::PathForest::Make(std::move(infos), )); - return Make(std::move(schema), std::move(root_partition), std::move(format), - std::move(filesystem), std::move(forest), std::move(partitions)); -} +FragmentVector fragments) { + std::vector> file_fragments; + for (const auto& fragment : fragments) { +auto file_fragment = internal::checked_pointer_cast(fragment); Review comment: derp, I actually forgot to check. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] fsaintjacques commented on a change in pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments
fsaintjacques commented on a change in pull request #7073: URL: https://github.com/apache/arrow/pull/7073#discussion_r418556583 ## File path: cpp/src/arrow/dataset/file_base.cc ## @@ -221,42 +157,34 @@ Result> FileSystemDataset::Write( filesystem = std::make_shared(); } - std::vector files(plan.paths.size()); - ExpressionVector partition_expressions(plan.paths.size(), scalar(true)); auto task_group = scan_context->TaskGroup(); - auto partition_base_dir = fs::internal::EnsureTrailingSlash(plan.partition_base_dir); auto extension = "." + plan.format->type_name(); + FragmentVector fragments; for (size_t i = 0; i < plan.paths.size(); ++i) { const auto& op = plan.fragment_or_partition_expressions[i]; -if (util::holds_alternative>(op)) { - files[i].set_type(fs::FileType::Directory); - files[i].set_path(partition_base_dir + plan.paths[i]); +if (util::holds_alternative>(op)) { Review comment: This will be part of ARROW-8382. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] fsaintjacques commented on a change in pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments
fsaintjacques commented on a change in pull request #7073: URL: https://github.com/apache/arrow/pull/7073#discussion_r418556583 ## File path: cpp/src/arrow/dataset/file_base.cc ## @@ -221,42 +157,34 @@ Result> FileSystemDataset::Write( filesystem = std::make_shared(); } - std::vector files(plan.paths.size()); - ExpressionVector partition_expressions(plan.paths.size(), scalar(true)); auto task_group = scan_context->TaskGroup(); - auto partition_base_dir = fs::internal::EnsureTrailingSlash(plan.partition_base_dir); auto extension = "." + plan.format->type_name(); + FragmentVector fragments; for (size_t i = 0; i < plan.paths.size(); ++i) { const auto& op = plan.fragment_or_partition_expressions[i]; -if (util::holds_alternative>(op)) { - files[i].set_type(fs::FileType::Directory); - files[i].set_path(partition_base_dir + plan.paths[i]); +if (util::holds_alternative>(op)) { Review comment: This will be done part of ARROW-8382. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7081: [CI] Cache docker volumes [WIP]
github-actions[bot] commented on pull request #7081: URL: https://github.com/apache/arrow/pull/7081#issuecomment-622378960 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could you also rename pull request title in the following format? ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY} See also: * [Other pull requests](https://github.com/apache/arrow/pulls/) * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] hantusk opened a new issue #7082: pyarrow 0.17 atexit handler causes a segmentation fault
hantusk opened a new issue #7082: URL: https://github.com/apache/arrow/issues/7082 When running an ASGI webapp in python with uvicorn, I am getting the following error when shutting down. Solved by reverting back to pyarrow 0.16.0 ```python Error in atexit._run_exitfuncs: Traceback (most recent call last): File "pyarrow/types.pxi", line 2638, in pyarrow.lib._unregister_py_extension_types SystemError: Bad call flags in _PyMethodDef_RawFastCallDict. METH_OLDARGS is no longer supported! [1]23073 segmentation fault uvicorn ba... ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on pull request #7081: [CI] Cache docker volumes [WIP]
kszucs commented on pull request #7081: URL: https://github.com/apache/arrow/pull/7081#issuecomment-622417613 With warmed up cache the build time has been reduced to 6m from 17m which is promising. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7080: ARROW-8662: [CI] Consolidate appveyor scripts
github-actions[bot] commented on pull request #7080: URL: https://github.com/apache/arrow/pull/7080#issuecomment-622423422 https://issues.apache.org/jira/browse/ARROW-8662 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on pull request #7080: [CI] Consolidate appveyor scripts [WIP]
kszucs commented on pull request #7080: URL: https://github.com/apache/arrow/pull/7080#issuecomment-622408594 Checking that the cache properly works on my fork's master branch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments
kszucs commented on pull request #7073: URL: https://github.com/apache/arrow/pull/7073#issuecomment-622410968 I'd like elaborate a bit more on the generic dataset class regardless what kind of wrappers do we provide. - Do you plan to unify the filesystem classes into a single one which can contain any kind of fragments (including InMemoryFragment)? - Will the UnionDataset still be required? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on pull request #7080: [CI] Consolidate appveyor scripts [WIP]
kszucs commented on pull request #7080: URL: https://github.com/apache/arrow/pull/7080#issuecomment-622418446 Checking that the cache properly works on my fork's master branch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs removed a comment on pull request #7081: [CI] Cache docker volumes [WIP]
kszucs removed a comment on pull request #7081: URL: https://github.com/apache/arrow/pull/7081#issuecomment-622418381 Checking that the cache properly works on my fork's master branch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] crd477 opened a new pull request #7083: Update building.rst
crd477 opened a new pull request #7083: URL: https://github.com/apache/arrow/pull/7083 simple typo: not -> note This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7083: ARROW-8663: [Documentation] Small correction to building.rst
github-actions[bot] commented on pull request #7083: URL: https://github.com/apache/arrow/pull/7083#issuecomment-622435457 https://issues.apache.org/jira/browse/ARROW-8663 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] fsaintjacques commented on a change in pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery
fsaintjacques commented on a change in pull request #7021: URL: https://github.com/apache/arrow/pull/7021#discussion_r418596433 ## File path: .github/workflows/archery.yml ## @@ -51,10 +53,12 @@ jobs: python-version: '3.7' - name: Install working-directory: dev/archery -run: pip install pytest responses ruamel.yaml toolz jinja2 -e . +run: pip install pytest python-dotenv responses ruamel.yaml toolz jinja2 -e . Review comment: Why not use pip install -e to ensure that requirements are up to date? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column
wesm commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-622409707 The 32-bit R failure seems like it could be real This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] fsaintjacques commented on a change in pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments
fsaintjacques commented on a change in pull request #7073: URL: https://github.com/apache/arrow/pull/7073#discussion_r418564299 ## File path: cpp/src/arrow/dataset/file_base.cc ## @@ -83,131 +83,67 @@ Result FileFragment::Scan(std::shared_ptr options FileSystemDataset::FileSystemDataset(std::shared_ptr schema, std::shared_ptr root_partition, - std::shared_ptr format, - std::shared_ptr filesystem, - fs::PathForest forest, - ExpressionVector file_partitions) + std::vector> fragments) : Dataset(std::move(schema), std::move(root_partition)), - format_(std::move(format)), - filesystem_(std::move(filesystem)), - forest_(std::move(forest)), - partitions_(std::move(file_partitions)) { - DCHECK_EQ(static_cast(forest_.size()), partitions_.size()); -} - -Result> FileSystemDataset::Make( -std::shared_ptr schema, std::shared_ptr root_partition, -std::shared_ptr format, std::shared_ptr filesystem, -std::vector infos) { - ExpressionVector partitions(infos.size(), scalar(true)); - return Make(std::move(schema), std::move(root_partition), std::move(format), - std::move(filesystem), std::move(infos), std::move(partitions)); -} + fragments_(std::move(fragments)) {} Result> FileSystemDataset::Make( std::shared_ptr schema, std::shared_ptr root_partition, -std::shared_ptr format, std::shared_ptr filesystem, -std::vector infos, ExpressionVector partitions) { - ARROW_ASSIGN_OR_RAISE(auto forest, fs::PathForest::Make(std::move(infos), )); - return Make(std::move(schema), std::move(root_partition), std::move(format), - std::move(filesystem), std::move(forest), std::move(partitions)); -} +FragmentVector fragments) { + std::vector> file_fragments; + for (const auto& fragment : fragments) { +auto file_fragment = internal::checked_pointer_cast(fragment); Review comment: Resolved by changing the to vector and moving the checks at callsite. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs removed a comment on pull request #7080: [CI] Consolidate appveyor scripts [WIP]
kszucs removed a comment on pull request #7080: URL: https://github.com/apache/arrow/pull/7080#issuecomment-622408594 Checking that the cache properly works on my fork's master branch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on pull request #7081: [CI] Cache docker volumes [WIP]
kszucs commented on pull request #7081: URL: https://github.com/apache/arrow/pull/7081#issuecomment-622418381 Checking that the cache properly works on my fork's master branch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] emkornfield commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column
emkornfield commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-622422074 @nealrichardson i'm not sure what the error is saying? also 32-bit R didn't realize that is still a thing :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs edited a comment on pull request #7081: [CI] Cache docker volumes [WIP]
kszucs edited a comment on pull request #7081: URL: https://github.com/apache/arrow/pull/7081#issuecomment-622417613 With warmed up cache the build time has been reduced to 6m from 17m which is promising. I'll need to do some gymnastics with the cache keys because the cache plugin lacks some features, but it'll work. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #6631: ARROW-8111: [C++][CSV] Support MM/DD/YYYY date format
nealrichardson commented on pull request #6631: URL: https://github.com/apache/arrow/pull/6631#issuecomment-622437524 @github-actions autotune everything This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #6631: ARROW-8111: [C++][CSV] Support MM/DD/YYYY date format
wesm commented on pull request #6631: URL: https://github.com/apache/arrow/pull/6631#issuecomment-622447876 I can pick up this patch today and take it the last mile so it can be merged. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jorisvandenbossche commented on pull request #6303: ARROW-8039: [Python] Use dataset API in existing parquet readers and tests
jorisvandenbossche commented on pull request #6303: URL: https://github.com/apache/arrow/pull/6303#issuecomment-621937113 I finally listed the open TODO items from the discussions in this PR / the skipped tests, and opened JIRAs where this was not yet the case: - Deduplicating the specified column names (https://github.com/apache/arrow/pull/6303#discussion_r397350410): do we actually want this? - Support buffers/NativeFile as file source -> https://issues.apache.org/jira/browse/ARROW-8074 - Multithreaded discovery -> https://issues.apache.org/jira/browse/ARROW-8137 - Deterministic row order -> https://issues.apache.org/jira/browse/ARROW-8447 - Error on "bad" files instead of skipping -> https://issues.apache.org/jira/browse/ARROW-7673 - Partition fields using dictionary type -> opened https://issues.apache.org/jira/browse/ARROW-8647 (but should this be on by default?) - Metadata support: https://issues.apache.org/jira/browse/ARROW-8062 is an issue about discovery from metadata files, but in addition we also need a way to expose the metadata/statistics - Support pickling -> opened https://issues.apache.org/jira/browse/ARROW-8651 - Comment about needing better error message when encountering invalid files: this seems to work now, opened https://issues.apache.org/jira/browse/ARROW-8652 to enable the test This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7060: ARROW-8619: [C++] Use distinct enum values for MonthInterval, DayTimeInterval
wesm commented on pull request #7060: URL: https://github.com/apache/arrow/pull/7060#issuecomment-621933237 For some reason I can't get JNI running in my local setup ``` CMake Error at /home/wesm/cpp-toolchain/share/cmake-3.16/Modules/FindPackageHandleStandardArgs.cmake:146 (message): Could NOT find JNI (missing: JAVA_AWT_INCLUDE_PATH) Call Stack (most recent call first): /home/wesm/cpp-toolchain/share/cmake-3.16/Modules/FindPackageHandleStandardArgs.cmake:393 (_FPHSA_FAILURE_MESSAGE) /home/wesm/cpp-toolchain/share/cmake-3.16/Modules/FindJNI.cmake:372 (FIND_PACKAGE_HANDLE_STANDARD_ARGS) src/gandiva/jni/CMakeLists.txt:27 (find_package) ``` tried with ``` export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 ``` and ``` export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64 ``` It worked with ``` export JAVA_HOME=/usr/lib/jvm/java-14-oracle ``` though This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on a change in pull request #7067: ARROW-8639: [C++][Plasma] Require gflags
kszucs commented on a change in pull request #7067: URL: https://github.com/apache/arrow/pull/7067#discussion_r418109900 ## File path: cpp/cmake_modules/FindgflagsAlt.cmake ## @@ -15,6 +15,8 @@ # specific language governing permissions and limitations # under the License. +# TODO: Support version detection. Review comment: Created a follow-up ticket https://issues.apache.org/jira/browse/ARROW-8653 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7060: ARROW-8619: [C++] Use distinct enum values for MonthInterval, DayTimeInterval
wesm commented on pull request #7060: URL: https://github.com/apache/arrow/pull/7060#issuecomment-621872082 Will fix the JNI issue and will notify the mailing list This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] bkietz commented on a change in pull request #7033: ARROW-7759: [C++][Dataset] Add CsvFileFormat
bkietz commented on a change in pull request #7033: URL: https://github.com/apache/arrow/pull/7033#discussion_r418108005 ## File path: cpp/src/arrow/dataset/file_csv.cc ## @@ -0,0 +1,136 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#include "arrow/dataset/file_csv.h" + +#include +#include +#include +#include + +#include "arrow/csv/options.h" +#include "arrow/csv/reader.h" +#include "arrow/dataset/dataset_internal.h" +#include "arrow/dataset/file_base.h" +#include "arrow/dataset/filter.h" +#include "arrow/dataset/type_fwd.h" +#include "arrow/dataset/visibility.h" +#include "arrow/result.h" +#include "arrow/type.h" +#include "arrow/util/iterator.h" + +namespace arrow { +namespace dataset { + +using internal::checked_cast; +using internal::checked_pointer_cast; + +static inline Result GetConvertOptions( +const CsvFileFormat& format, const std::shared_ptr& scan_options) { + auto options = csv::ConvertOptions::Defaults(); + if (scan_options != nullptr) { +// This is set to true to match behavior with other formats; a missing column +// will be materialized as null. +options.include_missing_columns = true; + +for (const auto& field : scan_options->schema()->fields()) { + options.column_types[field->name()] = field->type(); + options.include_columns.push_back(field->name()); +} + +// FIXME(bkietz) also acquire types of fields materialized but not projected. +for (auto&& name : FieldsInExpression(scan_options->filter)) { + ARROW_ASSIGN_OR_RAISE(auto match, + FieldRef(name).FindOneOrNone(*scan_options->schema())); + if (match.indices().empty()) { +options.include_columns.push_back(std::move(name)); + } +} + } + return options; +} + +static inline csv::ReadOptions GetReadOptions(const CsvFileFormat& format) { + auto options = csv::ReadOptions::Defaults(); + // Multithreaded conversion of individual files would lead to excessive thread + // contention when ScanTasks are also executed in multiple threads, so we disable it + // here. + options.use_threads = false; + return options; +} + +static inline Result> OpenReader( +const FileSource& source, const CsvFileFormat& format, +const std::shared_ptr& options = nullptr, Review comment: Not sure what you mean This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery
kszucs commented on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-621957531 @github-actions crossbow submit test-debian-10-cpp This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery
github-actions[bot] commented on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-621884756 Revision: 5c0b02dd7947e1e61da701169cc5fafb9135a6e5 Submitted crossbow builds: [ursa-labs/crossbow @ actions-183](https://github.com/ursa-labs/crossbow/branches/all?query=actions-183) |Task|Status| ||--| |test-conda-python-3.7|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-183-github-test-conda-python-3.7)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-183-github-test-conda-python-3.7)| |test-debian-10-cpp|[![CircleCI](https://img.shields.io/circleci/build/github/ursa-labs/crossbow/actions-183-circle-test-debian-10-cpp.svg)](https://circleci.com/gh/ursa-labs/crossbow/tree/actions-183-circle-test-debian-10-cpp)| |test-debian-10-go-1.12|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-183-azure-test-debian-10-go-1.12)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-183-azure-test-debian-10-go-1.12)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery
kszucs commented on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-621883711 @github-actions crossbow submit test-debian-10-cpp test-debian-10-go-1.12 test-conda-python-3.7 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] bkietz commented on pull request #7033: ARROW-7759: [C++][Dataset] Add CsvFileFormat
bkietz commented on pull request #7033: URL: https://github.com/apache/arrow/pull/7033#issuecomment-621814564 I'm happy to implement whatever configuration is agreeable. I'll add a list of the approaches which have been discussed here to the follow-up so we can discuss them there. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] markhildreth opened a new pull request #7072: ARROW-8648: [Rust] Optimize Rust CI Workflows
markhildreth opened a new pull request #7072: URL: https://github.com/apache/arrow/pull/7072 WIP This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7072: ARROW-8648: [Rust] Optimize Rust CI Workflows
github-actions[bot] commented on pull request #7072: URL: https://github.com/apache/arrow/pull/7072#issuecomment-621910533 https://issues.apache.org/jira/browse/ARROW-8648 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] bkietz commented on a change in pull request #7033: ARROW-7759: [C++][Dataset] Add CsvFileFormat
bkietz commented on a change in pull request #7033: URL: https://github.com/apache/arrow/pull/7033#discussion_r418109100 ## File path: cpp/src/arrow/dataset/file_csv.cc ## @@ -0,0 +1,136 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#include "arrow/dataset/file_csv.h" + +#include +#include +#include +#include + +#include "arrow/csv/options.h" +#include "arrow/csv/reader.h" +#include "arrow/dataset/dataset_internal.h" +#include "arrow/dataset/file_base.h" +#include "arrow/dataset/filter.h" +#include "arrow/dataset/type_fwd.h" +#include "arrow/dataset/visibility.h" +#include "arrow/result.h" +#include "arrow/type.h" +#include "arrow/util/iterator.h" + +namespace arrow { +namespace dataset { + +using internal::checked_cast; +using internal::checked_pointer_cast; + +static inline Result GetConvertOptions( +const CsvFileFormat& format, const std::shared_ptr& scan_options) { + auto options = csv::ConvertOptions::Defaults(); + if (scan_options != nullptr) { +// This is set to true to match behavior with other formats; a missing column +// will be materialized as null. +options.include_missing_columns = true; + +for (const auto& field : scan_options->schema()->fields()) { + options.column_types[field->name()] = field->type(); + options.include_columns.push_back(field->name()); +} + +// FIXME(bkietz) also acquire types of fields materialized but not projected. +for (auto&& name : FieldsInExpression(scan_options->filter)) { + ARROW_ASSIGN_OR_RAISE(auto match, + FieldRef(name).FindOneOrNone(*scan_options->schema())); + if (match.indices().empty()) { +options.include_columns.push_back(std::move(name)); + } +} + } + return options; +} + +static inline csv::ReadOptions GetReadOptions(const CsvFileFormat& format) { + auto options = csv::ReadOptions::Defaults(); + // Multithreaded conversion of individual files would lead to excessive thread + // contention when ScanTasks are also executed in multiple threads, so we disable it + // here. + options.use_threads = false; + return options; +} + +static inline Result> OpenReader( +const FileSource& source, const CsvFileFormat& format, +const std::shared_ptr& options = nullptr, +MemoryPool* pool = default_memory_pool()) { + ARROW_ASSIGN_OR_RAISE(auto input, source.Open()); + + auto reader_options = GetReadOptions(format); + const auto& parse_options = format.parse_options; + ARROW_ASSIGN_OR_RAISE(auto convert_options, GetConvertOptions(format, options)); + auto maybe_reader = csv::StreamingReader::Make(pool, std::move(input), reader_options, + parse_options, convert_options); + if (!maybe_reader.ok()) { +return maybe_reader.status().WithMessage("Could not open CSV input source '", + source.path(), "': ", maybe_reader.status()); + } + + return std::move(maybe_reader).ValueOrDie(); +} + +/// \brief A ScanTask backed by an Csv file. +class CsvScanTask : public ScanTask { + public: + CsvScanTask(std::shared_ptr format, FileSource source, + std::shared_ptr options, std::shared_ptr context) + : ScanTask(std::move(options), std::move(context)), +format_(std::move(format)), +source_(std::move(source)) {} + + Result Execute() override { +ARROW_ASSIGN_OR_RAISE(auto reader, + OpenReader(source_, *format_, options(), context()->pool)); +return IteratorFromReader(std::move(reader)); + } + + private: + std::shared_ptr format_; + FileSource source_; +}; + +Result CsvFileFormat::IsSupported(const FileSource& source) const { + RETURN_NOT_OK(source.Open().status()); Review comment: If source fails to open (for example if it points to a file which doesn't exist) that should raise an error rather than returning false, which is what this line detects. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about
[GitHub] [arrow] github-actions[bot] commented on pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments
github-actions[bot] commented on pull request #7073: URL: https://github.com/apache/arrow/pull/7073#issuecomment-621937629 https://issues.apache.org/jira/browse/ARROW-8318 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] andygrove commented on pull request #7066: ARROW-8634: [Java] Add Java examples
andygrove commented on pull request #7066: URL: https://github.com/apache/arrow/pull/7066#issuecomment-621873527 Ah, I wish I had found this documentation before I started using Arrow Java! OK, I will just add links to the README instead then. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] liyafan82 commented on pull request #7071: ARROW-7955: [Java] Support large buffer for file/stream IPC
liyafan82 commented on pull request #7071: URL: https://github.com/apache/arrow/pull/7071#issuecomment-621896058 Also add an integration test for VarCharVector, as it is possible that the size of the offset buffer be larger than Integer.MAX_VALUE This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] fsaintjacques opened a new pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments
fsaintjacques opened a new pull request #7073: URL: https://github.com/apache/arrow/pull/7073 * Simplified FileSystemDataset to hold a FragmentVector. Each Fragment must be a FileFragment and is checked at `FileSystemDataset::Make`. Fragments are not required to use the same backing filesystem nor the same format. * Removed `FileSystemDataset::format` and `FileSystemDataset::partitions`. * Since FileInfo is not required by neither FileSystemDataset and FileSystemDatasetFactory, it is no possible to create a dataset without any IO involved. * Re-introduced the natural behavior of creating FileFragment with their full partition expressions instead of removing the ancestors common partitions. * Added `Expression::IsSatisfiableWith` method. * Added missing compression cmake options to archery. * Ensure FileSource holds a shared_ptr pointer. This is required to refactor FileSystemDataset to support Buffer FileSource and heterogeneous FileSystems. * Rename `type` to `id`, following other classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery
github-actions[bot] commented on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-621958397 Revision: 0b532b06da2ffab64d34d4420790eee3e4f64ca3 Submitted crossbow builds: [ursa-labs/crossbow @ actions-185](https://github.com/ursa-labs/crossbow/branches/all?query=actions-185) |Task|Status| ||--| |test-debian-10-cpp|[![CircleCI](https://img.shields.io/circleci/build/github/ursa-labs/crossbow/actions-185-circle-test-debian-10-cpp.svg)](https://circleci.com/gh/ursa-labs/crossbow/tree/actions-185-circle-test-debian-10-cpp)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jorisvandenbossche commented on pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments
jorisvandenbossche commented on pull request #7073: URL: https://github.com/apache/arrow/pull/7073#issuecomment-621940060 > Fragments are not required to use the same backing filesystem nor the same format. Shouldn't we require that? That seems the goal of UnionDataset to combine datasets with different formats This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] fsaintjacques commented on a change in pull request #7033: ARROW-7759: [C++][Dataset] Add CsvFileFormat
fsaintjacques commented on a change in pull request #7033: URL: https://github.com/apache/arrow/pull/7033#discussion_r417959216 ## File path: cpp/src/arrow/dataset/file_csv.cc ## @@ -0,0 +1,136 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#include "arrow/dataset/file_csv.h" + +#include +#include +#include +#include + +#include "arrow/csv/options.h" +#include "arrow/csv/reader.h" +#include "arrow/dataset/dataset_internal.h" +#include "arrow/dataset/file_base.h" +#include "arrow/dataset/filter.h" +#include "arrow/dataset/type_fwd.h" +#include "arrow/dataset/visibility.h" +#include "arrow/result.h" +#include "arrow/type.h" +#include "arrow/util/iterator.h" + +namespace arrow { +namespace dataset { + +using internal::checked_cast; +using internal::checked_pointer_cast; + +static inline Result GetConvertOptions( +const CsvFileFormat& format, const std::shared_ptr& scan_options) { + auto options = csv::ConvertOptions::Defaults(); + if (scan_options != nullptr) { +// This is set to true to match behavior with other formats; a missing column +// will be materialized as null. +options.include_missing_columns = true; Review comment: I would expect the FilterAndProjectScanTask to fix this in a more efficient way. ## File path: cpp/src/arrow/dataset/file_csv.cc ## @@ -0,0 +1,136 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#include "arrow/dataset/file_csv.h" + +#include +#include +#include +#include + +#include "arrow/csv/options.h" +#include "arrow/csv/reader.h" +#include "arrow/dataset/dataset_internal.h" +#include "arrow/dataset/file_base.h" +#include "arrow/dataset/filter.h" +#include "arrow/dataset/type_fwd.h" +#include "arrow/dataset/visibility.h" +#include "arrow/result.h" +#include "arrow/type.h" +#include "arrow/util/iterator.h" + +namespace arrow { +namespace dataset { + +using internal::checked_cast; +using internal::checked_pointer_cast; + +static inline Result GetConvertOptions( +const CsvFileFormat& format, const std::shared_ptr& scan_options) { + auto options = csv::ConvertOptions::Defaults(); + if (scan_options != nullptr) { +// This is set to true to match behavior with other formats; a missing column +// will be materialized as null. +options.include_missing_columns = true; + +for (const auto& field : scan_options->schema()->fields()) { + options.column_types[field->name()] = field->type(); + options.include_columns.push_back(field->name()); +} + +// FIXME(bkietz) also acquire types of fields materialized but not projected. +for (auto&& name : FieldsInExpression(scan_options->filter)) { + ARROW_ASSIGN_OR_RAISE(auto match, + FieldRef(name).FindOneOrNone(*scan_options->schema())); + if (match.indices().empty()) { +options.include_columns.push_back(std::move(name)); + } +} + } + return options; +} + +static inline csv::ReadOptions GetReadOptions(const CsvFileFormat& format) { + auto options = csv::ReadOptions::Defaults(); + // Multithreaded conversion of individual files would lead to excessive thread + // contention when ScanTasks are also executed in multiple threads, so we disable it + // here. + options.use_threads = false; + return options; +} + +static inline Result> OpenReader( +const FileSource& source, const CsvFileFormat& format, +const std::shared_ptr& options = nullptr,
[GitHub] [arrow] kszucs commented on pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery
kszucs commented on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-621946996 @github-actions crossbow submit test-debian-10-cpp This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] sunchao commented on pull request #7076: ARROW-8659: [Rust] ListBuilder allocate with_capacity
sunchao commented on pull request #7076: URL: https://github.com/apache/arrow/pull/7076#issuecomment-622616144 Merged. Thanks @tustvold ! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou edited a comment on pull request #7085: ARROW-8668: [Packaging][APT][Yum][ARM] Use Travis CI's ARM machine to build packages
kou edited a comment on pull request #7085: URL: https://github.com/apache/arrow/pull/7085#issuecomment-622586181 @kszucs I want to set `DOCKERHUB_USER` and `DOCKERHUB_TOKEN` in https://travis-ci.org/github/ursa-labs/crossbow and https://github.com/ursa-labs/crossbow. Which user should we use for this? It seems that we use them for https://hub.docker.com/u/asfjenkins on GitHub Actions in this repository. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou edited a comment on pull request #7085: ARROW-8668: [Packaging][APT][Yum][ARM] Use Travis CI's ARM machine to build packages
kou edited a comment on pull request #7085: URL: https://github.com/apache/arrow/pull/7085#issuecomment-622586181 @kszucs I want to set `DOCKERHUB_USER` and `DOCKERHUB_TOKEN` in https://travis-ci.org/github/ursa-labs/crossbow and https://github.com/ursa-labs/crossbow. Which user should we use for this? It seems that we use them for https://hub.docker.com/u/asfjenkins on GitHub Actions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on pull request #7085: ARROW-8668: [Packaging][APT][Yum][ARM] Use Travis CI's ARM machine to build packages
kszucs commented on pull request #7085: URL: https://github.com/apache/arrow/pull/7085#issuecomment-622672970 @kou yes we use asf provided credentials on github actions to upload the images. We need a user with write access to that repository with a custom dockerhub token. Just granted you admin rights on ursa-labs/crossbow so you can set it yourself (or I can do it later if you want me to). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou commented on pull request #7085: ARROW-8668: [Packaging][APT][Yum][ARM] Use Travis CI's ARM machine to build packages
kou commented on pull request #7085: URL: https://github.com/apache/arrow/pull/7085#issuecomment-622731551 @github-actions crossbow submit -g linux This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou commented on pull request #7085: ARROW-8668: [Packaging][APT][Yum][ARM] Use Travis CI's ARM machine to build packages
kou commented on pull request #7085: URL: https://github.com/apache/arrow/pull/7085#issuecomment-622731373 Thanks! I set mine to https://travis-ci.org/github/ursa-labs/crossbow and https://github.com/ursa-labs/crossbow. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] xhochy commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds
xhochy commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622729700 Actually, I think the most practical approach would be to use the conda-forge recipes, patch them to use `vs2015` and upload them to a separate channel. This gives us the build machinery to not rebuild them in every CI run and also keeps us from maintaining a complete set of build scripts, we just need to copy them over from time to time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column
nealrichardson commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-622495803 For what it's worth, R on Windows uses mingw, not msvc This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds
wesm commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622496448 Is this change necessary? I understand why we are using VS2017 in the conda package but why in the wheels? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7077: ARROW-8660: [C++][Gandiva] Reduce usage of Boost in Gandiva codebase
wesm commented on pull request #7077: URL: https://github.com/apache/arrow/pull/7077#issuecomment-622513296 +1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] mayuropensource commented on pull request #7022: ARROW-8562: [C++] IO: Parameterize I/O Coalescing using S3 metrics
mayuropensource commented on pull request #7022: URL: https://github.com/apache/arrow/pull/7022#issuecomment-622581788 thank you @wesm This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm edited a comment on pull request #6631: ARROW-8111: [C++][CSV] Support MM/DD/YYYY date format
wesm edited a comment on pull request #6631: URL: https://github.com/apache/arrow/pull/6631#issuecomment-622586463 Problems: * There aren't any unit tests in this patch so there is some work to do to get this merged * Code is duplicated from arrow/util/parsing.h I started working on this so I'll either post an update here or open a new PR. Since there's a good deal more code changes required it would be a good idea to have another round of code review. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7085: ARROW-8668: [Packaging][APT][Yum][ARM] Use Travis CI's ARM machine to build packages
github-actions[bot] commented on pull request #7085: URL: https://github.com/apache/arrow/pull/7085#issuecomment-622733242 Revision: c73b2b65c373892f95dab8d65cf6fae06a39fe68 Submitted crossbow builds: [ursa-labs/crossbow @ actions-198](https://github.com/ursa-labs/crossbow/branches/all?query=actions-198) |Task|Status| ||--| |centos-6-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-198-github-centos-6-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-198-github-centos-6-amd64)| |centos-7-aarch64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-198-travis-centos-7-aarch64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |centos-7-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-198-github-centos-7-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-198-github-centos-7-amd64)| |centos-8-aarch64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-198-travis-centos-8-aarch64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |centos-8-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-198-github-centos-8-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-198-github-centos-8-amd64)| |debian-buster-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-198-github-debian-buster-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-198-github-debian-buster-amd64)| |debian-buster-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-198-travis-debian-buster-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |debian-stretch-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-198-github-debian-stretch-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-198-github-debian-stretch-amd64)| |debian-stretch-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-198-travis-debian-stretch-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |ubuntu-bionic-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-198-github-ubuntu-bionic-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-198-github-ubuntu-bionic-amd64)| |ubuntu-bionic-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-198-travis-ubuntu-bionic-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |ubuntu-eoan-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-198-github-ubuntu-eoan-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-198-github-ubuntu-eoan-amd64)| |ubuntu-eoan-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-198-travis-ubuntu-eoan-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |ubuntu-focal-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-198-github-ubuntu-focal-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-198-github-ubuntu-focal-amd64)| |ubuntu-focal-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-198-travis-ubuntu-focal-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |ubuntu-xenial-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-198-github-ubuntu-xenial-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-198-github-ubuntu-xenial-amd64)| |ubuntu-xenial-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-198-travis-ubuntu-xenial-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] xhochy commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds
xhochy commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622697125 We would need to switch away from using the compiled conda packages as the basis for the wheels. I'm not sure if there is a different source that could provide except Arrow building them on their own. With my Arrow hat on, this is really annoying as either build times will increase significantly or we would have to use make a pre-filled Windows Docker image that already includes all our static libraries. With my conda-forge hat on, I would actually love to see this happen as this would mean that we could get rid of the static libraries which we actually don't want to build and distribute. As a package manager who can explicitly deal with native dependencies, dynamic linkage makes the package maintenance a whole lot easier. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou commented on pull request #7085: ARROW-8668: [Packaging][APT][Yum][ARM] Use Travis CI's ARM machine to build packages
kou commented on pull request #7085: URL: https://github.com/apache/arrow/pull/7085#issuecomment-622885428 @github-actions crossbow submit -g linux This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7085: ARROW-8668: [Packaging][APT][Yum][ARM] Use Travis CI's ARM machine to build packages
github-actions[bot] commented on pull request #7085: URL: https://github.com/apache/arrow/pull/7085#issuecomment-622887235 Revision: c73b2b65c373892f95dab8d65cf6fae06a39fe68 Submitted crossbow builds: [ursa-labs/crossbow @ actions-199](https://github.com/ursa-labs/crossbow/branches/all?query=actions-199) |Task|Status| ||--| |centos-6-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-199-github-centos-6-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-199-github-centos-6-amd64)| |centos-7-aarch64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-199-travis-centos-7-aarch64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |centos-7-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-199-github-centos-7-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-199-github-centos-7-amd64)| |centos-8-aarch64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-199-travis-centos-8-aarch64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |centos-8-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-199-github-centos-8-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-199-github-centos-8-amd64)| |debian-buster-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-199-github-debian-buster-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-199-github-debian-buster-amd64)| |debian-buster-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-199-travis-debian-buster-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |debian-stretch-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-199-github-debian-stretch-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-199-github-debian-stretch-amd64)| |debian-stretch-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-199-travis-debian-stretch-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |ubuntu-bionic-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-199-github-ubuntu-bionic-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-199-github-ubuntu-bionic-amd64)| |ubuntu-bionic-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-199-travis-ubuntu-bionic-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |ubuntu-eoan-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-199-github-ubuntu-eoan-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-199-github-ubuntu-eoan-amd64)| |ubuntu-eoan-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-199-travis-ubuntu-eoan-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |ubuntu-focal-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-199-github-ubuntu-focal-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-199-github-ubuntu-focal-amd64)| |ubuntu-focal-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-199-travis-ubuntu-focal-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |ubuntu-xenial-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-199-github-ubuntu-xenial-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-199-github-ubuntu-xenial-amd64)| |ubuntu-xenial-arm64|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-199-travis-ubuntu-xenial-arm64.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nevi-me commented on pull request #7037: ARROW-6718: [Rust] Remove packed_simd
nevi-me commented on pull request #7037: URL: https://github.com/apache/arrow/pull/7037#issuecomment-622889198 Hi @yordan-pavlov, we want to remove packed_simd due to the uncertainty with it being stabilised soon. We so far found that if we optimise some non-SIMD code, we don't lose a lot of performance relative to the explicit SIMD code. It'd be great if we could see where we could further improve the non-SIMD functions This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] xhochy commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds
xhochy commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622494222 > @xhochy to run the wheels, or build them? To run. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] tobim commented on pull request #7038: ARROW-8593: [C++][Parquet] Fix build with musl libc
tobim commented on pull request #7038: URL: https://github.com/apache/arrow/pull/7038#issuecomment-622535149 @fsaintjacques @emkornfield sorry for the long silence, I updated the commit as you suggested. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pauldix commented on a change in pull request #7064: ARROW-6945: [Rust] WIP: Add initial skeleton for Rust integration tests
pauldix commented on a change in pull request #7064: URL: https://github.com/apache/arrow/pull/7064#discussion_r418732315 ## File path: rust/arrow/Cargo.toml ## @@ -50,6 +50,7 @@ chrono = "0.4" flatbuffers = "0.6" hex = "0.4" arrow-flight = { path = "../arrow-flight", optional = true } +clap = "2.33.0" Review comment: It's not clear to me how trybuild would be used in conjunction with the Archery suite, which just seems to want three CLI executables to pipe invoke. Maybe these would be best in a separate project on the same level as `arrow`, `arrow-flight` etd? `arrow-integration` perhaps? Then I'll just import from arrow and we keep the actual library dependencies clean. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm edited a comment on pull request #6631: ARROW-8111: [C++][CSV] Support MM/DD/YYYY date format
wesm edited a comment on pull request #6631: URL: https://github.com/apache/arrow/pull/6631#issuecomment-622586463 Problems: * There aren't any unit tests in this patch so there is some work to do to get this merged * Code is duplicated from arrow/util/parsing.h This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm edited a comment on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds
wesm edited a comment on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622496448 Is this change necessary? I understand why we are using VS2017 in the conda package but why in the wheels? I'm sort of -0.5 on this unless there is a concrete reason why we can't keep using VS2015 (because it may require users to install a different C++ redistributable than the VS2015 one that other packages require) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou commented on pull request #7085: ARROW-8668: [Packaging][APT][Yum][ARM] Use Travis CI's ARM machine to build packages
kou commented on pull request #7085: URL: https://github.com/apache/arrow/pull/7085#issuecomment-622586181 @kszucs I want to set `DOCKERHUB_USER` and `DOCKERHUB_TOKEN` in https://travis-ci.org/github/ursa-labs/crossbow . Which user should we use for this? It seems that we use them for https://hub.docker.com/u/asfjenkins on GitHub Actions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou opened a new pull request #7085: ARROW-8668: [Packaging][APT][Yum][ARM] Use Travis CI's ARM machine to build packages
kou opened a new pull request #7085: URL: https://github.com/apache/arrow/pull/7085 If we use QEMU on GitHub Actions, it takes 6h+. If we use ARM machine on Travis CI, it takes 30-40m. This change adds Docker image caching to apache/arrow-dev:${ARCH}-{DISTRIBUTION}-${CODE_NAME_OR_VERSION}-package-${PACKAGE_NAME}-llvm-${LLVM_VERSION}. It refers .env at the top directory. So we can override the "apache/arrow-dev" by the REPO environment variable for testing. This change disables Gandiva for Debian/Ubuntu ARM packages because apt.llvm.org doesn't provide packages for ARM. This change disables Gandiva for CentOS 8 ARM packages because we don't have enough disk space to build it. We may need to disable Flight later. This change adds ARM package builds to nightly jobs. If they are fragile, I'll remove them. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou commented on pull request #7085: ARROW-8668: [Packaging][APT][Yum][ARM] Use Travis CI's ARM machine to build packages
kou commented on pull request #7085: URL: https://github.com/apache/arrow/pull/7085#issuecomment-622584942 @github-actions crossbow submit -g linux This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #6631: ARROW-8111: [C++][CSV] Support MM/DD/YYYY date format
wesm commented on pull request #6631: URL: https://github.com/apache/arrow/pull/6631#issuecomment-622586463 There aren't any unit tests in this patch so there is some work to do to get this merged This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] fsaintjacques commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds
fsaintjacques commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622491245 @xhochy to run the wheels, or build them? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group
wesm commented on pull request #6744: URL: https://github.com/apache/arrow/pull/6744#issuecomment-622529037 thanks @lidavidm! I'm confident we'll be able to devise some solutions to the resource allocation problem This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org