[arrow] branch feature/format-string-view created (now 74756051c4)
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a change to branch feature/format-string-view
in repository https://gitbox.apache.org/repos/asf/arrow.git

  at 74756051c4 ARROW-16855: [C++] Adding Read Relation ToProto (#13401)

No new revisions were added by this update.
[arrow] branch master updated: ARROW-17296: [Python] Update serialized metadata size in pyarrow.parquet.read_metadata doctest (#13790)
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new ee874d67dd ARROW-17296: [Python] Update serialized metadata size in pyarrow.parquet.read_metadata doctest (#13790)

ee874d67dd is described below

commit ee874d67ddd417e5c33aff1979df782c4dfa1dfb
Author: Wes McKinney
AuthorDate: Wed Aug 3 15:11:52 2022 -0600

    ARROW-17296: [Python] Update serialized metadata size in pyarrow.parquet.read_metadata doctest (#13790)

    This should remain correct until we hit major version 100 (or make
    changes that otherwise affect the metadata size)

    Lead-authored-by: Wes McKinney
    Co-authored-by: Wes McKinney
    Signed-off-by: Wes McKinney
---
 python/pyarrow/parquet/__init__.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/pyarrow/parquet/__init__.py b/python/pyarrow/parquet/__init__.py
index 5feb922060..5f616bc209 100644
--- a/python/pyarrow/parquet/__init__.py
+++ b/python/pyarrow/parquet/__init__.py
@@ -3419,7 +3419,7 @@ def read_metadata(where, memory_map=False, decryption_properties=None):
       num_rows: 3
       num_row_groups: 1
       format_version: 2.6
-      serialized_size: 561
+      serialized_size: ...
     """
     return ParquetFile(where, memory_map=memory_map,
                        decryption_properties=decryption_properties).metadata
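The `...` placeholder only stays version-proof because the doctest runner matches it with the ELLIPSIS option enabled (an assumption here; the commit itself only changes the expected output). A minimal sketch of that mechanism using the standard-library `doctest` module:

```python
import doctest

checker = doctest.OutputChecker()

# With ELLIPSIS enabled, "..." in the expected output matches any text,
# so "serialized_size: ..." keeps passing when the real size changes.
assert checker.check_output("serialized_size: ...\n",
                            "serialized_size: 561\n",
                            doctest.ELLIPSIS)

# Without the flag, the literal "..." no longer matches the actual output.
assert not checker.check_output("serialized_size: ...\n",
                                "serialized_size: 561\n",
                                0)
```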
[arrow] branch master updated: ARROW-17213: [C++] Fix for valgrind issue in test-r-linux-valgrind crossbow build (#13715)
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new 49ae8fa953 ARROW-17213: [C++] Fix for valgrind issue in test-r-linux-valgrind crossbow build (#13715)

49ae8fa953 is described below

commit 49ae8fa9536b117f26e83941619df3b0e1b9e18a
Author: Wes McKinney
AuthorDate: Tue Jul 26 20:12:41 2022 -0600

    ARROW-17213: [C++] Fix for valgrind issue in test-r-linux-valgrind crossbow build (#13715)

    Authored-by: Wes McKinney
    Signed-off-by: Wes McKinney
---
 cpp/src/arrow/compute/kernels/scalar_compare.cc | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/cpp/src/arrow/compute/kernels/scalar_compare.cc b/cpp/src/arrow/compute/kernels/scalar_compare.cc
index f071986dd2..cfe1085531 100644
--- a/cpp/src/arrow/compute/kernels/scalar_compare.cc
+++ b/cpp/src/arrow/compute/kernels/scalar_compare.cc
@@ -271,8 +271,7 @@ struct CompareKernel {
   if (out_is_byte_aligned) {
     out_buffer = out_arr->buffers[1].data + out_arr->offset / 8;
   } else {
-    ARROW_ASSIGN_OR_RAISE(out_buffer_tmp,
-                          ctx->Allocate(bit_util::BytesForBits(batch.length)));
+    ARROW_ASSIGN_OR_RAISE(out_buffer_tmp, ctx->AllocateBitmap(batch.length));
     out_buffer = out_buffer_tmp->mutable_data();
   }
   if (batch[0].is_array() && batch[1].is_array()) {
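The one-liner swaps a manual byte-count allocation for the dedicated bitmap allocator. For reference, `bit_util::BytesForBits` is just ceiling division by 8; a small Python sketch of that arithmetic (the helper name mirrors the C++ utility and is illustrative, not a real pyarrow API):

```python
def bytes_for_bits(n: int) -> int:
    # Equivalent of arrow::bit_util::BytesForBits: ceil(n / 8)
    return (n + 7) // 8

assert bytes_for_bits(0) == 0
assert bytes_for_bits(1) == 1
assert bytes_for_bits(8) == 1
assert bytes_for_bits(9) == 2
assert bytes_for_bits(32768) == 4096
```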
[arrow-datafusion-python] branch master updated: Add .asf.yaml
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-python.git

The following commit(s) were added to refs/heads/master by this push:
     new 698fa72 Add .asf.yaml

698fa72 is described below

commit 698fa727fab25e31f9f09780e5f4a79d8966c192
Author: Wes McKinney
AuthorDate: Thu Jul 21 17:46:48 2022 -0500

    Add .asf.yaml
---
 .asf.yaml | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/.asf.yaml b/.asf.yaml
new file mode 100644
index 000..e59b243
--- /dev/null
+++ b/.asf.yaml
@@ -0,0 +1,31 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+notifications:
+  commits: commits@arrow.apache.org
+  issues: git...@arrow.apache.org
+  pullrequests: git...@arrow.apache.org
+  jira_options: link label worklog
+github:
+  description: "Apache Arrow DataFusion Python Bindings"
+  homepage: https://arrow.apache.org/datafusion
+  enabled_merge_buttons:
+    squash: true
+    merge: false
+    rebase: false
+  features:
+    issues: true
[arrow] branch master updated: ARROW-17135: [C++] Reduce code size in compute/kernels/scalar_compare.cc (#13654)
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new 1214083f7e ARROW-17135: [C++] Reduce code size in compute/kernels/scalar_compare.cc (#13654)

1214083f7e is described below

commit 1214083f7ece4e1797b7f3cdecfec1c2cfa8bf89
Author: Wes McKinney
AuthorDate: Wed Jul 20 13:12:23 2022 -0700

    ARROW-17135: [C++] Reduce code size in compute/kernels/scalar_compare.cc (#13654)

    This "leaner" implementation reduces the generated code size of this
    C++ file from 2307768 bytes to 1192608 bytes in gcc 10.3.0. The
    benchmarks are also faster (on my avx2 laptop):

    before

    ```
    -----------------------------------------------------------------------------------
    Benchmark                             Time      CPU  Iterations  UserCounters...
    -----------------------------------------------------------------------------------
    GreaterArrayArrayInt64/32768/1     32.1 us  32.1 us       21533  items_per_second=1020.16M/s null_percent=0.01 size=32.768k
    GreaterArrayArrayInt64/32768/100   32.1 us  32.1 us       21603  items_per_second=1019.27M/s null_percent=1 size=32.768k
    GreaterArrayArrayInt64/32768/10    32.1 us  32.1 us       21479  items_per_second=1020.82M/s null_percent=10 size=32.768k
    GreaterArrayArrayInt64/32768/2     32.0 us  32.0 us       21468  items_per_second=1023.12M/s null_percent=50 size=32.768k
    GreaterArrayArrayInt64/32768/1     32.3 us  32.3 us       21720  items_per_second=1013.44M/s null_percent=100 size=32.768k
    GreaterArrayArrayInt64/32768/0     31.6 us  31.6 us       21828  items_per_second=1036.94M/s null_percent=0 size=32.768k
    GreaterArrayScalarInt64/32768/1    20.8 us  20.8 us       33461  items_per_second=1.57238G/s null_percent=0.01 size=32.768k
    GreaterArrayScalarInt64/32768/100  20.9 us  20.9 us       33625  items_per_second=1.56611G/s null_percent=1 size=32.768k
    GreaterArrayScalarInt64/32768/10   20.8 us  20.8 us       33553  items_per_second=1.57338G/s null_percent=10 size=32.768k
    GreaterArrayScalarInt64/32768/2    20.9 us  20.9 us       33348  items_per_second=1.5687G/s null_percent=50 size=32.768k
    GreaterArrayScalarInt64/32768/1    20.9 us  20.9 us       33419  items_per_second=1.56879G/s null_percent=100 size=32.768k
    GreaterArrayScalarInt64/32768/0    20.5 us  20.5 us       34116  items_per_second=1.59837G/s null_percent=0 size=32.768k
    ```

    after

    ```
    -----------------------------------------------------------------------------------
    Benchmark                             Time      CPU  Iterations  UserCounters...
    -----------------------------------------------------------------------------------
    GreaterArrayArrayInt64/32768/1     18.1 us  18.1 us       38751  items_per_second=1.81199G/s null_percent=0.01 size=32.768k
    GreaterArrayArrayInt64/32768/100   17.5 us  17.5 us       39374  items_per_second=1.86821G/s null_percent=1 size=32.768k
    GreaterArrayArrayInt64/32768/10    19.0 us  19.0 us       33941  items_per_second=1.72066G/s null_percent=10 size=32.768k
    GreaterArrayArrayInt64/32768/2     18.0 us  18.0 us       39589  items_per_second=1.81817G/s null_percent=50 size=32.768k
    GreaterArrayArrayInt64/32768/1     18.1 us  18.1 us       39061  items_per_second=1.80719G/s null_percent=100 size=32.768k
    GreaterArrayArrayInt64/32768/0     17.5 us  17.5 us       39813  items_per_second=1.87031G/s null_percent=0 size=32.768k
    GreaterArrayScalarInt64/32768/1    16.3 us  16.3 us       42281  items_per_second=2.01525G/s null_percent=0.01 size=32.768k
    GreaterArrayScalarInt64/32768/100  16.5 us  16.5 us       42266  items_per_second=1.98195G/s null_percent=1 size=32.768k
    GreaterArrayScalarInt64/32768/10   16.5 us  16.5 us       41872  items_per_second=1.98615G/s null_percent=10 size=32.768k
    GreaterArrayScalarInt64/32768/2    16.3 us  16.3 us       42130  items_per_second=2.00447G/s null_percent=50 size=32.768k
    GreaterArrayScalarInt64/32768/1    16.2 us  16.2 us       42391  items_per_second=2.02296G/s null_percent=100 size=32.768k
    GreaterArrayScalarInt64/32768/0    15.9 us  15.9 us       43498  items_per_second=2.0614G/s null_percent=0 size=32.768k
    ```

    Authored-by: Wes McKinney
    Signed-off-by: Wes McKinney
---
 cpp/src/arrow/compute/kernels/codegen_internal.cc | 4
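For scale, the numbers quoted in the commit message work out to roughly a 48% reduction in generated code size and about a 1.8x throughput improvement on the no-null array-array case; a quick check of that arithmetic:

```python
# Generated code sizes quoted in the commit message (bytes, gcc 10.3.0).
before_size, after_size = 2_307_768, 1_192_608
reduction = 1 - after_size / before_size
assert round(reduction * 100) == 48

# Throughput for GreaterArrayArrayInt64/32768 with null_percent=0 (items/s).
before_rate, after_rate = 1036.94e6, 1.87031e9
assert round(after_rate / before_rate, 1) == 1.8
```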
[arrow] branch master updated: ARROW-16852: [C++] Migrate remaining kernels to use ExecSpan, remove ExecBatchIterator (#13630)
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new 4d931ff1c0 ARROW-16852: [C++] Migrate remaining kernels to use ExecSpan, remove ExecBatchIterator (#13630)

4d931ff1c0 is described below

commit 4d931ff1c0f5661a9b134c514555c8d769001759
Author: Wes McKinney
AuthorDate: Tue Jul 19 16:26:46 2022 -0500

    ARROW-16852: [C++] Migrate remaining kernels to use ExecSpan, remove ExecBatchIterator (#13630)

    This completes the porting to use ExecSpan everywhere. I also changed
    the ExecBatchIterator benchmarks to use ExecSpan to show the
    performance improvement in input splitting that we've talked about in
    the past.

    Splitting inputs into small ExecSpan:

    ```
    Benchmark                        Time       CPU  Iterations  UserCounters...
    BM_ExecSpanIterator/1024    205671 ns  205667 ns       3395  items_per_second=4.86223k/s
    BM_ExecSpanIterator/4096     54749 ns   54750 ns      13121  items_per_second=18.265k/s
    BM_ExecSpanIterator/16384    15979 ns   15979 ns      42628  items_per_second=62.5824k/s
    BM_ExecSpanIterator/65536     5597 ns    5597 ns     125099  items_per_second=178.668k/s
    ```

    Splitting inputs into small ExecBatch:

    ```
    Benchmark                          Time         CPU  Iterations  UserCounters...
    BM_ExecBatchIterator/1024   17163432 ns  17163171 ns         41  items_per_second=58.2643/s
    BM_ExecBatchIterator/4096    4243467 ns   4243316 ns        163  items_per_second=235.665/s
    BM_ExecBatchIterator/16384   1093680 ns   1093638 ns        620  items_per_second=914.38/s
    BM_ExecBatchIterator/65536    272451 ns    272435 ns       2584  items_per_second=3.6706k/s
    ```

    Because the input in this benchmark has 1M elements, this shows that
    splitting into 1024 chunks of size 1024 adds only 0.2ms of overhead
    with ExecSpanIterator versus 17.16ms of overhead with ExecBatchIterator
    (> 80x improvement).

    This won't by itself do much to impact performance in Acero, but
    things for the community to explore in the future are the following
    (this work that I've been doing has been a precondition to consider
    this):

    * A leaner ExecuteScalarExpression implementation that reuses
      temporary allocations (ARROW-16758)
    * Parallel expression evaluation
    * Better defining morsel (~1M elements) versus task (~1K elements)
      granularity in execution
    * Work stealing so that we don't "hog" the thread pools, and we keep
      the work pinned to a particular CPU core if there are other things
      going on at the same time

    Authored-by: Wes McKinney
    Signed-off-by: Wes McKinney
---
 cpp/src/arrow/array/data.cc                        |   6 +-
 cpp/src/arrow/array/data.h                         |  15 ++-
 cpp/src/arrow/compute/exec.cc                      | 142 -
 cpp/src/arrow/compute/exec.h                       |  34 +++--
 cpp/src/arrow/compute/exec/aggregate.cc            |  31 +++--
 cpp/src/arrow/compute/exec/aggregate_node.cc       |  25 ++--
 cpp/src/arrow/compute/exec_internal.h              |  40 +-
 cpp/src/arrow/compute/exec_test.cc                 | 131 ---
 cpp/src/arrow/compute/function_benchmark.cc        |  26 ++--
 cpp/src/arrow/compute/function_test.cc             |   8 +-
 cpp/src/arrow/compute/kernel.h                     |  49 +++
 cpp/src/arrow/compute/kernels/aggregate_basic.cc   |  60 -
 .../compute/kernels/aggregate_basic_internal.h     |  37 +++---
 cpp/src/arrow/compute/kernels/aggregate_internal.h |  12 +-
 cpp/src/arrow/compute/kernels/aggregate_mode.cc    |  28 
 .../arrow/compute/kernels/aggregate_quantile.cc    |  42 --
 cpp/src/arrow/compute/kernels/aggregate_tdigest.cc |  10 +-
 cpp/src/arrow/compute/kernels/aggregate_var_std.cc |  36 +++---
 cpp/src/arrow/compute/kernels/hash_aggregate.cc    | 140 ++--
 .../arrow/compute/kernels/hash_aggregate_test.cc   |  31 +++--
 .../arrow/compute/kernels/scalar_cast_numeric.cc   |   8 +-
 cpp/src/arrow/compute/kernels/scalar_nested.cc     |  10 +-
 cpp/src/arrow/compute/row/grouper.cc               |  42 +++---
 cpp/src/arrow/compute/row/grouper.h                |   2 +-
 cpp/src/arrow/dataset/partition.cc                 |   6 +-
 25 files changed, 312 insertions(+), 659 deletions(-)

diff --git a/cpp/src/arrow/array/data
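The gap between the two iterators comes down to what each one yields per chunk: an ExecBatch materializes fresh owning objects, while an ExecSpan only repositions a non-owning offset/length view over the same buffers. A toy Python sketch of that distinction (plain lists stand in for Arrow arrays; the class and function names only echo the C++ ones and are not real Arrow APIs):

```python
class Span:
    """A non-owning view: just an offset/length over shared storage."""
    __slots__ = ("data", "offset", "length")

    def __init__(self, data, offset, length):
        self.data, self.offset, self.length = data, offset, length


def iterate_spans(data, chunk_size):
    # One span object, reused for every chunk -- no per-chunk allocation
    # beyond updating two integers (the ExecSpanIterator idea).
    span = Span(data, 0, 0)
    for offset in range(0, len(data), chunk_size):
        span.offset = offset
        span.length = min(chunk_size, len(data) - offset)
        yield span


def iterate_batches(data, chunk_size):
    # Materializes a new container per chunk (the ExecBatchIterator idea).
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


data = list(range(1_000_000))
total = sum(s.length for s in iterate_spans(data, 1024))
assert total == 1_000_000  # every element covered by exactly one span
```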
[arrow] branch master updated: ARROW-16807: [C++][R] count distinct incorrectly merges state (#13583)
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new af4db7731b ARROW-16807: [C++][R] count distinct incorrectly merges state (#13583)

af4db7731b is described below

commit af4db7731b1f857e78221c53c2d8221849b1eeec
Author: octalene
AuthorDate: Sat Jul 16 14:45:27 2022 -0700

    ARROW-16807: [C++][R] count distinct incorrectly merges state (#13583)

    This addresses a bug where the `count_distinct` function simply added
    counts when merging state. The correct logic would be to return the
    number of distinct elements after both states have been merged.

    State for count_distinct is backed by a MemoTable, which is then backed
    by a HashTable. To properly merge state, this PR adds 2 functions to
    each MemoTable: `MaybeInsert` and `MergeTable`. The MaybeInsert
    function handles simplified logic for inserting an element into the
    MemoTable. The MergeTable function handles iteration over elements in
    the MemoTable _to be merged_.

    This PR also adds an R test and a C++ test. The R test mirrors what
    was provided in ARROW-16807. The C++ test,
    `AllChunkedArrayTypesWithNulls`, mirrors another C++ test,
    `AllArrayTypesWithNulls`, but uses chunked arrays for test data.

    Lead-authored-by: Aldrin Montana
    Co-authored-by: Aldrin M
    Co-authored-by: Wes McKinney
    Signed-off-by: Wes McKinney
---
 cpp/src/arrow/compute/kernels/aggregate_basic.cc | 17 --
 cpp/src/arrow/compute/kernels/aggregate_test.cc  | 72 
 cpp/src/arrow/compute/kernels/codegen_internal.h |  2 +-
 cpp/src/arrow/util/hashing.h                     | 32 +++
 r/tests/testthat/test-dplyr-summarize.R          |  9 +++
 5 files changed, 126 insertions(+), 6 deletions(-)

diff --git a/cpp/src/arrow/compute/kernels/aggregate_basic.cc b/cpp/src/arrow/compute/kernels/aggregate_basic.cc
index 57cee87f00..fec483318e 100644
--- a/cpp/src/arrow/compute/kernels/aggregate_basic.cc
+++ b/cpp/src/arrow/compute/kernels/aggregate_basic.cc
@@ -136,27 +136,34 @@ struct CountDistinctImpl : public ScalarAggregator {
   Status Consume(KernelContext*, const ExecBatch& batch) override {
     if (batch[0].is_array()) {
       const ArrayData& arr = *batch[0].array();
+      this->has_nulls = arr.GetNullCount() > 0;
+
       auto visit_null = []() { return Status::OK(); };
       auto visit_value = [&](VisitorArgType arg) {
-        int y;
+        int32_t y;
         return memo_table_->GetOrInsert(arg, &y);
       };
       RETURN_NOT_OK(VisitArraySpanInline(arr, visit_value, visit_null));
-      this->non_nulls += memo_table_->size();
-      this->has_nulls = arr.GetNullCount() > 0;
+
     } else {
       const Scalar& input = *batch[0].scalar();
       this->has_nulls = !input.is_valid;
+
       if (input.is_valid) {
-        this->non_nulls += batch.length;
+        int32_t unused;
+        RETURN_NOT_OK(memo_table_->GetOrInsert(UnboxScalar::Unbox(input), &unused));
       }
     }
+
+    this->non_nulls = memo_table_->size();
+
     return Status::OK();
   }

   Status MergeFrom(KernelContext*, KernelState&& src) override {
     const auto& other_state = checked_cast(src);
-    this->non_nulls += other_state.non_nulls;
+    RETURN_NOT_OK(this->memo_table_->MergeTable(*(other_state.memo_table_)));
+    this->non_nulls = this->memo_table_->size();
     this->has_nulls = this->has_nulls || other_state.has_nulls;
     return Status::OK();
   }

diff --git a/cpp/src/arrow/compute/kernels/aggregate_test.cc b/cpp/src/arrow/compute/kernels/aggregate_test.cc
index aa54fe5f3e..abd5b5210a 100644
--- a/cpp/src/arrow/compute/kernels/aggregate_test.cc
+++ b/cpp/src/arrow/compute/kernels/aggregate_test.cc
@@ -962,11 +962,83 @@ class TestCountDistinctKernel : public ::testing::Test {
     EXPECT_THAT(CallFunction("count_distinct", {input}, ), one);
   }

+  void CheckChunkedArr(const std::shared_ptr& type,
+                       const std::vector& json, int64_t expected_all,
+                       bool has_nulls = true) {
+    Check(ChunkedArrayFromJSON(type, json), expected_all, has_nulls);
+  }
+
   CountOptions only_valid{CountOptions::ONLY_VALID};
   CountOptions only_null{CountOptions::ONLY_NULL};
   CountOptions all{CountOptions::ALL};
 };

+TEST_F(TestCountDistinctKernel, AllChunkedArrayTypesWithNulls) {
+  // Boolean
+  CheckChunkedArr(boolean(), {"[]", "[]"}, 0, /*has_nulls=*/false);
+  CheckChunkedArr(boolean(), {"[true, null]", "[false, null, false]", "[true]"}, 3);
+
+  // Number
+  for (auto ty : NumericTypes()) {
+    CheckChunkedArr(ty, {"[1, 1, null, 2]", "[5, 8, 9, 9, null, 10]", "[6, 6, 8, 9, 10]"},
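The essence of the fix, with Python sets standing in for the memo tables (a sketch, not the Arrow implementation): adding the two `non_nulls` counters double-counts values present in both states, while merging the tables and re-reading the merged size does not.

```python
# Distinct values seen by two independent aggregation states.
left_memo, right_memo = {1, 2, 3}, {2, 3, 4}

# Old MergeFrom: non_nulls += other.non_nulls -> overcounts shared values.
buggy = len(left_memo) + len(right_memo)

# New MergeFrom: MergeTable(other), then non_nulls = memo_table.size().
merged = left_memo | right_memo
correct = len(merged)

assert buggy == 6    # 2 and 3 counted twice
assert correct == 4  # four distinct values overall
```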
[arrow] branch master updated: ARROW-16757: [C++][FOLLOWUP] Fix mingw32 RTools 4.0 build by removing usage of alignas (#13557)
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new 88b42ef66f ARROW-16757: [C++][FOLLOWUP] Fix mingw32 RTools 4.0 build by removing usage of alignas (#13557)

88b42ef66f is described below

commit 88b42ef66fe664043c5ee5274b2982a3858b414e
Author: Wes McKinney
AuthorDate: Sun Jul 10 09:20:18 2022 -0500

    ARROW-16757: [C++][FOLLOWUP] Fix mingw32 RTools 4.0 build by removing usage of alignas (#13557)

    Using `alignas(64)` (instead of `alignas(8)`) seemed to break this
    build.

    Authored-by: Wes McKinney
    Signed-off-by: Wes McKinney
---
 cpp/src/arrow/array/data.cc   | 6 +++---
 cpp/src/arrow/array/data.h    | 2 +-
 cpp/src/arrow/compute/exec.cc | 4 
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/cpp/src/arrow/array/data.cc b/cpp/src/arrow/array/data.cc
index c1a597fea6..d3f28758d9 100644
--- a/cpp/src/arrow/array/data.cc
+++ b/cpp/src/arrow/array/data.cc
@@ -219,7 +219,7 @@ void FillZeroLengthArray(const DataType* type, ArraySpan* span) {
   span->length = 0;
   int num_buffers = GetNumBuffers(*type);
   for (int i = 0; i < num_buffers; ++i) {
-    span->buffers[i].data = span->scratch_space;
+    span->buffers[i].data = reinterpret_cast(span->scratch_space);
     span->buffers[i].size = 0;
   }
@@ -270,7 +270,7 @@ void ArraySpan::FillFromScalar(const Scalar& value) {
     }
   } else if (is_base_binary_like(type_id)) {
     const auto& scalar = checked_cast(value);
-    this->buffers[1].data = this->scratch_space;
+    this->buffers[1].data = reinterpret_cast(this->scratch_space);
     const uint8_t* data_buffer = nullptr;
     int64_t data_size = 0;
     if (scalar.is_valid) {
@@ -328,7 +328,7 @@ void ArraySpan::FillFromScalar(const Scalar& value) {
     // First buffer is kept null since unions have no validity vector
     this->buffers[0] = {};
-    this->buffers[1].data = this->scratch_space;
+    this->buffers[1].data = reinterpret_cast(this->scratch_space);
     this->buffers[1].size = 1;
     int8_t* type_codes = reinterpret_cast(this->scratch_space);
     type_codes[0] = checked_cast(value).type_code;

diff --git a/cpp/src/arrow/array/data.h b/cpp/src/arrow/array/data.h
index fddc60293d..78643ae14a 100644
--- a/cpp/src/arrow/array/data.h
+++ b/cpp/src/arrow/array/data.h
@@ -269,7 +269,7 @@ struct ARROW_EXPORT ArraySpan {
   // 16 bytes of scratch space to enable this ArraySpan to be a view onto
   // scalar values including binary scalars (where we need to create a buffer
   // that looks like two 32-bit or 64-bit offsets)
-  alignas(64) uint8_t scratch_space[16];
+  uint64_t scratch_space[2];

   ArraySpan() = default;

diff --git a/cpp/src/arrow/compute/exec.cc b/cpp/src/arrow/compute/exec.cc
index e5e256ea6d..4dc5cdc542 100644
--- a/cpp/src/arrow/compute/exec.cc
+++ b/cpp/src/arrow/compute/exec.cc
@@ -383,6 +383,10 @@ int64_t ExecSpanIterator::GetNextChunkSpan(int64_t iteration_size, ExecSpan* spa
       continue;
     }
     const ChunkedArray* arg = args_->at(i).chunked_array().get();
+    if (arg->num_chunks() == 0) {
+      iteration_size = 0;
+      continue;
+    }
     const Array* current_chunk;
     while (true) {
       current_chunk = arg->chunk(chunk_indexes_[i]).get();
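The header change keeps the same 16 bytes of scratch storage but obtains alignment from the element type instead of an `alignas` specifier: an array of two `uint64_t` is naturally 8-byte aligned, which is enough to hold two 32-bit or 64-bit offsets, without relying on the extended-alignment support that broke the mingw32 toolchain. A small illustration via `ctypes` (platform ABI details are an assumption; checked on a typical 64-bit platform):

```python
import ctypes

# uint64_t scratch_space[2] occupies the same 16 bytes as the old
# alignas(64) uint8_t scratch_space[16] ...
assert ctypes.sizeof(ctypes.c_uint64 * 2) == 16

# ... while the natural alignment of a 64-bit integer (8 on common
# 64-bit ABIs, 4 on some 32-bit ones) already suffices for 64-bit offsets.
assert ctypes.alignment(ctypes.c_uint64) in (4, 8)
```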
[arrow-site] branch master updated (4066731 -> e599783)
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-site.git.

 from 4066731 ARROW-14626: [Website] Update versions tested on
  add e599783 [Website] Update Rust release details info in release blog post template (#136)

No new revisions were added by this update.

Summary of changes:
 release-announcement-template.md | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)
[arrow-site] branch master updated: Add jiayuliu as committer (#152)
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-site.git

The following commit(s) were added to refs/heads/master by this push:
     new 158713c Add jiayuliu as committer (#152)

158713c is described below

commit 158713cca5dbd08c724eba0b6641f65949100ded
Author: Jiayu Liu
AuthorDate: Mon Oct 11 23:30:24 2021 +0800

    Add jiayuliu as committer (#152)
---
 _data/committers.yml | 4 
 1 file changed, 4 insertions(+)

diff --git a/_data/committers.yml b/_data/committers.yml
index 40e7ea4..33daca4 100644
--- a/_data/committers.yml
+++ b/_data/committers.yml
@@ -263,3 +263,7 @@
   role: Committer
   alias: houqp
   affiliation: Scribd, Inc.
+- name: Jiayu Liu
+  role: Committer
+  alias: jiayuliu
+  affiliation: Airbnb Inc.
[arrow-site] branch master updated: Add `graphique` to 'powered by' page. (#143)
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-site.git

The following commit(s) were added to refs/heads/master by this push:
     new 50b9c81 Add `graphique` to 'powered by' page. (#143)

50b9c81 is described below

commit 50b9c815b02575a9c46e1bb520d4507fb2596996
Author: A. Coady
AuthorDate: Tue Aug 24 17:26:48 2021 -0700

    Add `graphique` to 'powered by' page. (#143)
---
 powered_by.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/powered_by.md b/powered_by.md
index d09c8e6..1486179 100644
--- a/powered_by.md
+++ b/powered_by.md
@@ -104,6 +104,7 @@ short description of your use case.
   visualizations and/or further analytics.
 * **[GOAI][19]:** Open GPU-Accelerated Analytics Initiative for Arrow-powered
   analytics across GPU tools and vendors
+* **[graphique][41]** GraphQL service for arrow tables and parquet data sets.
+  The schema for a query API is derived automatically.
 * **[Graphistry][18]:** Supercharged Visual Investigation Platform used by
   teams for security, anti-fraud, and related investigations. The Graphistry
   team uses Arrow in its NodeJS GPU backend and client libraries, and is an
@@ -219,3 +220,4 @@ short description of your use case.
 [38]: https://github.com/vaexio/vaex
 [39]: https://hash.ai
 [40]: https://github.com/pola-rs/polars
+[41]: https://github.com/coady/graphique
[arrow-cookbook] branch main updated: Initial content for Arrow Cookbook for Python and R (#1)
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-cookbook.git

The following commit(s) were added to refs/heads/main by this push:
     new d93c637 Initial content for Arrow Cookbook for Python and R (#1)

d93c637 is described below

commit d93c637895ca40d6ec5371c6399757dac7a6f6ea
Author: Alessandro Molina
AuthorDate: Wed Jul 28 16:38:20 2021 +0200

    Initial content for Arrow Cookbook for Python and R (#1)

    * Initial Import
    * R cookbook initial commit (#1)
    * R Cookbook skeleton and initial chapter
    * Move r test script to a separate directory
    * Add Apache 2 license
    * Add parquet section
    * Delete files used to demonstrate failing tests in CI
    * Licensing
    * Add content for different formats and rearrange headings
    * Small change to make the tests run on macOS
    * Completed the IO section and added intersphinx with PyArrow
    * Add workflow to deploy to GH pages
    * Update path
    * Rename chapters and fill in section titles
    * Commit whitespace to trigger build
    * Update bookdown job
    * try new job config
    * Install nightly Arrow
    * Evaluate all relevant bits!
    * Deploy to r dir
    * Try new workflow
    * update build path
    * Add email and update paths
    * Update job to build all cookbooks
    * Delete whitespace to trigger build
    * Swap order to see if this fixes build
    * Install system dependencies
    * Put it back on Mac so it's faster
    * Separate steps to diagnose issue
    * Brew not sudo
    * Switching to ubuntu as I don't understand why python 2
    * Don't put results in r directory
    * Capitalise 'C'
    * Update bookdown link so can click to fork/edit
    * Add CI stage that runs tests
    * Add examples of manually creating Arrow objects and writing to various formats
    * Add S3 parquet
    * Partitioned data
    * Partitioned Data from S3
    * Rename record_batch_create chunk
    * CSV recipe requires pandas
    * Filter parquet data on read
    * Reading/Writing feather files
    * remove duplicated chunk name
    * tweak create
    * Categorical data
    * Speed up compiling
    * Fix tests
    * tests pass
    * Data manipulation functions
    * Link to compute functions
    * Tweak naming
    * Add contribution file
    * landing page style tweak
    * Improve contribution documentation
    * Explicitly reference the contribution docs
    * ignore build directory
    * Change branch name
    * Update contents
    * Update CONTRIBUTING.md
    * Suggestions from Grammarly
    * Rename initial chapter
    * Update Makefile to allow Arrow version to be specified
    * Truncate license file to relevant part
    * typo
    * Apply suggestions from code review
      Co-authored-by: Weston Pace
    * Add link to code of conduct
      Co-authored-by: Ian Cook
    * Capitalise "Array"
    * Update r/CONTRIBUTING.md
      Co-authored-by: Ian Cook
    * Update r/content/manipulating_data.Rmd
      Co-authored-by: Weston Pace
    * Update r/content/manipulating_data.Rmd
      Co-authored-by: Weston Pace
    * Update r/content/manipulating_data.Rmd
      Co-authored-by: Weston Pace
    * Update r/content/reading_and_writing_data.Rmd
      Co-authored-by: Weston Pace
    * Update r/content/creating_arrow_objects.Rmd
      Co-authored-by: Ian Cook
    * Update r/content/manipulating_data.Rmd
      Co-authored-by: Ian Cook
    * Update r/content/manipulating_data.Rmd
      Co-authored-by: Ian Cook
    * Apply suggestions from code review
      Co-authored-by: Weston Pace
      Co-authored-by: Ian Cook
    * Mention dependencies
    * Mention that this is not the documentation
    * rewording
    * Add -jauto by default and indent a print
    * The Apache Software Foundation
    * reword
    * Correct ambiguous and incorrect phrasing
    * Update r/content/reading_and_writing_data.Rmd
      Co-authored-by: Weston Pace
    * Update r/content/reading_and_writing_data.Rmd
      Co-authored-by: Weston Pace
    * Reorder sections
    * Update r/content/manipulating_data.Rmd
      Co-authored-by: Ian Cook
    * Remove redundant code snippet
    * Update reading CSVs
    * Add in section on converting from/to Arrow Tables and tibbles
    * rephrase list of numbers
    * rephrase list of numbers
    * Add missing bracket
    * Rephrase about parquet containing multiple cols
    *
[arrow-cookbook] 01/01: Initial commit
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-cookbook.git

commit a9352414df66e5387f478bee92d3de430d59cd47
Author: Wes McKinney
AuthorDate: Wed Jul 14 16:42:28 2021 -0500

    Initial commit
---
 .gitignore | 0
 1 file changed, 0 insertions(+), 0 deletions(-)

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 000..e69de29
[arrow-cookbook] branch main created (now a935241)
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-cookbook.git.

  at a935241 Initial commit

This branch includes the following new commits:

     new a935241 Initial commit

The 1 revisions listed above as "new" are entirely new to this repository and
will be described in separate emails. The revisions listed as "add" were
already present in the repository and have only been added to this reference.
[arrow-site] branch master updated: Removing extra "}}" from the Feather Python link. (#126)
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-site.git

The following commit(s) were added to refs/heads/master by this push:
     new 141667f Removing extra "}}" from the Feather Python link. (#126)

141667f is described below

commit 141667f0f163711d0a4ceb2c8b7ceda15bdf2e7c
Author: Raul Ascencio
AuthorDate: Wed Jul 14 15:10:21 2021 -0600

    Removing extra "}}" from the Feather Python link. (#126)

    Currently, the page https://arrow.apache.org/use_cases/ contains a
    Python link for "Feather" pointing at
    "https://arrow.apache.org/docs/python/feather.html%20%7D%7D", which
    redirects to a 404. Instead, it seems we should be using
    "https://arrow.apache.org/docs/python/feather.html".
---
 use_cases.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/use_cases.md b/use_cases.md
index 5dffaf8..f15e55c 100644
--- a/use_cases.md
+++ b/use_cases.md
@@ -36,7 +36,7 @@ and the [Apache Parquet](https://parquet.apache.org/) format.

-* Feather: C++, [Python]({{ site.baseurl }}/docs/python/feather.html }}),
+* Feather: C++, [Python]({{ site.baseurl }}/docs/python/feather.html),
   [R]({{ site.baseurl }}/docs/r/reference/read_feather.html)
 * Parquet: [C++]({{ site.baseurl }}/docs/cpp/parquet.html),
   [Python]({{ site.baseurl }}/docs/python/parquet.html),
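Why the leftover ` }}` produced that exact 404 URL: when the stray characters from the unclosed Liquid tag are carried into the href and percent-encoded, a space becomes `%20` and each `}` becomes `%7D`. A quick illustration with the standard library:

```python
from urllib.parse import quote

# The stray " }}" left after the Liquid expression, percent-encoded,
# yields exactly the suffix seen in the broken link.
assert quote(" }}") == "%20%7D%7D"
```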
[arrow-site] branch master updated: Add polars project to Powered By (#123)
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-site.git

The following commit(s) were added to refs/heads/master by this push:
     new 66074d2 Add polars project to Powered By (#123)

66074d2 is described below

commit 66074d254f96a8d7ba23d9142ad310e7d23de1a2
Author: Ritchie Vink
AuthorDate: Mon Jul 5 18:43:55 2021 +0200

    Add polars project to Powered By (#123)

    This PR proposes adding Polars to the list of projects that use Apache
    Arrow.
---
 powered_by.md | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/powered_by.md b/powered_by.md
index 9fd3791..d09c8e6 100644
--- a/powered_by.md
+++ b/powered_by.md
@@ -137,6 +137,11 @@ short description of your use case.
   Parquet format. Petastorm supports popular Python-based machine learning
   (ML) frameworks such as Tensorflow, Pytorch, and PySpark. It can also be
   used from pure Python code.
+* **[Polars][40]:** Polars is a blazingly fast DataFrame library and query engine
+  that aims to utilize modern hardware efficiently
+  (e.g. multi-threading, SIMD vectorization, hiding memory latencies).
+  Polars is built upon Apache Arrow and uses its columnar memory, compute kernels,
+  and several IO utilities. Polars is written in Rust and available in Rust and Python.
 * **[Quilt Data][13]:** Quilt is a data package manager, designed to make
   managing data as easy as managing code. It supports Parquet format via
   pyarrow for data access.
@@ -213,3 +218,4 @@ short description of your use case.
 [37]: https://github.com/tenzir/vast
 [38]: https://github.com/vaexio/vaex
 [39]: https://hash.ai
+[40]: https://github.com/pola-rs/polars
[arrow] branch master updated (9162954 -> 7339bd5)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 9162954 ARROW-13065: [Packaging][RPM] Add missing required LZ4 version information add 7339bd5 [GitHub] Add shorter GitHub repository description to .asf.yaml No new revisions were added by this update. Summary of changes: .asf.yaml | 4 1 file changed, 4 insertions(+)
[arrow-site] branch master updated (2d7b592 -> abc9bb2)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow-site.git. from 2d7b592 ARROW-12192: [Website] Use downloadable URL for archive download add abc9bb2 ARROW-11911: [Website] Add protobuf vs arrow to FAQ (#97) No new revisions were added by this update. Summary of changes: faq.md | 25 + 1 file changed, 25 insertions(+)
[arrow-site] branch master updated: Adding Vaex to powered by (#98)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow-site.git The following commit(s) were added to refs/heads/master by this push: new a2f6faf Adding Vaex to powered by (#98) a2f6faf is described below commit a2f6faf0840c9ee42b8bead27652257fe687bfeb Author: Maarten Breddels AuthorDate: Tue Mar 9 17:34:31 2021 +0100 Adding Vaex to powered by (#98) --- powered_by.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/powered_by.md b/powered_by.md index 9a041bc..01dd9c5 100644 --- a/powered_by.md +++ b/powered_by.md @@ -163,6 +163,8 @@ short description of your use case. Database Connectivity (ODBC) interface. It provides the ability to return Arrow Tables and RecordBatches in addition to the Python Database API Specification 2.0. +* **[Vaex][38]:** Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, + ML, visualize and explore big tabular data at a billion rows per second. * **[VAST][37]:** A network telemetry engine for data-driven security investigations. VAST uses Arrow as standardized data plane to provide a high-bandwidth output path for downstream analytics. This makes it easy and @@ -205,3 +207,4 @@ short description of your use case. [35]: https://cylondata.org/ [36]: https://bodo.ai [37]: https://github.com/tenzir/vast +[38]: https://github.com/vaexio/vaex
[arrow] branch master updated (8df91c9 -> 8d76312)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 8df91c9 ARROW-10908: [Rust][DataFusion] Update relevant tpch-queries with BETWEEN add 8d76312 ARROW-6883: [C++][Python] Allow writing dictionary deltas No new revisions were added by this update. Summary of changes: cpp/src/arrow/array/array_base.cc| 18 ++-- cpp/src/arrow/array/array_base.h | 12 ++- cpp/src/arrow/flight/client.cc | 8 ++ cpp/src/arrow/flight/server.cc | 9 +- cpp/src/arrow/ipc/options.h | 14 +++ cpp/src/arrow/ipc/read_write_test.cc | 164 +-- cpp/src/arrow/ipc/reader.cc | 44 ++ cpp/src/arrow/ipc/reader.h | 5 +- cpp/src/arrow/ipc/writer.cc | 76 +--- cpp/src/arrow/ipc/writer.h | 20 + docs/source/status.rst | 25 +- python/pyarrow/includes/libarrow.pxd | 35 ++-- python/pyarrow/ipc.pxi | 82 +- python/pyarrow/ipc.py| 3 +- python/pyarrow/tests/test_ipc.py | 56 15 files changed, 509 insertions(+), 62 deletions(-)
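The dictionary-delta feature landed by ARROW-6883 above can be sketched in plain Python (this models the idea only, not Arrow's IPC API): a delta dictionary message carries just the entries appended since the previous message, so a reader extends its dictionary instead of replacing it.

```python
def dictionary_delta(previous, values):
    """Given the previously transmitted dictionary and a new batch of
    values, return (delta_entries, updated_dictionary).  Only entries
    not yet in the dictionary are emitted -- the idea behind writing
    dictionary deltas in the IPC stream."""
    seen = set(previous)
    delta = []
    for v in values:
        if v not in seen:
            seen.add(v)
            delta.append(v)
    return delta, previous + delta

delta, full = dictionary_delta(["a", "b"], ["b", "c", "a", "d"])
# delta == ["c", "d"]; full == ["a", "b", "c", "d"]
```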
[arrow] branch master updated (b8e021c -> 8b9f6b9)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from b8e021c ARROW-10634: [C#][CI] Change the build version from 2.2 to 3.1 in CI add 8b9f6b9 ARROW-10598: [C++] Separate out bit-packing in internal::GenerateBitsUnrolled for better performance No new revisions were added by this update. Summary of changes: cpp/src/arrow/util/bitmap_generate.h | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-)
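The bit generation being optimized in ARROW-10598 packs boolean values into bytes, least-significant bit first, which is the layout Arrow uses for validity bitmaps. A minimal Python sketch of that packing (illustrative only, not the unrolled C++ code):

```python
def pack_bits(bits):
    """Pack a sequence of booleans into bytes, LSB-first
    (Arrow's bitmap convention)."""
    out = bytearray((len(bits) + 7) // 8)
    for i, b in enumerate(bits):
        if b:
            # Bit i of the stream lands in byte i // 8, position i % 8.
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)

pack_bits([1, 0, 1, 1, 0, 0, 0, 1])  # bits 0, 2, 3, 7 set -> 0x8D
```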
[arrow] branch master updated (4d2cf9f -> 9e587be)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 4d2cf9f ARROW-10175: [CI] Fix nightly HDFS integration tests (ensure to use legacy dataset) add 9e587be ARROW-10206: [C++][Python][FlightRPC] Allow disabling server validation No new revisions were added by this update. Summary of changes: ci/conda_env_cpp.yml | 2 +- cpp/cmake_modules/Findzstd.cmake | 20 +++-- cpp/src/arrow/flight/CMakeLists.txt| 42 ++ cpp/src/arrow/flight/client.cc | 95 +++--- cpp/src/arrow/flight/client.h | 6 ++ cpp/src/arrow/flight/flight_test.cc| 26 ++ .../check_tls_opts_127.cc} | 44 -- .../check_tls_opts_132.cc} | 44 -- python/pyarrow/_flight.pyx | 31 +-- python/pyarrow/includes/libarrow_flight.pxd| 1 + python/pyarrow/tests/test_flight.py| 13 +++ 11 files changed, 244 insertions(+), 80 deletions(-) copy cpp/src/arrow/flight/{middleware_internal.h => try_compile/check_tls_opts_127.cc} (55%) copy cpp/src/arrow/flight/{middleware_internal.h => try_compile/check_tls_opts_132.cc} (56%)
[arrow] branch master updated (105873e -> b2842ab)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 105873e ARROW-10068: [C++] Add bundled external project for aws-sdk-cpp add b2842ab ARROW-10147: [Python] Pandas metadata fails if index name not JSON-serializable No new revisions were added by this update. Summary of changes: python/pyarrow/pandas_compat.py | 11 ++- python/pyarrow/tests/test_pandas.py | 14 ++ 2 files changed, 24 insertions(+), 1 deletion(-)
[arrow] branch master updated (ecc3ed8 -> 72a0e96)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from ecc3ed8 ARROW-10008: [C++][Dataset] Fix filtering/row group statistics of dict columns add 72a0e96 ARROW-10121: [C++] Fix emission of new dictionaries in IPC writer No new revisions were added by this update. Summary of changes: cpp/src/arrow/ipc/CMakeLists.txt | 3 +- cpp/src/arrow/ipc/dictionary.cc | 15 +- cpp/src/arrow/ipc/dictionary.h | 5 +- cpp/src/arrow/ipc/read_write_test.cc | 652 --- cpp/src/arrow/ipc/reader.cc | 141 ++-- cpp/src/arrow/ipc/reader.h | 29 +- cpp/src/arrow/ipc/tensor_test.cc | 506 +++ cpp/src/arrow/ipc/writer.cc | 86 +++-- cpp/src/arrow/ipc/writer.h | 3 + cpp/src/arrow/pretty_print.cc| 2 +- 10 files changed, 943 insertions(+), 499 deletions(-) create mode 100644 cpp/src/arrow/ipc/tensor_test.cc
[arrow] branch master updated (a1157b7 -> 9bff7c4)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from a1157b7 ARROW-10136: [Rust]: Fix null handling in StringArray and BinaryArray filtering, add BinaryArray::from_opt_vec add 9bff7c4 ARROW-10054: [Python] don't crash when slice offset > length No new revisions were added by this update. Summary of changes: python/pyarrow/array.pxi | 1 + python/pyarrow/table.pxi | 3 +++ python/pyarrow/tests/test_array.py | 2 ++ python/pyarrow/tests/test_table.py | 24 4 files changed, 30 insertions(+)
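The slicing semantics that ARROW-10054 enforces can be paraphrased: an offset past the end of the array must yield an empty slice rather than a crash. A toy model of the clamping (plain Python, not the pyarrow implementation):

```python
def clamped_slice(values, offset, length=None):
    """Slice with the offset clamped into [0, len(values)], so an
    out-of-range offset produces an empty result instead of
    undefined behaviour."""
    offset = min(max(offset, 0), len(values))
    if length is None:
        return values[offset:]
    return values[offset:offset + length]

clamped_slice([1, 2, 3], 10)    # [] -- offset past the end
clamped_slice([1, 2, 3], 1, 5)  # [2, 3] -- length clamped to the tail
```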
[arrow] branch master updated (571d48e -> 4b0448b)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 571d48e ARROW-10119: [C++] Fix Parquet crashes on invalid input add 4b0448b ARROW-10124: [C++] Don't restrict permissions when creating files No new revisions were added by this update. Summary of changes: cpp/src/arrow/util/io_util.cc | 11 +-- python/pyarrow/tests/test_io.py | 16 2 files changed, 17 insertions(+), 10 deletions(-)
[arrow] branch master updated (515daab -> 477c102)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 515daab ARROW-8618: [C++] Clean up some redundant std::move()s add 477c102 ARROW-9924: [C++][Dataset] Enable per-column parallelism for single ParquetFileFragment scans No new revisions were added by this update. Summary of changes: c_glib/test/dataset/test-scan-options.rb | 2 +- cpp/src/arrow/dataset/file_parquet.cc| 4 ++ cpp/src/arrow/dataset/file_parquet.h | 6 +++ cpp/src/arrow/dataset/scanner.h | 4 +- cpp/src/parquet/arrow/reader.cc | 49 +-- python/pyarrow/_dataset.pyx | 37 ++ python/pyarrow/dataset.py| 2 +- python/pyarrow/includes/libarrow_dataset.pxd | 1 + python/pyarrow/parquet.py| 73 9 files changed, 119 insertions(+), 59 deletions(-)
[arrow-site] branch master updated: ARROW-7384: Add an allow-all robots.txt (#76)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow-site.git The following commit(s) were added to refs/heads/master by this push: new ae5fbf9 ARROW-7384: Add an allow-all robots.txt (#76) ae5fbf9 is described below commit ae5fbf9ffec88dddc56c36d749849e8f164efc89 Author: Uwe L. Korn AuthorDate: Sun Sep 27 22:15:59 2020 +0200 ARROW-7384: Add an allow-all robots.txt (#76) --- robots.txt | 2 ++ 1 file changed, 2 insertions(+) diff --git a/robots.txt b/robots.txt new file mode 100644 index 000..f6e6d1d --- /dev/null +++ b/robots.txt @@ -0,0 +1,2 @@ +User-Agent: * +Allow: /
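The two-line allow-all robots.txt added above can be sanity-checked with the standard library's parser:

```python
from urllib.robotparser import RobotFileParser

# Parse the exact file content from the commit.
rp = RobotFileParser()
rp.parse(["User-Agent: *", "Allow: /"])

rp.can_fetch("GoogleBot", "/docs/python/feather.html")  # True
```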
[arrow] branch master updated (fe862a4 -> 97ade81)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from fe862a4 ARROW-9981: [Rust] [Flight] Expose IpcWriteOptions on utils add 97ade81 ARROW-8601: [Go][FOLLOWUP] Fix RAT violations related to Flight in Go No new revisions were added by this update. Summary of changes: dev/release/rat_exclude_files.txt | 2 ++ 1 file changed, 2 insertions(+)
[arrow] branch master updated: ARROW-8601: [Go][Flight] Implementations Flight RPC server and client
This is an automated email from the ASF dual-hosted git repository. wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new c0dd2e2 ARROW-8601: [Go][Flight] Implementations Flight RPC server and client c0dd2e2 is described below commit c0dd2e2166f5f3a9c6b6a03c6983bd886de16c65 Author: Matthew Topol AuthorDate: Thu Sep 24 20:33:00 2020 -0500 ARROW-8601: [Go][Flight] Implementations Flight RPC server and client Built out from https://github.com/apache/arrow/pull/6731 with some inspiration from the existing Reader/Writer and the C++ Flight implementation. Still need to build out the tests some more, but would like to get opinions and thoughts on what I've got so far as I continue to build it out. Closes #8175 from zeroshade/zeroshade/go/flight Authored-by: Matthew Topol Signed-off-by: Wes McKinney --- format/Flight.proto |2 + go/arrow/flight/Flight.pb.go | 1473 + go/arrow/flight/Flight_grpc.pb.go | 877 +++ go/arrow/flight/client.go | 89 ++ go/arrow/flight/client_auth.go| 91 ++ go/arrow/flight/example_flight_server_test.go | 70 ++ go/arrow/flight/flight_test.go| 305 + go/arrow/{go.mod => flight/gen.go}| 12 +- go/arrow/flight/server.go | 118 ++ go/arrow/flight/server_auth.go| 145 +++ go/arrow/go.mod |8 + go/arrow/go.sum | 94 ++ go/arrow/ipc/flight_data_reader.go| 210 go/arrow/ipc/flight_data_writer.go| 150 +++ 14 files changed, 3634 insertions(+), 10 deletions(-) diff --git a/format/Flight.proto b/format/Flight.proto index 71ae7ca..7b0f591 100644 --- a/format/Flight.proto +++ b/format/Flight.proto @@ -19,6 +19,8 @@ syntax = "proto3"; option java_package = "org.apache.arrow.flight.impl"; +option go_package = "github.com/apache/arrow/go/flight;flight"; + package arrow.flight.protocol; /* diff --git a/go/arrow/flight/Flight.pb.go b/go/arrow/flight/Flight.pb.go new file mode 100644 index 000..75c6c2c --- /dev/null +++ b/go/arrow/flight/Flight.pb.go @@ -0,0 
+1,1473 @@ +// +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// Code generated by protoc-gen-go. DO NOT EDIT. +// versions: +// protoc-gen-go v1.25.0 +// protocv3.9.1 +// source: Flight.proto + +package flight + +import ( + proto "github.com/golang/protobuf/proto" + protoreflect "google.golang.org/protobuf/reflect/protoreflect" + protoimpl "google.golang.org/protobuf/runtime/protoimpl" + reflect "reflect" + sync "sync" +) + +const ( + // Verify that this generated code is sufficiently up-to-date. + _ = protoimpl.EnforceVersion(20 - protoimpl.MinVersion) + // Verify that runtime/protoimpl is sufficiently up-to-date. + _ = protoimpl.EnforceVersion(protoimpl.MaxVersion - 20) +) + +// This is a compile-time assertion that a sufficiently up-to-date version +// of the legacy proto package is being used. +const _ = proto.ProtoPackageIsVersion4 + +// +// Describes what type of descriptor is defined. +type FlightDescriptor_DescriptorType int32 + +const ( + // Protobuf pattern, not used. + FlightDescriptor_UNKNOWN FlightDescriptor_DescriptorType = 0 + // + // A named path that identifies a dataset. A path is composed of a string + // or list of strings describing a particular dataset. 
This is conceptually + // similar to a path inside a filesystem. + FlightDescriptor_PATH FlightDescriptor_DescriptorType = 1 + // + // An opaque command to generate a dataset. + FlightDescriptor_CMD FlightDescriptor_DescriptorType = 2 +) + +// Enum value maps for FlightDescriptor_DescriptorType. +var ( + FlightDescriptor_DescriptorType_name = map[int32]string{ + 0: "UNKNOWN", +
[arrow] branch master updated (152f8b0 -> ac86123)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 152f8b0 ARROW-10066: [C++] Make sure default AWS region selection algorithm is used add ac86123 ARROW-9970: [Go] fix checkptr failure in sum methods No new revisions were added by this update. Summary of changes: go/arrow/math/float64_avx2_amd64.go | 4 ++-- go/arrow/math/float64_sse4_amd64.go | 4 ++-- go/arrow/math/int64_avx2_amd64.go | 4 ++-- go/arrow/math/int64_sse4_amd64.go | 4 ++-- go/arrow/math/type_simd_amd64.go.tmpl | 4 ++-- go/arrow/math/uint64_avx2_amd64.go| 4 ++-- go/arrow/math/uint64_sse4_amd64.go| 4 ++-- 7 files changed, 14 insertions(+), 14 deletions(-)
[arrow] branch master updated (02287b4 -> 8563b42)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 02287b4 ARROW-9078: [C++] Parquet read / write extension type with nested storage type add 8563b42 PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec No new revisions were added by this update. Summary of changes: cpp/src/arrow/util/compression.cc | 15 + cpp/src/arrow/util/compression.h | 13 +++- cpp/src/arrow/util/compression_internal.h | 3 + cpp/src/arrow/util/compression_lz4.cc | 107 ++ cpp/src/arrow/util/compression_test.cc| 70 --- cpp/src/parquet/column_reader.cc | 2 +- cpp/src/parquet/column_writer.cc | 2 +- cpp/src/parquet/column_writer_test.cc | 10 ++- cpp/src/parquet/file_deserialize_test.cc | 8 ++- cpp/src/parquet/file_serialize_test.cc| 15 - cpp/src/parquet/reader_test.cc| 74 - cpp/src/parquet/thrift_internal.h | 5 +- cpp/src/parquet/types.cc | 41 +++- cpp/src/parquet/types.h | 9 --- cpp/submodules/parquet-testing| 2 +- python/pyarrow/tests/test_parquet.py | 16 + 16 files changed, 296 insertions(+), 96 deletions(-)
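For context on the PARQUET-1878 incompatibility: Hadoop's Lz4Codec frames each block with two 4-byte big-endian lengths (decompressed size, then compressed size) ahead of the raw LZ4 data, while the plain LZ4 block format carries no such header. A sketch of reading that prefix — the layout here is an assumption based on the PARQUET-1878 discussion, and the function name is hypothetical:

```python
import struct

def parse_hadoop_lz4_header(buf):
    """Split a Hadoop-framed LZ4 buffer into
    (decompressed_size, compressed_size, payload).  Assumes the
    two big-endian uint32 length prefix used by Hadoop's Lz4Codec."""
    decompressed_size, compressed_size = struct.unpack_from(">II", buf, 0)
    payload = buf[8:8 + compressed_size]
    return decompressed_size, compressed_size, payload

frame = struct.pack(">II", 100, 3) + b"xyz"
parse_hadoop_lz4_header(frame)  # (100, 3, b'xyz')
```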
[arrow] branch master updated: ARROW-9490: [Python][C++] Bug in pa.array when input mixes int8 with float
This is an automated email from the ASF dual-hosted git repository. wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 085b44d ARROW-9490: [Python][C++] Bug in pa.array when input mixes int8 with float 085b44d is described below commit 085b44d916cd1266911c05850a2369f30dd1fd65 Author: arw2019 AuthorDate: Sat Aug 22 12:54:05 2020 -0500 ARROW-9490: [Python][C++] Bug in pa.array when input mixes int8 with float Closes #8017 from arw2019/ARROW-9490 Authored-by: arw2019 Signed-off-by: Wes McKinney --- cpp/src/arrow/python/helpers.cc | 2 ++ python/pyarrow/tests/test_convert_builtin.py | 9 - 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/cpp/src/arrow/python/helpers.cc b/cpp/src/arrow/python/helpers.cc index 852bf76..1845aa1 100644 --- a/cpp/src/arrow/python/helpers.cc +++ b/cpp/src/arrow/python/helpers.cc @@ -328,6 +328,8 @@ Status UnboxIntegerAsInt64(PyObject* obj, int64_t* out) { if (overflow) { return Status::Invalid("PyLong is too large to fit int64"); } + } else if (PyArray_IsScalar(obj, Byte)) { +*out = reinterpret_cast(obj)->obval; } else if (PyArray_IsScalar(obj, UByte)) { *out = reinterpret_cast(obj)->obval; } else if (PyArray_IsScalar(obj, Short)) { diff --git a/python/pyarrow/tests/test_convert_builtin.py b/python/pyarrow/tests/test_convert_builtin.py index 788675a..f62a941 100644 --- a/python/pyarrow/tests/test_convert_builtin.py +++ b/python/pyarrow/tests/test_convert_builtin.py @@ -390,10 +390,17 @@ def test_broken_integers(seq): def test_numpy_scalars_mixed_type(): + # ARROW-4324 data = [np.int32(10), np.float32(0.5)] arr = pa.array(data) -expected = pa.array([10, 0.5], type='float64') +expected = pa.array([10, 0.5], type="float64") +assert arr.equals(expected) + +# ARROW-9490 +data = [np.int8(10), np.float32(0.5)] +arr = pa.array(data) +expected = pa.array([10, 0.5], type="float32") assert arr.equals(expected)
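The promotion rule that the ARROW-9490 test pins down can be restated: when integers and floats mix, the result is the narrowest float that both contains the float input and represents the integer input exactly (int8 fits in float32's 24-bit significand; int32 does not, so it forces float64). A toy model of that rule, not the actual C++ inference logic:

```python
# float32 represents integers up to 24 bits exactly; float64 up to 53.
FLOAT32_EXACT_INT_BITS = 24

def promote(int_bits, float_type):
    """Toy promotion rule matching the two cases in the test:
    int8 + float32 -> float32, int32 + float32 -> float64."""
    if float_type == "float32" and int_bits <= FLOAT32_EXACT_INT_BITS:
        return "float32"
    return "float64"

promote(8, "float32")   # 'float32'
promote(32, "float32")  # 'float64'
```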
[arrow] branch master updated (5d9ccb7 -> 36d267b)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 5d9ccb7 ARROW-6437: [R] Add AWS SDK to system dependencies for macOS and Windows add 36d267b [MINOR] Fix typo and use more concise word in README.md No new revisions were added by this update. Summary of changes: README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
[arrow] branch master updated: ARROW-9528: [Python] Honor tzinfo when converting from datetime
This is an automated email from the ASF dual-hosted git repository. wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 2e3d7ec ARROW-9528: [Python] Honor tzinfo when converting from datetime 2e3d7ec is described below commit 2e3d7ecd320d3e91d285ad0ee729aa18e2b4e476 Author: Krisztián Szűcs AuthorDate: Sun Aug 16 15:12:28 2020 -0500 ARROW-9528: [Python] Honor tzinfo when converting from datetime Follow up of: - ARROW-9223: [Python] Propagate timezone information in pandas conversion - ARROW-9528: [Python] Honor tzinfo when converting from datetime (https://github.com/apache/arrow/pull/7805) TODOs: - [x] Store all Timestamp values normalized to UTC - [x] Infer timezone from the array values if no explicit type was given - [x] Testing (especially pandas object roundtrip) - [x] Testing of timezone-naive roundtrips - [x] Testing mixed pandas and datetime objects Closes #7816 from kszucs/tz Lead-authored-by: Krisztián Szűcs Co-authored-by: Micah Kornfield Signed-off-by: Wes McKinney --- ci/scripts/integration_spark.sh| 3 + cpp/src/arrow/compute/kernels/scalar_string.cc | 4 +- cpp/src/arrow/python/arrow_to_pandas.cc| 53 -- cpp/src/arrow/python/arrow_to_pandas.h | 5 +- cpp/src/arrow/python/datetime.cc | 172 +- cpp/src/arrow/python/datetime.h| 26 +++ cpp/src/arrow/python/inference.cc | 22 +-- cpp/src/arrow/python/python_to_arrow.cc| 151 +--- cpp/src/arrow/python/python_to_arrow.h | 8 +- python/pyarrow/array.pxi | 7 +- python/pyarrow/includes/libarrow.pxd | 5 + python/pyarrow/tests/test_array.py | 22 ++- python/pyarrow/tests/test_convert_builtin.py | 234 - python/pyarrow/tests/test_pandas.py| 60 +-- python/pyarrow/tests/test_types.py | 117 + python/pyarrow/types.pxi | 40 + 16 files changed, 747 insertions(+), 182 deletions(-) diff --git a/ci/scripts/integration_spark.sh b/ci/scripts/integration_spark.sh index 9828a28..a45ed7a 100755 --- 
a/ci/scripts/integration_spark.sh +++ b/ci/scripts/integration_spark.sh @@ -22,6 +22,9 @@ source_dir=${1} spark_dir=${2} spark_version=${SPARK_VERSION:-master} +# Use old behavior that always dropped tiemzones. +export PYARROW_IGNORE_TIMEZONE=1 + if [ "${SPARK_VERSION:0:2}" == "2." ]; then # https://github.com/apache/spark/blob/master/docs/sql-pyspark-pandas-with-arrow.md#compatibility-setting-for-pyarrow--0150-and-spark-23x-24x export ARROW_PRE_0_15_IPC_FORMAT=1 diff --git a/cpp/src/arrow/compute/kernels/scalar_string.cc b/cpp/src/arrow/compute/kernels/scalar_string.cc index 7e61617..0332be9 100644 --- a/cpp/src/arrow/compute/kernels/scalar_string.cc +++ b/cpp/src/arrow/compute/kernels/scalar_string.cc @@ -861,10 +861,10 @@ void AddBinaryLength(FunctionRegistry* registry) { applicator::ScalarUnaryNotNull::Exec; ArrayKernelExec exec_offset_64 = applicator::ScalarUnaryNotNull::Exec; - for (const auto& input_type : {binary(), utf8()}) { + for (const auto input_type : {binary(), utf8()}) { DCHECK_OK(func->AddKernel({input_type}, int32(), exec_offset_32)); } - for (const auto& input_type : {large_binary(), large_utf8()}) { + for (const auto input_type : {large_binary(), large_utf8()}) { DCHECK_OK(func->AddKernel({input_type}, int64(), exec_offset_64)); } DCHECK_OK(registry->AddFunction(std::move(func))); diff --git a/cpp/src/arrow/python/arrow_to_pandas.cc b/cpp/src/arrow/python/arrow_to_pandas.cc index bc4e25b..47b62a3 100644 --- a/cpp/src/arrow/python/arrow_to_pandas.cc +++ b/cpp/src/arrow/python/arrow_to_pandas.cc @@ -17,9 +17,8 @@ // Functions for pandas conversion via NumPy -#include "arrow/python/numpy_interop.h" // IWYU pragma: expand - #include "arrow/python/arrow_to_pandas.h" +#include "arrow/python/numpy_interop.h" // IWYU pragma: expand #include #include @@ -642,15 +641,15 @@ inline Status ConvertStruct(const PandasOptions& options, const ChunkedArray& da std::vector fields_data(num_fields); OwnedRef dict_item; - // XXX(wesm): In ARROW-7723, we found as a 
result of ARROW-3789 that second + // In ARROW-7723, we found as a result of ARROW-3789 that second // through microsecond resolution tz-aware timestamps were being promoted to // use the DATETIME_NANO_TZ conversion path, yielding a datetime64[ns] NumPy // array in this function. PyArray_GETITEM returns datetime.datetime for // units second through microsecond but PyLong for nanosecond (because - // datetime.datetime does not support nanoseconds). We inserted thi
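The "store all Timestamp values normalized to UTC" item from the TODO list above can be illustrated with the standard library alone (a sketch of the semantics, not pyarrow's implementation): an aware datetime is converted to UTC before being stored as an integer, so two representations of the same instant compare equal.

```python
from datetime import datetime, timedelta, timezone

def to_utc_micros(dt):
    """Normalize an aware datetime to UTC and express it as
    microseconds since the epoch -- the storage convention the
    commit message describes."""
    return int(dt.astimezone(timezone.utc).timestamp() * 1_000_000)

cest = timezone(timedelta(hours=2))
a = datetime(2020, 8, 16, 17, 0, tzinfo=cest)          # 17:00 UTC+2
b = datetime(2020, 8, 16, 15, 0, tzinfo=timezone.utc)  # 15:00 UTC
to_utc_micros(a) == to_utc_micros(b)  # True: same instant
```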
[arrow] branch master updated: ARROW-9598: [C++][Parquet] Fix writing nullable structs
This is an automated email from the ASF dual-hosted git repository. wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 1b0aebe ARROW-9598: [C++][Parquet] Fix writing nullable structs 1b0aebe is described below commit 1b0aebea45bcd6b271324fcfc373e4ccc7543eaa Author: Micah Kornfield AuthorDate: Mon Aug 10 15:33:10 2020 -0500 ARROW-9598: [C++][Parquet] Fix writing nullable structs Traverse the node hierarchy to ensure we capture the right value count. Closes #7862 from emkornfield/verify_parquetfg Authored-by: Micah Kornfield Signed-off-by: Wes McKinney --- cpp/src/parquet/arrow/arrow_reader_writer_test.cc | 17 + cpp/src/parquet/column_writer.cc | 9 ++--- 2 files changed, 23 insertions(+), 3 deletions(-) diff --git a/cpp/src/parquet/arrow/arrow_reader_writer_test.cc b/cpp/src/parquet/arrow/arrow_reader_writer_test.cc index 661ce7b..476d82f 100644 --- a/cpp/src/parquet/arrow/arrow_reader_writer_test.cc +++ b/cpp/src/parquet/arrow/arrow_reader_writer_test.cc @@ -2344,6 +2344,23 @@ TEST(ArrowReadWrite, SimpleStructRoundTrip) { 2); } +TEST(ArrowReadWrite, SingleColumnNullableStruct) { + auto links = + field("Links", +::arrow::struct_({field("Backward", ::arrow::int64(), /*nullable=*/true)})); + + auto links_id_array = ::arrow::ArrayFromJSON(links->type(), + "[null, " + "{\"Backward\": 10}" + "]"); + + CheckSimpleRoundtrip( + ::arrow::Table::Make(std::make_shared<::arrow::Schema>( + std::vector>{links}), + {links_id_array}), + 3); +} + // Disabled until implementation can be finished. 
TEST(TestArrowReadWrite, DISABLED_CanonicalNestedRoundTrip) { auto doc_id = field("DocId", ::arrow::int64(), /*nullable=*/false); diff --git a/cpp/src/parquet/column_writer.cc b/cpp/src/parquet/column_writer.cc index f9cf37c..6cb0bae 100644 --- a/cpp/src/parquet/column_writer.cc +++ b/cpp/src/parquet/column_writer.cc @@ -1138,8 +1138,12 @@ class TypedColumnWriterImpl : public ColumnWriterImpl, public TypedColumnWriter< if (descr_->max_definition_level() > 0) { // Minimal definition level for which spaced values are written int16_t min_spaced_def_level = descr_->max_definition_level(); - if (descr_->schema_node()->is_optional()) { -min_spaced_def_level--; + const ::parquet::schema::Node* node = descr_->schema_node().get(); + while (node != nullptr && !node->is_repeated()) { +if (node->is_optional()) { + min_spaced_def_level--; +} +node = node->parent(); } for (int64_t i = 0; i < num_levels; ++i) { if (def_levels[i] == descr_->max_definition_level()) { @@ -1149,7 +1153,6 @@ class TypedColumnWriterImpl : public ColumnWriterImpl, public TypedColumnWriter< ++spaced_values_to_write; } } - WriteDefinitionLevels(num_levels, def_levels); } else { // Required field, write all values
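The fix in column_writer.cc above walks up the schema ancestry, decrementing the minimal "spaced" definition level once per optional ancestor and stopping at a repeated node, instead of decrementing only for the leaf. A direct transliteration of that loop (toy Node class with hypothetical names):

```python
class Node:
    def __init__(self, optional=False, repeated=False, parent=None):
        self.optional = optional
        self.repeated = repeated
        self.parent = parent

def min_spaced_def_level(leaf, max_def_level):
    """Mirror of the corrected C++ loop: one decrement per optional
    ancestor until a repeated node (or the root) is reached."""
    level = max_def_level
    node = leaf
    while node is not None and not node.repeated:
        if node.optional:
            level -= 1
        node = node.parent
    return level

root = Node()                                 # required root group
links = Node(optional=True, parent=root)      # nullable struct
backward = Node(optional=True, parent=links)  # nullable leaf
min_spaced_def_level(backward, 2)  # 0: both ancestors are optional
```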
[arrow] branch master updated (4489cb7 -> 9c04867)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 4489cb7 ARROW-9462: [Go] The Indentation after the first Record in arrjson writer is incorrect add 9c04867 ARROW-9643: [C++] Only register the SIMD variants when it's supported. No new revisions were added by this update. Summary of changes: cpp/src/arrow/compute/kernels/aggregate_basic.cc | 18 ++ 1 file changed, 14 insertions(+), 4 deletions(-)
[arrow-site] branch master updated: Adjust positioning of badges (#70)
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-site.git

The following commit(s) were added to refs/heads/master by this push:
     new 4632363  Adjust positioning of badges (#70)

4632363 is described below

commit 4632363bfae07650817030ca554d311875b97440
Author: Neal Richardson
AuthorDate: Tue Aug 4 14:58:20 2020 -0700

    Adjust positioning of badges (#70)
---
 _layouts/home.html | 11 +++++++++--
 css/main.scss      | 13 +++++++++++++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/_layouts/home.html b/_layouts/home.html
index f6f49ea..fe074f9 100644
--- a/_layouts/home.html
+++ b/_layouts/home.html
@@ -8,8 +8,15 @@
         A cross-language development platform for in-memory analytics
-
-        <a href="https://github.com/apache/arrow" data-size="large" data-show-count="true" aria-label="Star apache/arrow on GitHub">Star</a> <a href="https://twitter.com/ApacheArrow?ref_src=twsrc%5Etfw" class="twitter-follow-button" data-show-count="true">Follow @ApacheArrow</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
+        <div class="social-badges">
+          <span class="social-badge">
+            <a href="https://github.com/apache/arrow" data-size="large" data-show-count="true" aria-label="Star apache/arrow on GitHub">Star</a>
+          </span>
+          <span class="social-badge">
+            <a href="https://twitter.com/ApacheArrow?ref_src=twsrc%5Etfw" class="twitter-follow-button" data-show-count="true">Follow @ApacheArrow</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
+          </span>
+        </div>
+

diff --git a/css/main.scss b/css/main.scss
index e844dfb..a4cdb90 100644
--- a/css/main.scss
+++ b/css/main.scss
@@ -97,3 +97,16 @@ p code, li code {
 p a code {
   color: inherit;
 }
+
+.social-badges iframe {
+  vertical-align: middle;
+}
+
+.social-badges span {
+  vertical-align: top;
+}
+
+.social-badge {
+  display: inline;
+  padding: 12px;
+}
[arrow] branch master updated (0d25270 -> 50d6252)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 0d25270 PARQUET-1845: [C++] Add expected results of Int96 in big-endian add 50d6252 ARROW-9096: [Python] Pandas roundtrip with dtype="object" underlying numeric column index No new revisions were added by this update. Summary of changes: python/pyarrow/pandas_compat.py | 21 +++-- python/pyarrow/tests/test_pandas.py | 30 +- 2 files changed, 32 insertions(+), 19 deletions(-)
[arrow] branch master updated: PARQUET-1845: [C++] Add expected results of Int96 in big-endian
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new 0d25270  PARQUET-1845: [C++] Add expected results of Int96 in big-endian

0d25270 is described below

commit 0d25270703fcc1db95104d6b77ae6d1286c36977
Author: Kazuaki Ishizaki
AuthorDate: Mon Aug 3 11:46:18 2020 -0500

    PARQUET-1845: [C++] Add expected results of Int96 in big-endian

    This PR adds expected results of Int96 for parquet-internals-test in
    big-endian. It assumes that the uint64_t and uint32_t elements in Int96
    are handled in native endianness for efficiency.

    Closes #6981 from kiszk/PARQUET-1845

    Authored-by: Kazuaki Ishizaki
    Signed-off-by: Wes McKinney
---
 cpp/src/parquet/types_test.cc | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/cpp/src/parquet/types_test.cc b/cpp/src/parquet/types_test.cc
index ccec95f..a14308f 100644
--- a/cpp/src/parquet/types_test.cc
+++ b/cpp/src/parquet/types_test.cc
@@ -102,8 +102,13 @@ TEST(TypePrinter, StatisticsTypes) {
   ASSERT_STREQ("1.0245", FormatStatValue(Type::DOUBLE, smin).c_str());
   ASSERT_STREQ("2.0489", FormatStatValue(Type::DOUBLE, smax).c_str());
 
+#if ARROW_LITTLE_ENDIAN
   Int96 Int96_min = {{1024, 2048, 4096}};
   Int96 Int96_max = {{2048, 4096, 8192}};
+#else
+  Int96 Int96_min = {{2048, 1024, 4096}};
+  Int96 Int96_max = {{4096, 2048, 8192}};
+#endif
   smin = std::string(reinterpret_cast<const char*>(&Int96_min), sizeof(Int96));
   smax = std::string(reinterpret_cast<const char*>(&Int96_max), sizeof(Int96));
   ASSERT_STREQ("1024 2048 4096", FormatStatValue(Type::INT96, smin).c_str());
@@ -126,9 +131,14 @@ TEST(TypePrinter, StatisticsTypes) {
 
 TEST(TestInt96Timestamp, Decoding) {
   auto check = [](int32_t julian_day, uint64_t nanoseconds) {
+#if ARROW_LITTLE_ENDIAN
     Int96 i96{static_cast<uint32_t>(nanoseconds), static_cast<uint32_t>(nanoseconds >> 32),
               static_cast<uint32_t>(julian_day)};
+#else
+    Int96 i96{static_cast<uint32_t>(nanoseconds >> 32), static_cast<uint32_t>(nanoseconds),
+              static_cast<uint32_t>(julian_day)};
+#endif
     // Official formula according to https://github.com/apache/parquet-format/pull/49
     int64_t expected =
         (julian_day - 2440588) * (86400LL * 1000 * 1000 * 1000) + nanoseconds;
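The "official formula" referenced in the test converts an Int96 value (a Julian day number plus nanoseconds within that day) to a Unix-epoch nanosecond timestamp; 2440588 is the Julian day number of 1970-01-01. A small self-contained sketch of the decoding, plus the word-order point the big-endian branch encodes:

```python
UNIX_EPOCH_JULIAN_DAY = 2440588
NANOS_PER_DAY = 86400 * 1000 * 1000 * 1000


def int96_to_unix_nanos(julian_day: int, nanoseconds: int) -> int:
    """Decode a Parquet Int96 timestamp (Julian day + nanos within the day)
    into nanoseconds since the Unix epoch, per the formula from
    https://github.com/apache/parquet-format/pull/49."""
    return (julian_day - UNIX_EPOCH_JULIAN_DAY) * NANOS_PER_DAY + nanoseconds


def int96_words(julian_day: int, nanoseconds: int, little_endian: bool = True):
    """The three 32-bit words of the Int96 as stored natively: little-endian
    hosts store (nanos_lo, nanos_hi, day); big-endian hosts swap the first two
    words, which is what the #else branch in the test spells out."""
    lo, hi = nanoseconds & 0xFFFFFFFF, nanoseconds >> 32
    return (lo, hi, julian_day) if little_endian else (hi, lo, julian_day)


# The epoch itself decodes to zero; one day later is 86.4e12 nanoseconds.
print(int96_to_unix_nanos(2440588, 0))  # -> 0
print(int96_to_unix_nanos(2440589, 0))  # -> 86400000000000
```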
[arrow-site] 01/01: Add GitHub star and Twitter follow buttons
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch follow-buttons
in repository https://gitbox.apache.org/repos/asf/arrow-site.git

commit c1d35383d0272f4015c03e3011cbf7a82f81e8aa
Author: Wes McKinney
AuthorDate: Sun Aug 2 13:30:21 2020 -0500

    Add GitHub star and Twitter follow buttons
---
 _layouts/home.html | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/_layouts/home.html b/_layouts/home.html
index c58651f..f6f49ea 100644
--- a/_layouts/home.html
+++ b/_layouts/home.html
@@ -8,6 +8,8 @@
         A cross-language development platform for in-memory analytics
+
+        <a href="https://github.com/apache/arrow" data-size="large" data-show-count="true" aria-label="Star apache/arrow on GitHub">Star</a> <a href="https://twitter.com/ApacheArrow?ref_src=twsrc%5Etfw" class="twitter-follow-button" data-show-count="true">Follow @ApacheArrow</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
 
@@ -17,5 +19,7 @@
 
 {% include footer.html %}
 
+
+<script async defer src="https://buttons.github.io/buttons.js"></script>
[arrow-site] branch follow-buttons created (now c1d3538)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch follow-buttons in repository https://gitbox.apache.org/repos/asf/arrow-site.git. at c1d3538 Add GitHub star and Twitter follow buttons This branch includes the following new commits: new c1d3538 Add GitHub star and Twitter follow buttons The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
[arrow] branch master updated: ARROW-9398: [C++] Register SIMD sum variants to function instance.
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new 6efba62  ARROW-9398: [C++] Register SIMD sum variants to function instance.

6efba62 is described below

commit 6efba62ee47196e62e3521b07d4c25c092e8910e
Author: Frank Du
AuthorDate: Thu Jul 30 18:09:06 2020 -0500

    ARROW-9398: [C++] Register SIMD sum variants to function instance.

    Enable the simd_level feature of kernels and use it in DispatchExactImpl.
    Add simd_level as a parameter of the sum template so that every SIMD kernel
    gets its own instantiation. Also expand the sum/mean test cases to cover
    the BitBlockCounter method.

    Signed-off-by: Frank Du

    Closes #7700 from jianxind/sum_variants_to_function

    Authored-by: Frank Du
    Signed-off-by: Wes McKinney
---
 cpp/src/arrow/compute/function.cc                  | 25 +-
 cpp/src/arrow/compute/kernel.h                     |  9 +++--
 cpp/src/arrow/compute/kernels/aggregate_basic.cc   | 40 --
 .../compute/kernels/aggregate_basic_internal.h     | 30 ++--
 .../arrow/compute/kernels/aggregate_sum_avx2.cc    | 39 -
 .../arrow/compute/kernels/aggregate_sum_avx512.cc  | 40 --
 cpp/src/arrow/compute/kernels/aggregate_test.cc    |  8 +++--
 cpp/src/arrow/compute/registry.cc                  | 14 
 cpp/src/arrow/compute/registry_internal.h          |  3 --
 9 files changed, 110 insertions(+), 98 deletions(-)

diff --git a/cpp/src/arrow/compute/function.cc b/cpp/src/arrow/compute/function.cc
index 1bce468..41c3e36 100644
--- a/cpp/src/arrow/compute/function.cc
+++ b/cpp/src/arrow/compute/function.cc
@@ -24,6 +24,7 @@
 #include "arrow/compute/exec.h"
 #include "arrow/compute/exec_internal.h"
 #include "arrow/datum.h"
+#include "arrow/util/cpu_info.h"
 
 namespace arrow {
 namespace compute {
@@ -58,6 +59,7 @@ Result DispatchExactImpl(const Function& func,
                          const std::vector& kernels,
                          const std::vector& values) {
   const int passed_num_args = static_cast<int>(values.size());
+  const KernelType* kernel_matches[SimdLevel::MAX] = {NULL};
 
   // Validate arity
   const Arity arity = func.arity();
@@ -70,9 +72,30 @@ Result DispatchExactImpl(const Function& func,
   for (const auto& kernel : kernels) {
     if (kernel.signature->MatchesInputs(values)) {
-      return &kernel;
+      kernel_matches[kernel.simd_level] = &kernel;
     }
   }
+
+  // Dispatch as the CPU feature
+  auto cpu_info = arrow::internal::CpuInfo::GetInstance();
+#if defined(ARROW_HAVE_RUNTIME_AVX512)
+  if (cpu_info->IsSupported(arrow::internal::CpuInfo::AVX512)) {
+    if (kernel_matches[SimdLevel::AVX512]) {
+      return kernel_matches[SimdLevel::AVX512];
+    }
+  }
+#endif
+#if defined(ARROW_HAVE_RUNTIME_AVX2)
+  if (cpu_info->IsSupported(arrow::internal::CpuInfo::AVX2)) {
+    if (kernel_matches[SimdLevel::AVX2]) {
+      return kernel_matches[SimdLevel::AVX2];
+    }
+  }
+#endif
+  if (kernel_matches[SimdLevel::NONE]) {
+    return kernel_matches[SimdLevel::NONE];
+  }
+
   return Status::NotImplemented("Function ", func.name(),
                                 " has no kernel matching input types ",
                                 FormatArgTypes(values));

diff --git a/cpp/src/arrow/compute/kernel.h b/cpp/src/arrow/compute/kernel.h
index c581544..3fb6947 100644
--- a/cpp/src/arrow/compute/kernel.h
+++ b/cpp/src/arrow/compute/kernel.h
@@ -448,7 +448,7 @@ class ARROW_EXPORT KernelSignature {
 /// type combination for different SIMD levels. Based on the active system's
 /// CPU info or the user's preferences, we can elect to use one over the other.
 struct SimdLevel {
-  enum type { NONE, SSE4_2, AVX, AVX2, AVX512, NEON };
+  enum type { NONE = 0, SSE4_2, AVX, AVX2, AVX512, NEON, MAX };
 };
 
 /// \brief The strategy to use for propagating or otherwise populating the
@@ -555,10 +555,9 @@ struct Kernel {
   bool parallelizable = true;
 
   /// \brief Indicates the level of SIMD instruction support in the host CPU is
-  /// required to use the function. Currently this is not used, but the
-  /// intention is for functions to be able to contain multiple kernels with
-  /// the same signature but different levels of SIMD, so that the most
-  /// optimized kernel supported on a host's processor can be chosen.
+  /// required to use the function. The intention is for functions to be able to
+  /// contain multiple kernels with the same signature but different levels of SIMD,
+  /// so that the most optimized kernel supported on a host's processor can be chosen.
   SimdLevel::type simd_level = SimdLevel::NONE;
 };
 
diff --git a/cpp/src/arrow/compute/ker
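The dispatch logic added to `DispatchExactImpl` boils down to: collect at most one matching kernel per SIMD level, then return the kernel for the highest level the running CPU actually supports, falling back to the plain scalar kernel. A pure-Python sketch of that preference order (the names here are illustrative; the real code uses `CpuInfo` and compile-time `#ifdef` guards):

```python
# Preference order mirrors the #ifdef chain in DispatchExactImpl:
# AVX512 first, then AVX2, then the scalar (NONE) kernel.
PREFERENCE = ["AVX512", "AVX2", "NONE"]


def dispatch(kernel_matches, cpu_features):
    """kernel_matches maps SIMD level -> kernel; cpu_features is the set of
    SIMD levels the host CPU supports (NONE is always available)."""
    for level in PREFERENCE:
        if (level == "NONE" or level in cpu_features) and level in kernel_matches:
            return kernel_matches[level]
    raise NotImplementedError("function has no kernel matching input types")


kernels = {"NONE": "sum_scalar", "AVX2": "sum_avx2", "AVX512": "sum_avx512"}
print(dispatch(kernels, {"AVX2"}))            # -> sum_avx2
print(dispatch(kernels, {"AVX2", "AVX512"}))  # -> sum_avx512
print(dispatch(kernels, set()))               # -> sum_scalar
```

Note why `NONE = 0` and the `MAX` sentinel were added to the enum: the levels index a fixed-size `kernel_matches` array, so the enum doubles as an array bound.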
[arrow] branch master updated (564366c -> fad0b94)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 564366c ARROW-9589: [C++/R] Forward declare structs as structs add fad0b94 ARROW-9585: [Rust][DataFusion] Remove duplicated to-do line No new revisions were added by this update. Summary of changes: rust/datafusion/README.md | 1 - 1 file changed, 1 deletion(-)
[arrow-testing] branch master updated: ARROW-8797: Add golden files to support ipc between different endians (#41)
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-testing.git

The following commit(s) were added to refs/heads/master by this push:
     new 0e56bdd  ARROW-8797: Add golden files to support ipc between different endians (#41)

0e56bdd is described below

commit 0e56bdd4fc887f26fdf018c746b24f09f16e2a08
Author: Kazuaki Ishizaki
AuthorDate: Wed Jul 29 03:41:23 2020 +0900

    ARROW-8797: Add golden files to support ipc between different endians (#41)

    * add golden files
    * address review comment
---
 .../generated_custom_metadata.arrow_file              | Bin 0 -> 2682 bytes
 .../generated_custom_metadata.json.gz                 | Bin 0 -> 598 bytes
 .../1.0.0-bigendian/generated_custom_metadata.stream  | Bin 0 -> 1520 bytes
 .../1.0.0-bigendian/generated_datetime.arrow_file     | Bin 0 -> 5498 bytes
 .../1.0.0-bigendian/generated_datetime.json.gz        | Bin 0 -> 2738 bytes
 .../1.0.0-bigendian/generated_datetime.stream         | Bin 0 -> 4576 bytes
 .../1.0.0-bigendian/generated_decimal.arrow_file      | Bin 0 -> 256642 bytes
 .../1.0.0-bigendian/generated_decimal.json.gz         | Bin 0 -> 159351 bytes
 .../1.0.0-bigendian/generated_decimal.stream          | Bin 0 -> 253920 bytes
 .../1.0.0-bigendian/generated_dictionary.arrow_file   | Bin 0 -> 2642 bytes
 .../1.0.0-bigendian/generated_dictionary.json.gz      | Bin 0 -> 1166 bytes
 .../1.0.0-bigendian/generated_dictionary.stream       | Bin 0 -> 2136 bytes
 .../generated_dictionary_unsigned.arrow_file          | Bin 0 -> 2178 bytes
 .../generated_dictionary_unsigned.json.gz             | Bin 0 -> 693 bytes
 .../generated_dictionary_unsigned.stream              | Bin 0 -> 1704 bytes
 .../generated_duplicate_fieldnames.arrow_file         | Bin 0 -> 1130 bytes
 .../generated_duplicate_fieldnames.json.gz            | Bin 0 -> 415 bytes
 .../generated_duplicate_fieldnames.stream             | Bin 0 -> 736 bytes
 .../1.0.0-bigendian/generated_extension.arrow_file    | Bin 0 -> 2050 bytes
 .../1.0.0-bigendian/generated_extension.json.gz       | Bin 0 -> 918 bytes
 .../1.0.0-bigendian/generated_extension.stream        | Bin 0 -> 1400 bytes
 .../1.0.0-bigendian/generated_interval.arrow_file     | Bin 0 -> 2418 bytes
 .../1.0.0-bigendian/generated_interval.json.gz        | Bin 0 -> 1506 bytes
 .../1.0.0-bigendian/generated_interval.stream         | Bin 0 -> 1984 bytes
 .../1.0.0-bigendian/generated_large_batch.arrow_file  | Bin 0 -> 9838418 bytes
 .../1.0.0-bigendian/generated_large_batch.json.gz     | Bin 0 -> 11050357 bytes
 .../1.0.0-bigendian/generated_large_batch.stream      | Bin 0 -> 9836424 bytes
 .../1.0.0-bigendian/generated_map.arrow_file          | Bin 0 -> 1642 bytes
 .../1.0.0-bigendian/generated_map.json.gz             | Bin 0 -> 835 bytes
 integration/1.0.0-bigendian/generated_map.stream      | Bin 0 -> 1256 bytes
 .../generated_map_non_canonical.arrow_file            | Bin 0 -> 1242 bytes
 .../generated_map_non_canonical.json.gz               | Bin 0 -> 718 bytes
 .../generated_map_non_canonical.stream                | Bin 0 -> 840 bytes
 .../1.0.0-bigendian/generated_nested.arrow_file       | Bin 0 -> 2714 bytes
 .../1.0.0-bigendian/generated_nested.json.gz          | Bin 0 -> 1622 bytes
 .../1.0.0-bigendian/generated_nested.stream           | Bin 0 -> 2168 bytes
 .../generated_nested_dictionary.arrow_file            | Bin 0 -> 3362 bytes
 .../generated_nested_dictionary.json.gz               | Bin 0 -> 1149 bytes
 .../generated_nested_dictionary.stream                | Bin 0 -> 2632 bytes
 .../generated_nested_large_offsets.arrow_file         | Bin 0 -> 2602 bytes
 .../generated_nested_large_offsets.json.gz            | Bin 0 -> 1105 bytes
 .../generated_nested_large_offsets.stream             | Bin 0 -> 2032 bytes
 .../1.0.0-bigendian/generated_null.arrow_file         | Bin 0 -> 1322 bytes
 .../1.0.0-bigendian/generated_null.json.gz            | Bin 0 -> 502 bytes
 .../1.0.0-bigendian/generated_null.stream             | Bin 0 -> 920 bytes
 .../generated_null_trivial.arrow_file                 | Bin 0 -> 530 bytes
 .../1.0.0-bigendian/generated_null_trivial.json.gz    | Bin 0 -> 192 bytes
 .../1.0.0-bigendian/generated_null_trivial.stream     | Bin 0 -> 320 bytes
 .../1.0.0-bigendian/generated_primitive.arrow_file    | Bin 0 -> 22306 bytes
 .../1.0.0-bigendian/generated_primitive.json.gz       | Bin 0 -> 19362 bytes
 .../1.0.0-bigendian/generated_primitive.stream        | Bin 0 -> 20288 bytes
 .../generated_primitive_large_offsets.arrow_file      | Bin 0 -> 3586 bytes
 .../generated_primitive_large_offsets.json.gz         | Bin 0 -> 1702 bytes
 .../generated_primitive_large_offsets.stream          | Bin 0 -> 3160 bytes
 .../generated_primitive_no_batches.arrow_file         | Bin 0 ->
[arrow] branch master updated: ARROW-9512: [C++] Avoid variadic template unpack inside lambda to work around gcc 4.8 bug
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new 8a8d7ce  ARROW-9512: [C++] Avoid variadic template unpack inside lambda to work around gcc 4.8 bug

8a8d7ce is described below

commit 8a8d7ce39793ed8cafb2318c2752f027c75a17e6
Author: Wes McKinney
AuthorDate: Sun Jul 19 12:25:20 2020 -0500

    ARROW-9512: [C++] Avoid variadic template unpack inside lambda to work around gcc 4.8 bug

    This works around a gcc bug. It only affects compilation of unit tests on
    gcc 4.8, so it is not an issue for the 1.0.0 RC1.

    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47226

    Closes #7794 from wesm/ARROW-9512

    Authored-by: Wes McKinney
    Signed-off-by: Wes McKinney
---
 cpp/src/arrow/testing/gtest_util.cc | 24 
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/cpp/src/arrow/testing/gtest_util.cc b/cpp/src/arrow/testing/gtest_util.cc
index de5b87a..b2f5566 100644
--- a/cpp/src/arrow/testing/gtest_util.cc
+++ b/cpp/src/arrow/testing/gtest_util.cc
@@ -106,20 +106,6 @@ void AssertTsSame(const T& expected, const T& actual, CompareFunctor&& compare)
   }
 }
 
-template <typename T, typename... ExtraArgs>
-void AssertTsEqual(const T& expected, const T& actual, ExtraArgs... args) {
-  return AssertTsSame(expected, actual, [&](const T& expected, const T& actual) {
-    return expected.Equals(actual, args...);
-  });
-}
-
-template <typename T>
-void AssertTsApproxEqual(const T& expected, const T& actual) {
-  return AssertTsSame(expected, actual, [](const T& expected, const T& actual) {
-    return expected.ApproxEquals(actual);
-  });
-}
-
 template <typename CompareFunctor>
 void AssertArraysEqualWith(const Array& expected, const Array& actual, bool verbose,
                            CompareFunctor&& compare) {
@@ -175,11 +161,17 @@ void AssertScalarsEqual(const Scalar& expected, const Scalar& actual, bool verbo
 
 void AssertBatchesEqual(const RecordBatch& expected, const RecordBatch& actual,
                         bool check_metadata) {
-  AssertTsEqual(expected, actual, check_metadata);
+  AssertTsSame(expected, actual,
+               [&](const RecordBatch& expected, const RecordBatch& actual) {
+                 return expected.Equals(actual, check_metadata);
+               });
 }
 
 void AssertBatchesApproxEqual(const RecordBatch& expected, const RecordBatch& actual) {
-  AssertTsApproxEqual(expected, actual);
+  AssertTsSame(expected, actual,
+               [&](const RecordBatch& expected, const RecordBatch& actual) {
+                 return expected.ApproxEquals(actual);
+               });
 }
 
 void AssertChunkedEqual(const ChunkedArray& expected, const ChunkedArray& actual) {
[arrow] branch master updated (1fcbc6d -> 954547a)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 1fcbc6d ARROW-9478: [C++] Improve error message for unsupported casts add 954547a ARROW-9499: [C++] AdaptiveIntBuilder::AppendNull does not increment the null count No new revisions were added by this update. Summary of changes: cpp/src/arrow/array/array_test.cc | 12 cpp/src/arrow/array/builder_adaptive.h | 1 + 2 files changed, 13 insertions(+)
[arrow-testing] branch master updated: ARROW-9497: [C++][Parquet] Add oss-fuzz test case
This is an automated email from the ASF dual-hosted git repository. wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow-testing.git The following commit(s) were added to refs/heads/master by this push: new f552c4d ARROW-9497: [C++][Parquet] Add oss-fuzz test case f552c4d is described below commit f552c4dcd2ae3d14048abd20919748cce5276ade Author: Wes McKinney AuthorDate: Wed Jul 15 19:13:00 2020 -0500 ARROW-9497: [C++][Parquet] Add oss-fuzz test case --- ...testcase-minimized-parquet-arrow-fuzz-5747849626386432 | Bin 0 -> 213 bytes 1 file changed, 0 insertions(+), 0 deletions(-) diff --git a/data/parquet/fuzzing/clusterfuzz-testcase-minimized-parquet-arrow-fuzz-5747849626386432 b/data/parquet/fuzzing/clusterfuzz-testcase-minimized-parquet-arrow-fuzz-5747849626386432 new file mode 100644 index 000..67697be Binary files /dev/null and b/data/parquet/fuzzing/clusterfuzz-testcase-minimized-parquet-arrow-fuzz-5747849626386432 differ
[arrow] branch master updated (842d513 -> be84d7b)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 842d513 ARROW-9476: [C++][Dataset] Fix incorrect dictionary association in HivePartitioningFactory add be84d7b ARROW-9486: [C++][Dataset] Support implicit cast of InExpression::set to dict No new revisions were added by this update. Summary of changes: cpp/src/arrow/dataset/filter.cc | 21 +++-- cpp/src/arrow/dataset/filter_test.cc | 10 +++--- 2 files changed, 26 insertions(+), 5 deletions(-)
[arrow] branch master updated (a88635a -> 399c034)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from a88635a ARROW-9485: [R] Better shared library stripping add 399c034 ARROW-9484: [Docs] Update is* functions to be is_* in the compute docs No new revisions were added by this update. Summary of changes: .../compute/kernels/scalar_string_benchmark.cc | 4 +-- docs/source/cpp/compute.rst| 42 +++--- r/README.md| 6 3 files changed, 23 insertions(+), 29 deletions(-)
[arrow] branch master updated: ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new 3586292  ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec

3586292 is described below

commit 3586292d62c8c348e9fb85676eb524cde53179cf
Author: Wes McKinney
AuthorDate: Tue Jul 14 21:39:47 2020 -0500

    ARROW-9424: [C++][Parquet] Disable writing files with LZ4 codec

    Due to ongoing LZ4 problems with Parquet files, this patch disables writing
    files with LZ4 codec by throwing a `ParquetException`.

    In progress: adding exceptions for pyarrow when using LZ4 to write files
    and updating relevant pytests

    Mailing list discussion: https://mail-archives.apache.org/mod_mbox/arrow-dev/202007.mbox/%3CCAJPUwMCM4ZaJB720%2BuoM1aSA2oD9jSEnzuwWjJiw6vwXxHk7nw%40mail.gmail.com%3E

    Jira ticket: https://issues.apache.org/jira/browse/ARROW-9424

    Closes #7757 from patrickpai/ARROW-9424

    Lead-authored-by: Wes McKinney
    Co-authored-by: Patrick Pai
    Signed-off-by: Wes McKinney
---
 cpp/src/parquet/column_reader.cc         |  2 +-
 cpp/src/parquet/column_writer.cc         |  2 +-
 cpp/src/parquet/column_writer_test.cc    | 10 ++
 cpp/src/parquet/file_deserialize_test.cc |  5 ++---
 cpp/src/parquet/file_serialize_test.cc   |  2 +-
 cpp/src/parquet/thrift_internal.h        |  1 +
 cpp/src/parquet/types.cc                 | 33 
 cpp/src/parquet/types.h                  |  9 +
 python/pyarrow/tests/test_parquet.py     | 16 ++--
 9 files changed, 64 insertions(+), 16 deletions(-)

diff --git a/cpp/src/parquet/column_reader.cc b/cpp/src/parquet/column_reader.cc
index 0bfc303..bc462ad 100644
--- a/cpp/src/parquet/column_reader.cc
+++ b/cpp/src/parquet/column_reader.cc
@@ -182,7 +182,7 @@ class SerializedPageReader : public PageReader {
       InitDecryption();
     }
     max_page_header_size_ = kDefaultMaxPageHeaderSize;
-    decompressor_ = GetCodec(codec);
+    decompressor_ = internal::GetReadCodec(codec);
   }
 
   // Implement the PageReader interface

diff --git a/cpp/src/parquet/column_writer.cc b/cpp/src/parquet/column_writer.cc
index 13f91e3..f9cf37c 100644
--- a/cpp/src/parquet/column_writer.cc
+++ b/cpp/src/parquet/column_writer.cc
@@ -172,7 +172,7 @@ class SerializedPageWriter : public PageWriter {
     if (data_encryptor_ != nullptr || meta_encryptor_ != nullptr) {
       InitEncryption();
     }
-    compressor_ = GetCodec(codec, compression_level);
+    compressor_ = internal::GetWriteCodec(codec, compression_level);
     thrift_serializer_.reset(new ThriftSerializer);
   }

diff --git a/cpp/src/parquet/column_writer_test.cc b/cpp/src/parquet/column_writer_test.cc
index 23554aa..a92d4d2 100644
--- a/cpp/src/parquet/column_writer_test.cc
+++ b/cpp/src/parquet/column_writer_test.cc
@@ -488,13 +488,15 @@ TYPED_TEST(TestPrimitiveWriter, RequiredPlainWithStatsAndGzipCompression) {
 
 #ifdef ARROW_WITH_LZ4
 TYPED_TEST(TestPrimitiveWriter, RequiredPlainWithLz4Compression) {
-  this->TestRequiredWithSettings(Encoding::PLAIN, Compression::LZ4, false, false,
-                                 LARGE_SIZE);
+  ASSERT_THROW(this->TestRequiredWithSettings(Encoding::PLAIN, Compression::LZ4, false,
+                                              false, LARGE_SIZE),
+               ParquetException);
 }
 
 TYPED_TEST(TestPrimitiveWriter, RequiredPlainWithStatsAndLz4Compression) {
-  this->TestRequiredWithSettings(Encoding::PLAIN, Compression::LZ4, false, true,
-                                 LARGE_SIZE);
+  ASSERT_THROW(this->TestRequiredWithSettings(Encoding::PLAIN, Compression::LZ4, false,
+                                              true, LARGE_SIZE),
+               ParquetException);
 }
 #endif

diff --git a/cpp/src/parquet/file_deserialize_test.cc b/cpp/src/parquet/file_deserialize_test.cc
index 3fe2230..1dd3492 100644
--- a/cpp/src/parquet/file_deserialize_test.cc
+++ b/cpp/src/parquet/file_deserialize_test.cc
@@ -249,9 +249,8 @@ TEST_F(TestPageSerde, Compression) {
   codec_types.push_back(Compression::GZIP);
 #endif
 
-#ifdef ARROW_WITH_LZ4
-  codec_types.push_back(Compression::LZ4);
-#endif
+  // TODO: Add LZ4 compression type after PARQUET-1878 is complete.
+  // Testing for deserializing LZ4 is hard without writing enabled, so it is not included.
 
 #ifdef ARROW_WITH_ZSTD
   codec_types.push_back(Compression::ZSTD);

diff --git a/cpp/src/parquet/file_serialize_test.cc b/cpp/src/parquet/file_serialize_test.cc
index c5c4df2..72d7d6f 100644
--- a/cpp/src/parquet/file_serialize_test.cc
+++ b/cpp/src/parquet/file_serialize_test.cc
@@ -309,7 +309,7 @@ TYPED_TEST(TestSerialize, SmallFileGzip) {
 
 #ifdef ARROW_WITH_LZ4
 TYPED_TEST(TestSerialize, SmallFileLz4) {
-  ASSERT_NO_FATAL_FAILURE(this->FileSerializeTest(Compression::LZ4));
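The patch splits codec lookup into a read path and a write path so that existing LZ4 files remain readable while attempts to write new ones raise. A small Python sketch of that guard (the split into `GetReadCodec`/`GetWriteCodec` follows the diff; the Python wrapper, the exception message, and the `SUPPORTED` set are illustrative assumptions, not pyarrow's actual behavior):

```python
class ParquetException(Exception):
    """Stand-in for parquet-cpp's ParquetException."""


# Illustrative set of codecs the reader accepts; the real list depends on
# build flags (ARROW_WITH_LZ4, ARROW_WITH_ZSTD, ...).
SUPPORTED = {"UNCOMPRESSED", "SNAPPY", "GZIP", "BROTLI", "ZSTD", "LZ4"}


def get_read_codec(codec: str) -> str:
    # Reading stays permissive: existing LZ4 files remain readable.
    if codec not in SUPPORTED:
        raise ParquetException(f"unsupported codec: {codec}")
    return codec


def get_write_codec(codec: str) -> str:
    # Writing LZ4 is disabled until the Parquet LZ4 framing issues
    # (see the mailing-list thread referenced above) are resolved.
    if codec == "LZ4":
        raise ParquetException("LZ4 codec is disabled for writing")
    return get_read_codec(codec)


print(get_read_codec("LZ4"))  # -> LZ4
try:
    get_write_codec("LZ4")
except ParquetException as exc:
    print(exc)                # -> LZ4 codec is disabled for writing
```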
[arrow] branch master updated (075e4dd -> a0b7f2a)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 075e4dd ARROW-9452: [Rust] [DataFusion] Optimize ParquetScanExec add a0b7f2a ARROW-9399: [C++] Add forward compatibility test to detect and raise error for future MetadataVersion No new revisions were added by this update. Summary of changes: cpp/src/arrow/flight/test_util.cc| 11 +-- cpp/src/arrow/ipc/message.cc | 5 + cpp/src/arrow/ipc/read_write_test.cc | 20 cpp/src/arrow/testing/util.cc| 10 ++ cpp/src/arrow/testing/util.h | 4 testing | 2 +- 6 files changed, 41 insertions(+), 11 deletions(-)
[arrow] branch master updated (6a3f9eb -> 075e4dd)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 6a3f9eb ARROW-9473: [Doc] Polishing for 1.0 add 075e4dd ARROW-9452: [Rust] [DataFusion] Optimize ParquetScanExec No new revisions were added by this update. Summary of changes: .../src/execution/physical_plan/parquet.rs | 57 +- 1 file changed, 24 insertions(+), 33 deletions(-)
[arrow] branch master updated (3fc83c2 -> f131fe6)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 3fc83c2 ARROW-9438: [CI] Add spark patch to compile with recent Arrow Java changes add f131fe6 ARROW-9390: [C++][Followup] Add underscores to is* string functions No new revisions were added by this update. Summary of changes: cpp/src/arrow/compute/kernels/scalar_string.cc | 44 ++--- .../arrow/compute/kernels/scalar_string_test.cc| 77 +++--- python/pyarrow/compute.py | 40 +-- python/pyarrow/tests/test_compute.py | 29 4 files changed, 97 insertions(+), 93 deletions(-)
[arrow-testing] branch master updated: ARROW-9399: [C++] Check in serialized schema with MetadataVersion::V6
This is an automated email from the ASF dual-hosted git repository. wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow-testing.git The following commit(s) were added to refs/heads/master by this push: new 41209ab ARROW-9399: [C++] Check in serialized schema with MetadataVersion::V6 41209ab is described below commit 41209ab1ead9fa8438cc41da4640354799627549 Author: Wes McKinney AuthorDate: Tue Jul 14 16:25:31 2020 -0500 ARROW-9399: [C++] Check in serialized schema with MetadataVersion::V6 --- data/forward-compatibility/README.md | 27 +++ data/forward-compatibility/schema_v6.arrow | Bin 0 -> 120 bytes 2 files changed, 27 insertions(+) diff --git a/data/forward-compatibility/README.md b/data/forward-compatibility/README.md new file mode 100644 index 000..f011f2f --- /dev/null +++ b/data/forward-compatibility/README.md @@ -0,0 +1,27 @@ + + +# Forward compatibility testing files + +This folder contains files to help with verifying that current Arrow libraries +reject Flatbuffers protocol additions "from the future" (like new data types, +new features, new metadata versions, etc.). + +* schema_v6.arrow: a serialized Schema using a currently non-existent + MetadataVersion::V6 \ No newline at end of file diff --git a/data/forward-compatibility/schema_v6.arrow b/data/forward-compatibility/schema_v6.arrow new file mode 100644 index 000..a2cd1ae Binary files /dev/null and b/data/forward-compatibility/schema_v6.arrow differ
[arrow] branch master updated: ARROW-9438: [CI] Add spark patch to compile with recent Arrow Java changes
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new 3fc83c2  ARROW-9438: [CI] Add spark patch to compile with recent Arrow Java changes

3fc83c2 is described below

commit 3fc83c281104fff0bf8e07e7589281186c7ed251
Author: Bryan Cutler
AuthorDate: Tue Jul 14 16:04:32 2020 -0500

    ARROW-9438: [CI] Add spark patch to compile with recent Arrow Java changes

    Recent changes in Arrow Java from ARROW-9300 now require adding a dependency
    on arrow-memory-netty to provide a default allocator. This adds a patch to
    build Spark with the required dependency.

    Closes #7746 from BryanCutler/spark-integration-patch-ARROW-9438

    Lead-authored-by: Bryan Cutler
    Co-authored-by: Krisztián Szűcs
    Signed-off-by: Wes McKinney
---
 ci/docker/conda-python-spark.dockerfile   |  4 ++
 ci/etc/integration_spark_ARROW-9438.patch | 72 +++
 dev/release/rat_exclude_files.txt         |  1 +
 3 files changed, 77 insertions(+)

diff --git a/ci/docker/conda-python-spark.dockerfile b/ci/docker/conda-python-spark.dockerfile
index d3f0a22..a20f1ff 100644
--- a/ci/docker/conda-python-spark.dockerfile
+++ b/ci/docker/conda-python-spark.dockerfile
@@ -36,6 +36,10 @@ ARG spark=master
 COPY ci/scripts/install_spark.sh /arrow/ci/scripts/
 RUN /arrow/ci/scripts/install_spark.sh ${spark} /spark
 
+# patch spark to build with current Arrow Java
+COPY ci/etc/integration_spark_ARROW-9438.patch /arrow/ci/etc/
+RUN patch -d /spark -p1 -i /arrow/ci/etc/integration_spark_ARROW-9438.patch
+
 # build cpp with tests
 ENV CC=gcc \
     CXX=g++ \

diff --git a/ci/etc/integration_spark_ARROW-9438.patch b/ci/etc/integration_spark_ARROW-9438.patch
new file mode 100644
index 000..2baed30
--- /dev/null
+++ b/ci/etc/integration_spark_ARROW-9438.patch
@@ -0,0 +1,72 @@
+From 0b5388a945a7e5c5706cf00d0754540a6c68254d Mon Sep 17 00:00:00 2001
+From: Bryan Cutler
+Date: Mon, 13 Jul 2020 23:12:25 -0700
+Subject: [PATCH] Update Arrow Java for 1.0.0
+
+---
+ pom.xml              | 17 ++++++++++++++---
+ sql/catalyst/pom.xml |  4 ++++
+ 2 files changed, 18 insertions(+), 3 deletions(-)
+
+diff --git a/pom.xml b/pom.xml
+index 08ca13bfe9..6619fca200 100644
+--- a/pom.xml
++++ b/pom.xml
+@@ -199,7 +199,7 @@
+     If you are changing Arrow version specification, please check
+     ./python/pyspark/sql/utils.py, and ./python/setup.py too.
+     -->
+-    <arrow.version>0.15.1</arrow.version>
++    <arrow.version>1.0.0-SNAPSHOT</arrow.version>
+     <!-- org.fusesource.leveldbjni -->
+@@ -2288,7 +2288,7 @@
+           <groupId>com.fasterxml.jackson.core</groupId>
+-          <artifactId>jackson-databind</artifactId>
++          <artifactId>jackson-core</artifactId>
+           <groupId>io.netty</groupId>
+@@ -2298,9 +2298,20 @@
+           <groupId>io.netty</groupId>
+           <artifactId>netty-common</artifactId>
++      <dependency>
++        <groupId>org.apache.arrow</groupId>
++        <artifactId>arrow-memory-netty</artifactId>
++        <version>${arrow.version}</version>
++      </dependency>
+           <groupId>io.netty</groupId>
+-          <artifactId>netty-handler</artifactId>
++          <artifactId>netty-buffer</artifactId>
++          <groupId>io.netty</groupId>
++          <artifactId>netty-common</artifactId>
+
+diff --git a/sql/catalyst/pom.xml b/sql/catalyst/pom.xml
+index 9edbb7fec9..6b79eb722f 100644
+--- a/sql/catalyst/pom.xml
++++ b/sql/catalyst/pom.xml
+@@ -117,6 +117,10 @@
+       <groupId>org.apache.arrow</groupId>
+       <artifactId>arrow-vector</artifactId>
++    <dependency>
++      <groupId>org.apache.arrow</groupId>
++      <artifactId>arrow-memory-netty</artifactId>
++    </dependency>
+
+     target/scala-${scala.binary.version}/classes
+-- 
+2.17.1
+

diff --git a/dev/release/rat_exclude_files.txt b/dev/release/rat_exclude_files.txt
index d25e2e3..158790d 100644
--- a/dev/release/rat_exclude_files.txt
+++ b/dev/release/rat_exclude_files.txt
@@ -9,6 +9,7 @@
 *.snap
 .github/ISSUE_TEMPLATE/question.md
 ci/etc/rprofile
+ci/etc/*.patch
 cpp/CHANGELOG_PARQUET.md
 cpp/src/arrow/io/mman.h
 cpp/src/arrow/util/random.h
[arrow] branch master updated (e771b94 -> 1413963)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from e771b94 ARROW-8480: [Rust] Use NonNull well aligned pointer as Unique reference add 1413963 ARROW-8314: [Python] Add a Table.select method to select a subset of columns No new revisions were added by this update. Summary of changes: cpp/src/arrow/table.cc | 20 cpp/src/arrow/table.h| 3 ++ cpp/src/arrow/table_test.cc | 16 + python/pyarrow/feather.py| 5 +-- python/pyarrow/includes/libarrow.pxd | 1 + python/pyarrow/table.pxi | 63 python/pyarrow/tests/test_dataset.py | 4 +-- python/pyarrow/tests/test_table.py | 51 + 8 files changed, 144 insertions(+), 19 deletions(-)
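The new `Table.select` added by ARROW-8314 returns a table containing only the requested columns. The selection semantics can be sketched in plain Python — this is an illustrative model over a dict of columns, not pyarrow's actual implementation (`select_columns` is a hypothetical helper):

```python
def select_columns(table, names):
    """Return a new column mapping with only the requested columns.

    `table` is modeled as a plain dict of column-name -> list of values.
    Result column order follows `names`; unknown names raise, which is
    the behavior one would expect from a select-by-name API.
    """
    missing = [n for n in names if n not in table]
    if missing:
        raise KeyError("Field(s) {} not found".format(missing))
    return {n: table[n] for n in names}

table = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
subset = select_columns(table, ["c", "a"])  # order follows the request
```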
[arrow] branch master updated (17a0e47 -> e771b94)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 17a0e47 ARROW-9449: [R] Strip arrow.so add e771b94 ARROW-8480: [Rust] Use NonNull well aligned pointer as Unique reference No new revisions were added by this update. Summary of changes: rust/arrow/src/buffer.rs | 28 ++-- 1 file changed, 22 insertions(+), 6 deletions(-)
[arrow] branch master updated (4d9d66f -> cd6bd82)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 4d9d66f ARROW-9458: [Python] Release GIL in ScanTask.execute add cd6bd82 ARROW-9447 [Rust][DataFusion] Made ScalarUDF (Send + Sync) No new revisions were added by this update. Summary of changes: rust/datafusion/src/execution/context.rs| 4 ++-- rust/datafusion/src/execution/physical_plan/math_expressions.rs | 4 ++-- rust/datafusion/src/execution/physical_plan/udf.rs | 2 +- rust/datafusion/tests/sql.rs| 2 +- 4 files changed, 6 insertions(+), 6 deletions(-)
[arrow] branch master updated (8ea00f0 -> 4d9d66f)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 8ea00f0 ARROW-9470: [CI][Java] Run Maven in parallel add 4d9d66f ARROW-9458: [Python] Release GIL in ScanTask.execute No new revisions were added by this update. Summary of changes: python/pyarrow/_dataset.pyx | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-)
[arrow] branch master updated (bfd2568 -> 8ea00f0)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from bfd2568 ARROW-9390: [Doc] Add missing file add 8ea00f0 ARROW-9470: [CI][Java] Run Maven in parallel No new revisions were added by this update. Summary of changes: ci/scripts/java_build.sh | 2 ++ ci/scripts/java_test.sh | 4 +++- 2 files changed, 5 insertions(+), 1 deletion(-)
[arrow] branch master updated (ad2b2c5 -> 4eaca73)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from ad2b2c5 ARROW-8729: [C++][Dataset] Ensure non-empty batches when only virtual columns are projected add 4eaca73 ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged No new revisions were added by this update. Summary of changes: .../arrow/vector/BaseVariableWidthVector.java | 113 .../org/apache/arrow/vector/TestValueVector.java | 145 + 2 files changed, 206 insertions(+), 52 deletions(-)
[arrow] branch master updated (10289a0 -> ad2b2c5)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 10289a0 ARROW-9390: [C++][Doc] Review compute function names add ad2b2c5 ARROW-8729: [C++][Dataset] Ensure non-empty batches when only virtual columns are projected No new revisions were added by this update. Summary of changes: cpp/src/parquet/arrow/arrow_reader_writer_test.cc | 30 ++- cpp/src/parquet/arrow/reader.cc | 257 -- cpp/src/parquet/arrow/reader.h| 15 +- cpp/src/parquet/arrow/reader_internal.cc | 4 +- python/pyarrow/tests/test_dataset.py | 18 ++ 5 files changed, 188 insertions(+), 136 deletions(-)
[arrow] branch master updated (6d7e4ec -> 1d7d919)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 6d7e4ec ARROW-9450: [Python] Fix tests startup time add 1d7d919 ARROW-9460: [C++] Fix BinaryContainsExact for pattern with repeated characters No new revisions were added by this update. Summary of changes: cpp/src/arrow/compute/kernels/scalar_string.cc | 17 - cpp/src/arrow/compute/kernels/scalar_string_test.cc | 8 2 files changed, 16 insertions(+), 9 deletions(-)
[arrow] branch master updated: ARROW-9440: [Python] Expose Fill Null kernel
This is an automated email from the ASF dual-hosted git repository. wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new e559dd0 ARROW-9440: [Python] Expose Fill Null kernel e559dd0 is described below commit e559dd080a27875bab3d5cdb0da115c62e2f60bb Author: c-jamie AuthorDate: Mon Jul 13 19:53:47 2020 -0500 ARROW-9440: [Python] Expose Fill Null kernel Closes #7736 from c-jamie/ARROW-9440 Lead-authored-by: c-jamie Co-authored-by: Wes McKinney Signed-off-by: Wes McKinney --- python/pyarrow/array.pxi | 6 python/pyarrow/compute.py| 41 +++ python/pyarrow/includes/libarrow.pxd | 1 + python/pyarrow/scalar.pxi| 13 python/pyarrow/table.pxi | 6 python/pyarrow/tests/test_compute.py | 63 python/pyarrow/tests/test_scalars.py | 9 ++ 7 files changed, 139 insertions(+) diff --git a/python/pyarrow/array.pxi b/python/pyarrow/array.pxi index 1cffd37..1dcff02 100644 --- a/python/pyarrow/array.pxi +++ b/python/pyarrow/array.pxi @@ -1004,6 +1004,12 @@ cdef class Array(_PandasConvertible): """ return _pc().is_valid(self) +def fill_null(self, fill_value): +""" +See pyarrow.compute.fill_null for usage. +""" +return _pc().fill_null(self, fill_value) + def __getitem__(self, key): """ Slice or return value at given index diff --git a/python/pyarrow/compute.py b/python/pyarrow/compute.py index c8443ed..b8e678f 100644 --- a/python/pyarrow/compute.py +++ b/python/pyarrow/compute.py @@ -24,6 +24,7 @@ from pyarrow._compute import ( # noqa call_function, TakeOptions ) +import pyarrow as pa import pyarrow._compute as _pc @@ -259,3 +260,43 @@ def take(data, indices, boundscheck=True): """ options = TakeOptions(boundscheck) return call_function('take', [data, indices], options) + + +def fill_null(values, fill_value): +""" +Replace each null element in values with fill_value. The fill_value must be +the same type as values or able to be implicitly casted to the array's +type. 
+ +Parameters +---------- +data : Array, ChunkedArray +replace each null element with fill_value +fill_value: Scalar-like object +Either a pyarrow.Scalar or any python object coercible to a +Scalar. If not same type as data will attempt to cast. + +Returns +------- +result : depends on inputs + +Examples +-------- +>>> import pyarrow as pa +>>> arr = pa.array([1, 2, None, 3], type=pa.int8()) +>>> fill_value = pa.scalar(5, type=pa.int8()) +>>> arr.fill_null(fill_value) +<pyarrow.lib.Int8Array object at 0x7f95437f01a0> +[ + 1, + 2, + 5, + 3 +] +""" +if not isinstance(fill_value, pa.Scalar): +fill_value = pa.scalar(fill_value, type=values.type) +elif values.type != fill_value.type: +fill_value = pa.scalar(fill_value.as_py(), type=values.type) + +return call_function("fill_null", [values, fill_value]) diff --git a/python/pyarrow/includes/libarrow.pxd b/python/pyarrow/includes/libarrow.pxd index 213ef24..c8e7c5b 100644 --- a/python/pyarrow/includes/libarrow.pxd +++ b/python/pyarrow/includes/libarrow.pxd @@ -887,6 +887,7 @@ cdef extern from "arrow/api.h" namespace "arrow" nogil: c_bool is_valid c_string ToString() const c_bool Equals(const CScalar& other) const +CResult[shared_ptr[CScalar]] CastTo(shared_ptr[CDataType] to) const cdef cppclass CScalarHash" arrow::Scalar::Hash": size_t operator()(const shared_ptr[CScalar]& scalar) const diff --git a/python/pyarrow/scalar.pxi b/python/pyarrow/scalar.pxi index 903faae..248d926 100644 --- a/python/pyarrow/scalar.pxi +++ b/python/pyarrow/scalar.pxi @@ -63,6 +63,19 @@ cdef class Scalar: """ return self.wrapped.get().is_valid +def cast(self, object target_type): +""" +Attempt a safe cast to target data type.
+""" +cdef: +DataType type = ensure_type(target_type) +shared_ptr[CScalar] result + +with nogil: +result = GetResultValue(self.wrapped.get().CastTo(type.sp_type)) + +return Scalar.wrap(result) + def __repr__(self): return ''.format( self.__class__.__name__, self.as_py() diff --git a/python/pyarrow/table.pxi b/python/pyarrow/table.pxi index 08e3f75..688d668 100644 --- a/python/pyarrow/table.pxi +++ b/python/pyarrow/table.pxi @@ -191,6 +191,12 @@ cdef class
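The contract of the `fill_null` kernel exposed above — replace each null with a scalar of compatible type, leaving non-null elements untouched — can be sketched in pure Python, modeling nulls as `None` (the real kernel dispatches to Arrow's C++ `fill_null` compute function and returns an Arrow array, not a list):

```python
def fill_null(values, fill_value):
    # Replace each null (modeled as None) with fill_value; non-null
    # elements pass through unchanged, matching the kernel's contract.
    return [fill_value if v is None else v for v in values]

result = fill_null([1, 2, None, 3], 5)  # mirrors the docstring example
```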
[arrow] branch master updated (dcd17bf -> cad2e96)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from dcd17bf ARROW-9445: [Python] Revert Array.equals changes + expose comparison ops in compute add cad2e96 ARROW-9442: [Python] Do not call Validate() in pyarrow_wrap_table No new revisions were added by this update. Summary of changes: python/pyarrow/public-api.pxi | 2 -- 1 file changed, 2 deletions(-)
[arrow] branch master updated (cad2e96 -> 427fe07)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from cad2e96 ARROW-9442: [Python] Do not call Validate() in pyarrow_wrap_table add 427fe07 ARROW-9443: [C++] Bundled bz2 build should only build libbz2 No new revisions were added by this update. Summary of changes: .github/workflows/r.yml | 3 +++ cpp/cmake_modules/ThirdpartyToolchain.cmake | 3 ++- dev/tasks/r/azure.linux.yml | 1 + dev/tasks/r/github.linux.cran.yml | 1 + r/configure | 20 +++- r/inst/build_arrow_static.sh| 13 - r/tools/linuxlibs.R | 19 +-- r/vignettes/install.Rmd | 2 +- 8 files changed, 40 insertions(+), 22 deletions(-)
[arrow] branch master updated (389b153 -> dcd17bf)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 389b153 ARROW-9439: [C++] Fix crash on invalid IPC input add dcd17bf ARROW-9445: [Python] Revert Array.equals changes + expose comparison ops in compute No new revisions were added by this update. Summary of changes: python/pyarrow/array.pxi | 31 ++- python/pyarrow/compute.py| 7 +++ python/pyarrow/table.pxi | 10 ++ python/pyarrow/tests/test_array.py | 13 + python/pyarrow/tests/test_compute.py | 33 + python/pyarrow/tests/test_scalars.py | 4 ++-- python/pyarrow/tests/test_table.py | 3 +++ 7 files changed, 54 insertions(+), 47 deletions(-)
[arrow] branch master updated (8daf756 -> 389b153)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 8daf756 ARROW-9446: [C++] Add compiler id, version, and build flags to BuildInfo add 389b153 ARROW-9439: [C++] Fix crash on invalid IPC input No new revisions were added by this update. Summary of changes: cpp/src/arrow/array/array_base.cc | 13 ++ cpp/src/arrow/array/array_base.h | 5 +++ cpp/src/arrow/array/array_test.cc | 49 ++ cpp/src/arrow/array/concatenate.cc | 86 -- cpp/src/arrow/array/data.cc| 6 +++ cpp/src/arrow/array/data.h | 8 +++- cpp/src/arrow/buffer.cc| 41 ++ cpp/src/arrow/buffer.h | 28 + cpp/src/arrow/buffer_test.cc | 37 +++- cpp/src/arrow/ipc/reader.cc| 6 +++ cpp/src/arrow/util/int_util.h | 17 testing| 2 +- 12 files changed, 263 insertions(+), 35 deletions(-)
[arrow] branch master updated: ARROW-9333: [Python] Expose more IPC options
This is an automated email from the ASF dual-hosted git repository. wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new feda987 ARROW-9333: [Python] Expose more IPC options feda987 is described below commit feda9877f8145aebf907c61a24640735a968a230 Author: Antoine Pitrou AuthorDate: Mon Jul 13 12:49:07 2020 -0500 ARROW-9333: [Python] Expose more IPC options Also make some optional arguments keyword-only. Closes #7730 from pitrou/ARROW-9333-py-ipc-options Authored-by: Antoine Pitrou Signed-off-by: Wes McKinney --- cpp/src/arrow/ipc/options.h | 7 ++- python/pyarrow/_flight.pyx | 6 +-- python/pyarrow/includes/libarrow.pxd | 2 + python/pyarrow/io.pxi| 29 +-- python/pyarrow/ipc.pxi | 55 ++--- python/pyarrow/ipc.py| 15 +++--- python/pyarrow/tests/test_flight.py | 6 +++ python/pyarrow/tests/test_ipc.py | 95 python/pyarrow/tests/util.py | 16 ++ 9 files changed, 174 insertions(+), 57 deletions(-) diff --git a/cpp/src/arrow/ipc/options.h b/cpp/src/arrow/ipc/options.h index 69e248c..6bbd7b8 100644 --- a/cpp/src/arrow/ipc/options.h +++ b/cpp/src/arrow/ipc/options.h @@ -56,10 +56,9 @@ struct ARROW_EXPORT IpcWriteOptions { /// \brief The memory pool to use for allocations made during IPC writing MemoryPool* memory_pool = default_memory_pool(); - /// \brief EXPERIMENTAL: Codec to use for compressing and decompressing - /// record batch body buffers. This is not part of the Arrow IPC protocol and - /// only for internal use (e.g. Feather files). May only be LZ4_FRAME and - /// ZSTD + /// \brief Compression codec to use for record batch body buffers + /// + /// May only be UNCOMPRESSED, LZ4_FRAME and ZSTD. 
Compression::type compression = Compression::UNCOMPRESSED; int compression_level = Compression::kUseDefaultCompressionLevel; diff --git a/python/pyarrow/_flight.pyx b/python/pyarrow/_flight.pyx index 7e3c837..7b6b281 100644 --- a/python/pyarrow/_flight.pyx +++ b/python/pyarrow/_flight.pyx @@ -97,10 +97,8 @@ def _munge_grpc_python_error(message): cdef IpcWriteOptions _get_options(options): -cdef IpcWriteOptions write_options = \ - _get_legacy_format_default( -use_legacy_format=None, options=options) -return write_options +return _get_legacy_format_default( +use_legacy_format=None, options=options) cdef class FlightCallOptions: diff --git a/python/pyarrow/includes/libarrow.pxd b/python/pyarrow/includes/libarrow.pxd index 76203f0..3e461c4 100644 --- a/python/pyarrow/includes/libarrow.pxd +++ b/python/pyarrow/includes/libarrow.pxd @@ -1329,6 +1329,8 @@ cdef extern from "arrow/ipc/api.h" namespace "arrow::ipc" nogil: c_bool write_legacy_ipc_format CMemoryPool* memory_pool CMetadataVersion metadata_version +CCompressionType compression +c_bool use_threads @staticmethod CIpcWriteOptions Defaults() diff --git a/python/pyarrow/io.pxi b/python/pyarrow/io.pxi index 76a058d..058b09a 100644 --- a/python/pyarrow/io.pxi +++ b/python/pyarrow/io.pxi @@ -1539,24 +1539,43 @@ def _detect_compression(path): cdef CCompressionType _ensure_compression(str name) except *: uppercase = name.upper() -if uppercase == 'GZIP': -return CCompressionType_GZIP -elif uppercase == 'BZ2': +if uppercase == 'BZ2': return CCompressionType_BZ2 +elif uppercase == 'GZIP': +return CCompressionType_GZIP elif uppercase == 'BROTLI': return CCompressionType_BROTLI elif uppercase == 'LZ4' or uppercase == 'LZ4_FRAME': return CCompressionType_LZ4_FRAME elif uppercase == 'LZ4_RAW': return CCompressionType_LZ4 -elif uppercase == 'ZSTD': -return CCompressionType_ZSTD elif uppercase == 'SNAPPY': return CCompressionType_SNAPPY +elif uppercase == 'ZSTD': +return CCompressionType_ZSTD else: raise ValueError('Invalid value 
for compression: {!r}'.format(name)) +cdef str _compression_name(CCompressionType ctype): +if ctype == CCompressionType_GZIP: +return 'gzip' +elif ctype == CCompressionType_BROTLI: +return 'brotli' +elif ctype == CCompressionType_BZ2: +return 'bz2' +elif ctype == CCompressionType_LZ4_FRAME: +return 'lz4' +elif ctype == CCompressionType_LZ4: +return 'lz4_raw' +elif ctype == CCompressionType_SNAPPY: +return 'snappy' +elif ctype == CCompressionType_ZSTD: +return 'zstd' +else: +raise RuntimeError('Unexpected CCompressionType value') + + cdef class Codec: """ Compression codec. diff --git a/python/pyarrow/ipc.pxi b/python/pya
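The patch pairs a case-insensitive name-to-codec lookup (`_ensure_compression`) with a reverse enum-to-name mapping (`_compression_name`). The forward lookup can be sketched with a plain dict — codec names are taken from the diff above, but the dict-based structure here is an illustration, not pyarrow's actual code:

```python
# Accepted names map to a canonical codec name; aliases share an entry.
_CANONICAL = {
    "BZ2": "bz2",
    "GZIP": "gzip",
    "BROTLI": "brotli",
    "LZ4": "lz4",        # LZ4 frame format, same as LZ4_FRAME
    "LZ4_FRAME": "lz4",
    "LZ4_RAW": "lz4_raw",
    "SNAPPY": "snappy",
    "ZSTD": "zstd",
}

def ensure_compression(name):
    # Mirror the case-insensitive matching in the Cython code.
    try:
        return _CANONICAL[name.upper()]
    except KeyError:
        raise ValueError("Invalid value for compression: {!r}".format(name))
```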
[arrow] branch master updated: ARROW-8989: [C++][Doc] Document available compute functions
This is an automated email from the ASF dual-hosted git repository. wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 9d2079c ARROW-8989: [C++][Doc] Document available compute functions 9d2079c is described below commit 9d2079c2ead31399b724ecc3775d61432a8096af Author: Antoine Pitrou AuthorDate: Mon Jul 13 12:48:30 2020 -0500 ARROW-8989: [C++][Doc] Document available compute functions Also fix glaring bugs in arithmetic kernels (signed overflow detection was broken). Closes #7695 from pitrou/ARROW-8989-doc-compute-functions Authored-by: Antoine Pitrou Signed-off-by: Wes McKinney --- c_glib/arrow-glib/compute.cpp | 5 +- cpp/src/arrow/array/validate.cc| 7 +- cpp/src/arrow/compute/api.h| 4 + cpp/src/arrow/compute/api_aggregate.h | 61 +-- cpp/src/arrow/compute/api_scalar.h | 97 ++-- cpp/src/arrow/compute/api_vector.h | 37 +- cpp/src/arrow/compute/cast.cc | 2 +- cpp/src/arrow/compute/cast.h | 5 + cpp/src/arrow/compute/exec.h | 14 +- cpp/src/arrow/compute/function.h | 6 + cpp/src/arrow/compute/kernels/aggregate_basic.cc | 2 +- cpp/src/arrow/compute/kernels/aggregate_test.cc| 2 +- cpp/src/arrow/compute/kernels/scalar_arithmetic.cc | 28 +- .../compute/kernels/scalar_arithmetic_test.cc | 47 +- cpp/src/arrow/compute/registry.h | 2 +- cpp/src/arrow/scalar.h | 40 +- cpp/src/arrow/util/int_util.h | 33 +- cpp/src/parquet/column_reader.cc | 7 +- docs/source/conf.py| 7 +- docs/source/cpp/api.rst| 2 + .../cpp/{getting_started.rst => api/compute.rst} | 59 ++- docs/source/cpp/compute.rst| 526 + docs/source/cpp/getting_started.rst| 1 + docs/source/python/api/arrays.rst | 71 +-- docs/source/python/dataset.rst | 4 +- 25 files changed, 883 insertions(+), 186 deletions(-) diff --git a/c_glib/arrow-glib/compute.cpp b/c_glib/arrow-glib/compute.cpp index d8d0bdc..3e31899 100644 --- a/c_glib/arrow-glib/compute.cpp +++ b/c_glib/arrow-glib/compute.cpp @@ -676,7 +676,7 @@ 
garrow_count_options_set_property(GObject *object, switch (prop_id) { case PROP_MODE: priv->options.count_mode = - static_cast(g_value_get_enum(value)); + static_cast(g_value_get_enum(value)); break; default: G_OBJECT_WARN_INVALID_PROPERTY_ID(object, prop_id, pspec); @@ -706,7 +706,8 @@ static void garrow_count_options_init(GArrowCountOptions *object) { auto priv = GARROW_COUNT_OPTIONS_GET_PRIVATE(object); - new(>options) arrow::compute::CountOptions(arrow::compute::CountOptions::COUNT_ALL); + new(>options) arrow::compute::CountOptions( +arrow::compute::CountOptions::COUNT_NON_NULL); } static void diff --git a/cpp/src/arrow/array/validate.cc b/cpp/src/arrow/array/validate.cc index 3dd0ffd..8fb8b59 100644 --- a/cpp/src/arrow/array/validate.cc +++ b/cpp/src/arrow/array/validate.cc @@ -98,7 +98,7 @@ struct ValidateArrayVisitor { if (value_size < 0) { return Status::Invalid("FixedSizeListArray has negative value size ", value_size); } -if (HasMultiplyOverflow(len, value_size) || +if (HasPositiveMultiplyOverflow(len, value_size) || array.values()->length() != len * value_size) { return Status::Invalid("Values Length (", array.values()->length(), ") is not equal to the length (", len, @@ -329,7 +329,7 @@ Status ValidateArray(const Array& array) { type.ToString(), ", got ", data.buffers.size()); } // This check is required to avoid addition overflow below - if (HasAdditionOverflow(array.length(), array.offset())) { + if (HasPositiveAdditionOverflow(array.length(), array.offset())) { return Status::Invalid("Array of type ", type.ToString(), " has impossibly large length and offset"); } @@ -346,7 +346,8 @@ Status ValidateArray(const Array& array) { min_buffer_size = BitUtil::BytesForBits(array.length() + array.offset()); break; case DataTypeLayout::FIXED_WIDTH: -if (HasMultiplyOverflow(array.length() + array.offset(), spec.byte_width)) { +if (HasPositiveMultiplyOverflow(array.length() + array.offset(), +spec.byte_width)) { return Status::Invalid("Array of type ", 
type.ToString(), "
[arrow] branch master updated (1f42ac0 -> 875d0539)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 1f42ac0 ARROW-9428: [C++][Doc] Update buffer allocation documentation add 875d0539 ARROW-9436: [C++][CI] Fix Valgrind failure No new revisions were added by this update. Summary of changes: cpp/src/arrow/compute/kernels/scalar_fill_null_test.cc | 3 +-- cpp/src/arrow/ipc/message.cc | 2 +- cpp/src/arrow/ipc/metadata_internal.cc | 2 +- cpp/src/arrow/ipc/reader.cc| 2 +- cpp/src/arrow/util/value_parsing_test.cc | 4 ++-- cpp/src/parquet/column_scanner.h | 2 +- docker-compose.yml | 2 +- 7 files changed, 8 insertions(+), 9 deletions(-)
[arrow] branch master updated: ARROW-9428: [C++][Doc] Update buffer allocation documentation
This is an automated email from the ASF dual-hosted git repository. wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 1f42ac0 ARROW-9428: [C++][Doc] Update buffer allocation documentation 1f42ac0 is described below commit 1f42ac0ff0bc1ac098cd64ba27c354890c5b8ff4 Author: Antoine Pitrou AuthorDate: Mon Jul 13 12:27:20 2020 -0500 ARROW-9428: [C++][Doc] Update buffer allocation documentation Use Result-returning AllocateBuffer() version in example. Also improve cross-referencing in some places. Closes #7731 from pitrou/ARROW-9428-buffer-allocation-doc Authored-by: Antoine Pitrou Signed-off-by: Wes McKinney --- docs/source/cpp/api/formats.rst | 6 ++ docs/source/cpp/api/support.rst | 11 +++ docs/source/cpp/arrays.rst | 3 +++ docs/source/cpp/conventions.rst | 3 +++ docs/source/cpp/csv.rst | 3 +++ docs/source/cpp/datatypes.rst | 3 +++ docs/source/cpp/io.rst | 4 +++- docs/source/cpp/json.rst| 3 +++ docs/source/cpp/memory.rst | 10 +++--- docs/source/cpp/parquet.rst | 3 +++ docs/source/cpp/tables.rst | 3 +++ 11 files changed, 48 insertions(+), 4 deletions(-) diff --git a/docs/source/cpp/api/formats.rst b/docs/source/cpp/api/formats.rst index 75dfb00..a072f11 100644 --- a/docs/source/cpp/api/formats.rst +++ b/docs/source/cpp/api/formats.rst @@ -19,6 +19,8 @@ File Formats +.. _cpp-api-csv: + CSV === @@ -34,6 +36,8 @@ CSV .. doxygenclass:: arrow::csv::TableReader :members: +.. _cpp-api-json: + Line-separated JSON === @@ -48,6 +52,8 @@ Line-separated JSON .. doxygenclass:: arrow::json::TableReader :members: +.. _cpp-api-parquet: + Parquet reader == diff --git a/docs/source/cpp/api/support.rst b/docs/source/cpp/api/support.rst index 1547a20..c3310e5 100644 --- a/docs/source/cpp/api/support.rst +++ b/docs/source/cpp/api/support.rst @@ -15,9 +15,20 @@ .. specific language governing permissions and limitations .. under the License. 
+=== Programming Support === +General information +--- + +.. doxygenfunction:: arrow::GetBuildInfo + :project: arrow_cpp + +.. doxygenstruct:: arrow::BuildInfo + :project: arrow_cpp + :members: + Error return and reporting -- diff --git a/docs/source/cpp/arrays.rst b/docs/source/cpp/arrays.rst index 43ac414..bd6ba64 100644 --- a/docs/source/cpp/arrays.rst +++ b/docs/source/cpp/arrays.rst @@ -22,6 +22,9 @@ Arrays == +.. seealso:: + :doc:`Array API reference ` + The central type in Arrow is the class :class:`arrow::Array`. An array represents a known-length sequence of values all having the same type. Internally, those values are represented by one or several buffers, the diff --git a/docs/source/cpp/conventions.rst b/docs/source/cpp/conventions.rst index 33f0a8c..218d028 100644 --- a/docs/source/cpp/conventions.rst +++ b/docs/source/cpp/conventions.rst @@ -102,3 +102,6 @@ For example:: // return success at the end return Status::OK(); } + +.. seealso:: + :doc:`API reference for error reporting ` diff --git a/docs/source/cpp/csv.rst b/docs/source/cpp/csv.rst index 8d37b29..50a5cdb 100644 --- a/docs/source/cpp/csv.rst +++ b/docs/source/cpp/csv.rst @@ -27,6 +27,9 @@ Reading CSV files Arrow provides a fast CSV reader allowing ingestion of external data as Arrow tables. +.. seealso:: + :ref:`CSV reader API reference `. + Basic usage === diff --git a/docs/source/cpp/datatypes.rst b/docs/source/cpp/datatypes.rst index c411632..9149420 100644 --- a/docs/source/cpp/datatypes.rst +++ b/docs/source/cpp/datatypes.rst @@ -21,6 +21,9 @@ Data Types == +.. seealso:: + :doc:`Datatype API reference `. + Data types govern how physical data is interpreted. Their :ref:`specification ` allows binary interoperability between different Arrow implementations, including from different programming languages and runtimes diff --git a/docs/source/cpp/io.rst b/docs/source/cpp/io.rst index ed357c6..501998b 100644 --- a/docs/source/cpp/io.rst +++ b/docs/source/cpp/io.rst @@ -17,6 +17,7 @@ .. 
default-domain:: cpp .. highlight:: cpp +.. cpp:namespace:: arrow::io == Input / output and filesystems @@ -27,7 +28,8 @@ of input / output operations. They operate on streams of untyped binary data. Those abstractions are used for various purposes such as reading CSV or Parquet data, transmitting IPC streams, and more. -.. cpp:namespace:: arrow::io +.. seealso:: + :doc:`API reference for input/output facilities `. Reading binary data === diff --git a/docs/source/cpp/json.rst b/docs/source/cpp/json.rst index 93dcdfa..cdb742e 100644 --- a/docs/source/cpp/json.rst +++ b/docs/source/cpp/json.rst
[arrow] branch master updated: ARROW-9374: [C++][Python] Expose MakeArrayFromScalar
This is an automated email from the ASF dual-hosted git repository. wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new d1db0b0 ARROW-9374: [C++][Python] Expose MakeArrayFromScalar d1db0b0 is described below commit d1db0b08da7fad1fd171c7275264b87a3d9435dc Author: Krisztián Szűcs AuthorDate: Mon Jul 13 12:25:33 2020 -0500 ARROW-9374: [C++][Python] Expose MakeArrayFromScalar Since we have a complete scalar implementation on the python side, we can implement `pa.repeat(value, size=n)` Closes #7684 from kszucs/repeat Authored-by: Krisztián Szűcs Signed-off-by: Wes McKinney --- cpp/src/arrow/array/array_test.cc| 86 +++-- cpp/src/arrow/array/util.cc | 70 cpp/src/arrow/scalar.cc | 2 +- cpp/src/arrow/scalar.h | 6 +- cpp/src/arrow/scalar_test.cc | 12 python/pyarrow/__init__.py | 4 +- python/pyarrow/array.pxi | 120 +++ python/pyarrow/includes/libarrow.pxd | 3 + python/pyarrow/scalar.pxi| 14 ++-- python/pyarrow/tests/test_array.py | 56 python/pyarrow/tests/test_scalars.py | 11 11 files changed, 339 insertions(+), 45 deletions(-) diff --git a/cpp/src/arrow/array/array_test.cc b/cpp/src/arrow/array/array_test.cc index ea1ded6..42e25d0 100644 --- a/cpp/src/arrow/array/array_test.cc +++ b/cpp/src/arrow/array/array_test.cc @@ -354,25 +354,39 @@ TEST_F(TestArray, TestMakeArrayFromScalar) { ASSERT_EQ(null_array->null_count(), 5); auto hello = Buffer::FromString("hello"); - ScalarVector scalars{std::make_shared(false), - std::make_shared(3), - std::make_shared(3), - std::make_shared(3), - std::make_shared(3), - std::make_shared(3.0), - std::make_shared(hello), - std::make_shared(hello), - std::make_shared( - hello, fixed_size_binary(static_cast(hello->size(, - std::make_shared(Decimal128(10), decimal(16, 4)), - std::make_shared(hello), - std::make_shared(hello), - std::make_shared( - ScalarVector{ - std::make_shared(2), - std::make_shared(6), - }, - 
struct_({field("min", int32()), field("max", int32())}))}; + DayTimeIntervalType::DayMilliseconds daytime{1, 100}; + + ScalarVector scalars{ + std::make_shared(false), + std::make_shared(3), + std::make_shared(3), + std::make_shared(3), + std::make_shared(3), + std::make_shared(3.0), + std::make_shared(10), + std::make_shared(11), + std::make_shared(1000, time32(TimeUnit::SECOND)), + std::make_shared(, time64(TimeUnit::MICRO)), + std::make_shared(, timestamp(TimeUnit::MILLI)), + std::make_shared(1), + std::make_shared(daytime), + std::make_shared(60, duration(TimeUnit::SECOND)), + std::make_shared(hello), + std::make_shared(hello), + std::make_shared( + hello, fixed_size_binary(static_cast(hello->size(, + std::make_shared(Decimal128(10), decimal(16, 4)), + std::make_shared(hello), + std::make_shared(hello), + std::make_shared(ArrayFromJSON(int8(), "[1, 2, 3]")), + std::make_shared(ArrayFromJSON(int8(), "[1, 1, 2, 2, 3, 3]")), + std::make_shared(ArrayFromJSON(int8(), "[1, 2, 3, 4]")), + std::make_shared( + ScalarVector{ + std::make_shared(2), + std::make_shared(6), + }, + struct_({field("min", int32()), field("max", int32())}))}; for (int64_t length : {16}) { for (auto scalar : scalars) { @@ -384,6 +398,40 @@ TEST_F(TestArray, TestMakeArrayFromScalar) { } } +TEST_F(TestArray, TestMakeArrayFromDictionaryScalar) { + auto dictionary = ArrayFromJSON(utf8(), R"(["foo", "bar", "baz"])"); + auto type = std::make_shared(int8(), utf8()); + ASSERT_OK_AND_ASSIGN(auto value, MakeScalar(int8(), 1)); + auto scalar = DictionaryScalar({value, dictionary}, type); + + ASSERT_OK_AND_ASSIGN(auto array, MakeArrayFromScalar(scalar, 4)); + ASSERT_OK(array->ValidateFull()); + ASSERT_EQ(array->length(), 4); + ASSERT_EQ(array->null_count(), 0); + + for (int i = 0; i < 4; i++) { +ASSERT_OK_AND_ASSIGN(auto item, array->GetScalar(i)); +ASSERT_TRUE(item->Equals(scalar)); + } +} + +TEST_F(TestArray, TestMakeArrayFromMapScalar) {
[arrow] branch master updated: ARROW-7208: [Python][Parquet] Raise better error message when passing a directory path instead of a file path to ParquetFile
This is an automated email from the ASF dual-hosted git repository. wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 658618e ARROW-7208: [Python][Parquet] Raise better error message when passing a directory path instead of a file path to ParquetFile 658618e is described below commit 658618ecd540bc6af76efa608cd1ff7b7938ba4c Author: Wes McKinney AuthorDate: Sun Jul 12 22:31:18 2020 -0500 ARROW-7208: [Python][Parquet] Raise better error message when passing a directory path instead of a file path to ParquetFile Closes #7722 from wesm/ARROW-7208 Authored-by: Wes McKinney Signed-off-by: Wes McKinney --- python/pyarrow/io.pxi| 9 + python/pyarrow/tests/test_parquet.py | 9 + 2 files changed, 18 insertions(+) diff --git a/python/pyarrow/io.pxi b/python/pyarrow/io.pxi index 8f8cbd1..76a058d 100644 --- a/python/pyarrow/io.pxi +++ b/python/pyarrow/io.pxi @@ -776,11 +776,19 @@ def memory_map(path, mode='r'): --- mmap : MemoryMappedFile """ +_check_is_file(path) + cdef MemoryMappedFile mmap = MemoryMappedFile() mmap._open(path, mode) return mmap +cdef _check_is_file(path): +if os.path.isdir(path): +raise IOError("Expected file path, but {0} is a directory" + .format(path)) + + def create_memory_map(path, size): """ Create a file of the given size and memory-map it. 
@@ -807,6 +815,7 @@ cdef class OSFile(NativeFile): object path def __cinit__(self, path, mode='r', MemoryPool memory_pool=None): +_check_is_file(path) self.path = path cdef: diff --git a/python/pyarrow/tests/test_parquet.py b/python/pyarrow/tests/test_parquet.py index 539c444..410eee1 100644 --- a/python/pyarrow/tests/test_parquet.py +++ b/python/pyarrow/tests/test_parquet.py @@ -3448,6 +3448,15 @@ def test_empty_row_groups(tempdir): assert reader.read_row_group(i).equals(table) +def test_parquet_file_pass_directory_instead_of_file(tempdir): +# ARROW-7208 +path = tempdir / 'directory' +os.mkdir(str(path)) + +with pytest.raises(IOError, match="Expected file path"): +pq.ParquetFile(path) + + @pytest.mark.pandas @parametrize_legacy_dataset def test_parquet_writer_with_caller_provided_filesystem(use_legacy_dataset):
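The `_check_is_file` guard in the patch above is small enough to restate outside Cython. A sketch of the same early check (the helper name and message mirror the patch; the surrounding scaffolding is illustrative):

```python
import os
import tempfile

def check_is_file(path):
    """Fail fast with a readable message when given a directory,
    instead of surfacing an opaque low-level error later on."""
    if os.path.isdir(path):
        raise IOError("Expected file path, but {0} is a directory"
                      .format(path))

# Passing a directory raises immediately, as the new test expects.
directory = tempfile.mkdtemp()
try:
    check_is_file(directory)
    raised = False
except IOError:
    raised = True
```

The design choice is the same as in the commit: validate the path at the Python boundary, before handing it to the C++ memory-map or file-open machinery.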
[arrow] branch master updated: ARROW-9413: [Rust] Disable cmp_nan clippy error
This is an automated email from the ASF dual-hosted git repository. wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new b9bbee2 ARROW-9413: [Rust] Disable cmp_nan clippy error b9bbee2 is described below commit b9bbee2511300d39b3f327fa4dd608648d5bde59 Author: Neville Dipale AuthorDate: Sun Jul 12 17:59:48 2020 -0500 ARROW-9413: [Rust] Disable cmp_nan clippy error Using the comparison recommended by clippy makes sorts with `NAN` nondeterministic. We currently sort NAN separately from nulls; we can resolve this separately Closes #7710 from nevi-me/ARROW-9413 Authored-by: Neville Dipale Signed-off-by: Wes McKinney --- rust/arrow/src/compute/kernels/sort.rs | 2 ++ 1 file changed, 2 insertions(+) diff --git a/rust/arrow/src/compute/kernels/sort.rs b/rust/arrow/src/compute/kernels/sort.rs index 8cd6f7b..2b4cbbc 100644 --- a/rust/arrow/src/compute/kernels/sort.rs +++ b/rust/arrow/src/compute/kernels/sort.rs @@ -52,12 +52,14 @@ pub fn sort_to_indices( .as_any() .downcast_ref::() .expect("Unable to downcast array"); +#[allow(clippy::cmp_nan)] range.partition(|index| array.is_valid(*index) && array.value(*index) != f32::NAN) } else if values.data_type() == ::Float64 { let array = values .as_any() .downcast_ref::() .expect("Unable to downcast array"); +#[allow(clippy::cmp_nan)] range.partition(|index| array.is_valid(*index) && array.value(*index) != f64::NAN) } else { range.partition(|index| values.is_valid(*index))
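The clippy lint fires because comparing against NaN with `!=` is always true: IEEE 754 NaN is unordered, so even `NAN != NAN` holds. The commit keeps the comparison (silencing the lint) because switching to `is_nan()` changed sort behavior. The underlying semantics, sketched in Python:

```python
import math

nan = float("nan")

# IEEE 754: every ordered comparison with NaN is False,
# and `!=` is always True -- even for NaN against itself.
always_unequal = nan != nan
self_equal = nan == nan

# The robust membership test is isnan(), not equality:
values = [1.0, nan, 2.0]
nan_count = sum(1 for v in values if math.isnan(v))
```

This is exactly why `array.value(*index) != f32::NAN` partitions *every* valid element into the non-NaN bucket, which happened to be the behavior the Rust sort relied on.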
[arrow] branch master updated: ARROW-9288: [C++][Dataset] Fix PartitioningFactory with dictionary encoding for HivePartitioning
This is an automated email from the ASF dual-hosted git repository. wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 44aa829 ARROW-9288: [C++][Dataset] Fix PartitioningFactory with dictionary encoding for HivePartitioning 44aa829 is described below commit 44aa8292605bf7484ae73b289055482e399e90d0 Author: Joris Van den Bossche AuthorDate: Sun Jul 12 17:58:10 2020 -0500 ARROW-9288: [C++][Dataset] Fix PartitioningFactory with dictionary encoding for HivePartitioning Closes #7608 from jorisvandenbossche/ARROW-9288 Authored-by: Joris Van den Bossche Signed-off-by: Wes McKinney --- cpp/src/arrow/dataset/partition.cc | 26 +- python/pyarrow/tests/test_dataset.py | 29 + 2 files changed, 54 insertions(+), 1 deletion(-) diff --git a/cpp/src/arrow/dataset/partition.cc b/cpp/src/arrow/dataset/partition.cc index 744e9dd..2a2ecdf 100644 --- a/cpp/src/arrow/dataset/partition.cc +++ b/cpp/src/arrow/dataset/partition.cc @@ -317,6 +317,16 @@ class KeyValuePartitioningInspectImpl { return ::arrow::schema(std::move(fields)); } + std::vector FieldNames() { +std::vector names; +names.reserve(name_to_index_.size()); + +for (auto kv : name_to_index_) { + names.push_back(kv.first); +} +return names; + } + private: std::unordered_map name_to_index_; std::vector> values_; @@ -657,15 +667,29 @@ class HivePartitioningFactory : public PartitioningFactory { } } +field_names_ = impl.FieldNames(); return impl.Finish(_); } Result> Finish( const std::shared_ptr& schema) const override { -return std::shared_ptr(new HivePartitioning(schema, dictionaries_)); +if (dictionaries_.empty()) { + return std::make_shared(schema, dictionaries_); +} else { + for (FieldRef ref : field_names_) { +// ensure all of field_names_ are present in schema +RETURN_NOT_OK(ref.FindOne(*schema).status()); + } + + // drop fields which aren't in field_names_ + auto out_schema = SchemaFromColumnNames(schema, 
field_names_); + + return std::make_shared(std::move(out_schema), dictionaries_); +} } private: + std::vector field_names_; ArrayVector dictionaries_; PartitioningFactoryOptions options_; }; diff --git a/python/pyarrow/tests/test_dataset.py b/python/pyarrow/tests/test_dataset.py index 1c348f4..428547c 100644 --- a/python/pyarrow/tests/test_dataset.py +++ b/python/pyarrow/tests/test_dataset.py @@ -1484,6 +1484,35 @@ def test_open_dataset_non_existing_file(): ds.dataset('file:i-am-not-existing.parquet', format='parquet') +@pytest.mark.parquet +@pytest.mark.parametrize('partitioning', ["directory", "hive"]) +def test_open_dataset_partitioned_dictionary_type(tempdir, partitioning): +# ARROW-9288 +import pyarrow.parquet as pq +table = pa.table({'a': range(9), 'b': [0.] * 4 + [1.] * 5}) + +path = tempdir / "dataset" +path.mkdir() + +for part in ["A", "B", "C"]: +fmt = "{}" if partitioning == "directory" else "part={}" +part = path / fmt.format(part) +part.mkdir() +pq.write_table(table, part / "test.parquet") + +if partitioning == "directory": +part = ds.DirectoryPartitioning.discover( +["part"], max_partition_dictionary_size=-1) +else: +part = ds.HivePartitioning.discover(max_partition_dictionary_size=-1) + +dataset = ds.dataset(str(path), partitioning=part) +expected_schema = table.schema.append( +pa.field("part", pa.dictionary(pa.int32(), pa.string())) +) +assert dataset.schema.equals(expected_schema) + + @pytest.fixture def s3_example_simple(s3_connection, s3_server): from pyarrow.fs import FileSystem
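The fix above makes `HivePartitioningFactory` remember the discovered field names so the dictionary-encoded partition columns line up with the schema. A simplified pure-Python sketch of the discovery step: parse `key=value` path segments, then dictionary-encode each discovered field (names and structure are illustrative, not Arrow's API):

```python
def parse_hive_segments(paths):
    """Collect key=value segments from Hive-style paths and
    dictionary-encode each discovered partition field."""
    fields = {}                            # field name -> raw value per path
    for path in paths:
        for segment in path.split("/"):
            if "=" in segment:
                key, _, value = segment.partition("=")
                fields.setdefault(key, []).append(value)

    encoded = {}
    for key, values in fields.items():
        dictionary = sorted(set(values))   # unique values, stable order
        index = {v: i for i, v in enumerate(dictionary)}
        encoded[key] = (dictionary, [index[v] for v in values])
    return encoded

enc = parse_hive_segments(["part=A/test.parquet",
                           "part=B/test.parquet",
                           "part=A/other.parquet"])
```

The `(dictionary, indices)` pair is the essence of what `max_partition_dictionary_size=-1` requests in the new test: partition keys stored once, with small integer indices per file.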
[arrow] branch master updated: ARROW-9321: [C++][Dataset] Populate statistics opportunistically
This is an automated email from the ASF dual-hosted git repository. wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 3ae46e3 ARROW-9321: [C++][Dataset] Populate statistics opportunistically 3ae46e3 is described below commit 3ae46e33aa94c8f357abb8c6debe361b53d7907d Author: Benjamin Kietzman AuthorDate: Sun Jul 12 17:53:16 2020 -0500 ARROW-9321: [C++][Dataset] Populate statistics opportunistically Populate ParquetFileFragment statistics whenever a reader is opened anyway. Also provides an explicit method for forcing load of statistics. (I exposed this as a public method, but maybe we'd prefer to hide it inside the `statistics` property the way we do physical schema?) Closes #7692 from bkietz/9321-populate-statistics-on-read Lead-authored-by: Benjamin Kietzman Co-authored-by: Joris Van den Bossche Signed-off-by: Wes McKinney --- cpp/src/arrow/dataset/dataset.cc | 12 +- cpp/src/arrow/dataset/file_parquet.cc| 230 ++- cpp/src/arrow/dataset/file_parquet.h | 24 +-- python/pyarrow/_dataset.pyx | 13 +- python/pyarrow/includes/libarrow_dataset.pxd | 1 + python/pyarrow/tests/test_dataset.py | 54 ++- 6 files changed, 207 insertions(+), 127 deletions(-) diff --git a/cpp/src/arrow/dataset/dataset.cc b/cpp/src/arrow/dataset/dataset.cc index ed936db..71755aa 100644 --- a/cpp/src/arrow/dataset/dataset.cc +++ b/cpp/src/arrow/dataset/dataset.cc @@ -40,9 +40,17 @@ Fragment::Fragment(std::shared_ptr partition_expression, } Result> Fragment::ReadPhysicalSchema() { + { +auto lock = physical_schema_mutex_.Lock(); +if (physical_schema_ != nullptr) return physical_schema_; + } + + // allow ReadPhysicalSchemaImpl to lock mutex_, if necessary + ARROW_ASSIGN_OR_RAISE(auto physical_schema, ReadPhysicalSchemaImpl()); + auto lock = physical_schema_mutex_.Lock(); - if (physical_schema_ == NULLPTR) { -ARROW_ASSIGN_OR_RAISE(physical_schema_, ReadPhysicalSchemaImpl()); + if 
(physical_schema_ == nullptr) { +physical_schema_ = std::move(physical_schema); } return physical_schema_; } diff --git a/cpp/src/arrow/dataset/file_parquet.cc b/cpp/src/arrow/dataset/file_parquet.cc index d5e05ed..4581faa 100644 --- a/cpp/src/arrow/dataset/file_parquet.cc +++ b/cpp/src/arrow/dataset/file_parquet.cc @@ -286,10 +286,9 @@ ParquetFileFormat::ParquetFileFormat(const parquet::ReaderProperties& reader_pro Result ParquetFileFormat::IsSupported(const FileSource& source) const { try { ARROW_ASSIGN_OR_RAISE(auto input, source.Open()); -auto properties = MakeReaderProperties(*this); auto reader = -parquet::ParquetFileReader::Open(std::move(input), std::move(properties)); -auto metadata = reader->metadata(); +parquet::ParquetFileReader::Open(std::move(input), MakeReaderProperties(*this)); +std::shared_ptr metadata = reader->metadata(); return metadata != nullptr && metadata->can_decompress(); } catch (const ::parquet::ParquetInvalidOrCorruptedFileException& e) { ARROW_UNUSED(e); @@ -316,7 +315,7 @@ Result> ParquetFileFormat::GetReader auto properties = MakeReaderProperties(*this, pool); ARROW_ASSIGN_OR_RAISE(auto reader, OpenReader(source, std::move(properties))); - auto metadata = reader->metadata(); + std::shared_ptr metadata = reader->metadata(); auto arrow_properties = MakeArrowReaderProperties(*this, *metadata); if (options) { @@ -335,91 +334,41 @@ static inline bool RowGroupInfosAreComplete(const std::vector& inf [](const RowGroupInfo& i) { return i.HasStatistics(); }); } -static inline std::vector FilterRowGroups( -std::vector row_groups, const Expression& predicate) { - auto filter = [](const RowGroupInfo& info) { -return !info.Satisfy(predicate); - }; - auto end = std::remove_if(row_groups.begin(), row_groups.end(), filter); - row_groups.erase(end, row_groups.end()); - return row_groups; -} - -static inline Result> AugmentRowGroups( -std::vector row_groups, parquet::arrow::FileReader* reader) { - auto metadata = reader->parquet_reader()->metadata(); - 
auto manifest = reader->manifest(); - auto num_row_groups = metadata->num_row_groups(); - - if (row_groups.empty()) { -row_groups = RowGroupInfo::FromCount(num_row_groups); - } - - // Augment a RowGroup with statistics if missing. - auto augment = [&](RowGroupInfo& info) { -if (!info.HasStatistics() && info.id() < num_row_groups) { - auto row_group = metadata->RowGroup(info.id()); - info.set_num_rows(row_group->num_rows()); - info.set_total_byte_size(row_group->total_byte_size()); - info.set_statistics(RowGroupStatisticsAsStructScalar(*row_group, m
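The `dataset.cc` change earlier in this commit is a double-checked cache: do the slow schema read *outside* the mutex, then install the result only if another thread has not already done so. A thread-safe Python sketch of that pattern (class and method names are illustrative):

```python
import threading

class Fragment:
    """Lazy, thread-safe cache in the spirit of Fragment::ReadPhysicalSchema:
    the (possibly slow) read happens with no lock held; the result is only
    installed if another thread didn't win the race."""

    def __init__(self, read_schema):
        self._read_schema = read_schema   # slow I/O, injected for the sketch
        self._schema = None
        self._lock = threading.Lock()

    def physical_schema(self):
        with self._lock:                  # fast path: already cached
            if self._schema is not None:
                return self._schema
        schema = self._read_schema()      # no lock held during the read
        with self._lock:
            if self._schema is None:      # first writer wins
                self._schema = schema
            return self._schema

calls = []
frag = Fragment(lambda: calls.append(1) or {"a": "int32"})
first = frag.physical_schema()
second = frag.physical_schema()
```

Two sequential calls return the same cached object and trigger only one read, which is the behavior the C++ rewrite preserves while avoiding I/O under the mutex.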
[arrow] branch master updated (2e94641 -> 5dbf30a)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 2e94641 ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader add 5dbf30a ARROW-9418 [R] nyc-taxi Parquet files not downloaded in binary mode on Windows No new revisions were added by this update. Summary of changes: r/vignettes/dataset.Rmd | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
[arrow] branch master updated (9ef539e -> 2e94641)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 9ef539e ARROW-4221: [C++][Python] Add canonical flag in COO sparse index add 2e94641 ARROW-9297: [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader No new revisions were added by this update. Summary of changes: cpp/src/arrow/util/iterator.h | 16 +-- cpp/src/arrow/util/iterator_test.cc | 8 +- cpp/src/parquet/arrow/arrow_reader_writer_test.cc | 16 ++- cpp/src/parquet/arrow/reader.cc | 116 +++--- cpp/src/parquet/arrow/reader.h| 27 +++-- cpp/src/parquet/arrow/schema.h| 56 +++ 6 files changed, 147 insertions(+), 92 deletions(-)
[arrow] branch master updated (d019bc3 -> 9ef539e)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from d019bc3 PARQUET-1882: [C++] Buffered Reads should allow for 0 length add 9ef539e ARROW-4221: [C++][Python] Add canonical flag in COO sparse index No new revisions were added by this update. Summary of changes: cpp/src/arrow/ipc/metadata_internal.cc | 3 +- cpp/src/arrow/ipc/read_write_test.cc | 25 cpp/src/arrow/ipc/reader.cc| 5 +- cpp/src/arrow/python/numpy_convert.cc | 4 +- cpp/src/arrow/sparse_tensor.cc | 126 - cpp/src/arrow/sparse_tensor.h | 34 - cpp/src/arrow/sparse_tensor_test.cc| 213 + cpp/src/arrow/tensor/coo_converter.cc | 10 +- cpp/src/generated/SparseTensor_generated.h | 21 ++- format/SparseTensor.fbs| 11 +- python/pyarrow/includes/libarrow.pxd | 8 ++ python/pyarrow/tensor.pxi | 36 - python/pyarrow/tests/test_sparse_tensor.py | 33 +++-- 13 files changed, 470 insertions(+), 59 deletions(-)
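ARROW-4221 adds a flag advertising that a COO sparse index is *canonical*: coordinates sorted lexicographically (row-major) with no duplicates. A sketch of that invariant check in Python (function name is illustrative):

```python
def is_canonical_coo(coords):
    """True when COO coordinates are strictly increasing in
    lexicographic (row-major) order -- i.e. sorted with no
    duplicates, the property the canonical flag advertises."""
    return all(coords[i] < coords[i + 1] for i in range(len(coords) - 1))

canonical = is_canonical_coo([(0, 0), (0, 2), (1, 1)])
not_canonical = is_canonical_coo([(1, 1), (0, 2)])
```

Using strict `<` on coordinate tuples checks both ordering and uniqueness in one pass.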
[arrow] branch master updated (7d377ba -> d019bc3)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 7d377ba ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate add d019bc3 PARQUET-1882: [C++] Buffered Reads should allow for 0 length No new revisions were added by this update. Summary of changes: cpp/src/arrow/io/buffered.cc | 4 +++- cpp/src/arrow/io/buffered_test.cc | 9 cpp/src/parquet/file_serialize_test.cc | 42 ++ 3 files changed, 54 insertions(+), 1 deletion(-)
[arrow] branch master updated (3b0055a -> df629f9)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 3b0055a ARROW-9417: [C++] Write length in IPC message by using little-endian add df629f9 ARROW-9419: [C++] Expand fill_null function testing, test sliced arrays, fix some bugs No new revisions were added by this update. Summary of changes: cpp/src/arrow/compute/kernels/scalar_fill_null.cc | 21 .../arrow/compute/kernels/scalar_fill_null_test.cc | 62 +++--- cpp/src/arrow/testing/gtest_util.cc| 4 ++ 3 files changed, 72 insertions(+), 15 deletions(-)
[arrow] branch master updated: ARROW-9417: [C++] Write length in IPC message by using little-endian
This is an automated email from the ASF dual-hosted git repository. wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 3b0055a ARROW-9417: [C++] Write length in IPC message by using little-endian 3b0055a is described below commit 3b0055adc4ab54b59d0671821c3767cebf291bd5 Author: Kazuaki Ishizaki AuthorDate: Sun Jul 12 12:09:18 2020 -0500 ARROW-9417: [C++] Write length in IPC message by using little-endian This PR forces to write metadata_length and footer_length in IPC messages by using little-endian to follow [the specification](https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst). Closes #7716 from kiszk/ARROW-9417 Authored-by: Kazuaki Ishizaki Signed-off-by: Wes McKinney --- cpp/src/arrow/ipc/message.cc | 18 ++ cpp/src/arrow/ipc/read_write_test.cc | 5 + cpp/src/arrow/ipc/reader.cc | 3 ++- cpp/src/arrow/ipc/writer.cc | 2 ++ 4 files changed, 19 insertions(+), 9 deletions(-) diff --git a/cpp/src/arrow/ipc/message.cc b/cpp/src/arrow/ipc/message.cc index aeb106e..dcf61ef 100644 --- a/cpp/src/arrow/ipc/message.cc +++ b/cpp/src/arrow/ipc/message.cc @@ -424,8 +424,9 @@ Status WriteMessage(const Buffer& message, const IpcWriteOptions& options, RETURN_NOT_OK(file->Write(::kIpcContinuationToken, sizeof(int32_t))); } - // Write the flatbuffer size prefix including padding - int32_t padded_flatbuffer_size = padded_message_length - prefix_size; + // Write the flatbuffer size prefix including padding in little endian + int32_t padded_flatbuffer_size = + BitUtil::ToLittleEndian(padded_message_length - prefix_size); RETURN_NOT_OK(file->Write(_flatbuffer_size, sizeof(int32_t))); // Write the flatbuffer @@ -577,18 +578,18 @@ class MessageDecoder::MessageDecoderImpl { } Status ConsumeInitialData(const uint8_t* data, int64_t size) { -return ConsumeInitial(util::SafeLoadAs(data)); +return 
ConsumeInitial(BitUtil::FromLittleEndian(util::SafeLoadAs(data))); } Status ConsumeInitialBuffer(const std::shared_ptr& buffer) { ARROW_ASSIGN_OR_RAISE(auto continuation, ConsumeDataBufferInt32(buffer)); -return ConsumeInitial(continuation); +return ConsumeInitial(BitUtil::FromLittleEndian(continuation)); } Status ConsumeInitialChunks() { int32_t continuation = 0; RETURN_NOT_OK(ConsumeDataChunks(sizeof(int32_t), )); -return ConsumeInitial(continuation); +return ConsumeInitial(BitUtil::FromLittleEndian(continuation)); } Status ConsumeInitial(int32_t continuation) { @@ -616,18 +617,19 @@ class MessageDecoder::MessageDecoderImpl { } Status ConsumeMetadataLengthData(const uint8_t* data, int64_t size) { -return ConsumeMetadataLength(util::SafeLoadAs(data)); +return ConsumeMetadataLength( +BitUtil::FromLittleEndian(util::SafeLoadAs(data))); } Status ConsumeMetadataLengthBuffer(const std::shared_ptr& buffer) { ARROW_ASSIGN_OR_RAISE(auto metadata_length, ConsumeDataBufferInt32(buffer)); -return ConsumeMetadataLength(metadata_length); +return ConsumeMetadataLength(BitUtil::FromLittleEndian(metadata_length)); } Status ConsumeMetadataLengthChunks() { int32_t metadata_length = 0; RETURN_NOT_OK(ConsumeDataChunks(sizeof(int32_t), _length)); -return ConsumeMetadataLength(metadata_length); +return ConsumeMetadataLength(BitUtil::FromLittleEndian(metadata_length)); } Status ConsumeMetadataLength(int32_t metadata_length) { diff --git a/cpp/src/arrow/ipc/read_write_test.cc b/cpp/src/arrow/ipc/read_write_test.cc index 9e4f4c9..6ae7611 100644 --- a/cpp/src/arrow/ipc/read_write_test.cc +++ b/cpp/src/arrow/ipc/read_write_test.cc @@ -131,6 +131,11 @@ TEST_P(TestMessage, SerializeTo) { ASSERT_EQ(BitUtil::RoundUp(metadata->size() + prefix_size, alignment) + body_length, output_length); ASSERT_OK_AND_EQ(output_length, stream->Tell()); +ASSERT_OK_AND_ASSIGN(auto buffer, stream->Finish()); +// check whether the length is written in little endian +auto buffer_ptr = buffer.get()->data(); 
+ASSERT_EQ(output_length - body_length - prefix_size, + BitUtil::FromLittleEndian(*(uint32_t*)(buffer_ptr + 4))); }; CheckWithAlignment(8); diff --git a/cpp/src/arrow/ipc/reader.cc b/cpp/src/arrow/ipc/reader.cc index 3c51fef..75f2213 100644 --- a/cpp/src/arrow/ipc/reader.cc +++ b/cpp/src/arrow/ipc/reader.cc @@ -979,7 +979,8 @@ class RecordBatchFileReaderImpl : public RecordBatchFileReader { return Status::Invalid("Not an Arrow file"); } -int32_t footer_length = *reinterpret_cast(buffer->data()); +int32_t footer_length = +BitUtil::FromLittleEndian(*reinterpret_cast(buffer->data())); if (footer_length <= 0 ||
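The commit's point is that the IPC length prefixes are little-endian on the wire regardless of host byte order, so readers and writers must convert explicitly. The framing can be sketched with Python's `struct` module, where `"<i"` forces a little-endian int32:

```python
import struct

def write_length_prefix(payload):
    """Prefix `payload` with its length as a little-endian int32,
    as the Arrow IPC format specifies regardless of host byte order."""
    return struct.pack("<i", len(payload)) + payload

def read_length_prefix(buf):
    """Read the little-endian int32 prefix and return (length, body)."""
    (length,) = struct.unpack_from("<i", buf, 0)
    return length, buf[4:4 + length]

framed = write_length_prefix(b"flatbuffer-bytes")
length, body = read_length_prefix(framed)
```

On a big-endian machine like the one motivating this fix, omitting the explicit conversion (the old `util::SafeLoadAs` path) would read the prefix byte-swapped.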
[arrow] branch master updated (a5914d5 -> 35c8dff)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from a5914d5 ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels add 35c8dff PARQUET-1839: Set values read for required column No new revisions were added by this update. Summary of changes: cpp/src/parquet/column_reader.cc | 1 + 1 file changed, 1 insertion(+)
[arrow] branch master updated (3e940dc -> a5914d5)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 3e940dc ARROW-9389: [C++] Add binary metafunctions for the set lookup kernels isin and match that can be called with CallFunction add a5914d5 ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels No new revisions were added by this update. Summary of changes: cpp/src/arrow/compute/kernels/scalar_string.cc | 491 - .../compute/kernels/scalar_string_benchmark.cc | 10 + .../arrow/compute/kernels/scalar_string_test.cc| 164 +++ cpp/src/arrow/util/utf8.h | 19 + docker-compose.yml | 2 + python/pyarrow/compute.py | 21 + python/pyarrow/tests/test_compute.py | 122 + 7 files changed, 826 insertions(+), 3 deletions(-)
[arrow] branch master updated (1a7519f -> 3e940dc)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 1a7519f ARROW-9395: [Python] allow configuring MetadataVersion add 3e940dc ARROW-9389: [C++] Add binary metafunctions for the set lookup kernels isin and match that can be called with CallFunction No new revisions were added by this update. Summary of changes: cpp/src/arrow/compute/kernels/scalar_set_lookup.cc | 30 ++ .../compute/kernels/scalar_set_lookup_test.cc | 16 +--- 2 files changed, 43 insertions(+), 3 deletions(-)
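The `isin` and `match` set-lookup kernels exposed above have simple semantics: `isin` tests element-wise membership in a value set; `match` returns each element's index in the value set, or null when absent. A pure-Python sketch of both:

```python
def is_in(values, value_set):
    """Element-wise membership test, like the `isin` kernel."""
    lookup = set(value_set)
    return [v in lookup for v in values]

def match(values, value_set):
    """Index of each element within `value_set`, None when absent,
    like the `match` kernel (None stands in for a null slot)."""
    index = {v: i for i, v in enumerate(value_set)}
    return [index.get(v) for v in values]

flags = is_in([1, 3, 5], [3, 4, 5])
positions = match([1, 3, 5], [3, 4, 5])
```

Making these proper metafunctions is what lets them be invoked uniformly through `CallFunction`, as the commit title describes.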
[arrow] branch master updated (18a5e3e -> 1a7519f)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 18a5e3e ARROW-9331: [C++] Improve the performance of Tensor-to-SparseTensor conversion add 1a7519f ARROW-9395: [Python] allow configuring MetadataVersion No new revisions were added by this update. Summary of changes: cpp/src/arrow/python/flight.cc | 4 +- cpp/src/arrow/python/flight.h | 3 +- python/pyarrow/_flight.pyx | 46 +-- python/pyarrow/includes/libarrow.pxd| 1 + python/pyarrow/includes/libarrow_flight.pxd | 10 +++-- python/pyarrow/ipc.pxi | 46 +-- python/pyarrow/ipc.py | 56 --- python/pyarrow/lib.pxd | 6 +++ python/pyarrow/tests/test_flight.py | 62 -- python/pyarrow/tests/test_ipc.py| 69 - 10 files changed, 259 insertions(+), 44 deletions(-)
[arrow] branch master updated (d2ddaa6 -> 18a5e3e)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from d2ddaa6 ARROW-1692: [Java] UnionArray round trip not working add 18a5e3e ARROW-9331: [C++] Improve the performance of Tensor-to-SparseTensor conversion No new revisions were added by this update. Summary of changes: cpp/src/arrow/tensor/converter_internal.h | 88 +++ cpp/src/arrow/tensor/coo_converter.cc | 140 +- cpp/src/arrow/tensor/csx_converter.cc | 2 +- cpp/src/arrow/util/macros.h | 1 + 4 files changed, 208 insertions(+), 23 deletions(-) create mode 100644 cpp/src/arrow/tensor/converter_internal.h
[arrow] branch master updated (32e1ab3 -> d2ddaa6)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 32e1ab3 ARROW-9276: [Dev] Enable ARROW_CUDA when generating API documentations add d2ddaa6 ARROW-1692: [Java] UnionArray round trip not working No new revisions were added by this update. Summary of changes: dev/archery/archery/integration/datagen.py | 1 - dev/archery/archery/integration/runner.py | 2 + .../main/codegen/templates/DenseUnionVector.java | 154 +++-- .../src/main/codegen/templates/UnionVector.java| 91 .../java/org/apache/arrow/vector/BufferLayout.java | 2 +- .../java/org/apache/arrow/vector/NullVector.java | 5 +- .../java/org/apache/arrow/vector/TypeLayout.java | 4 +- .../apache/arrow/vector/ipc/JsonFileReader.java| 17 ++- .../apache/arrow/vector/ipc/JsonFileWriter.java| 11 +- .../java/org/apache/arrow/vector/types/Types.java | 9 +- .../org/apache/arrow/vector/util/Validator.java| 2 + .../apache/arrow/vector/util/VectorAppender.java | 13 +- .../apache/arrow/vector/TestDenseUnionVector.java | 23 +-- .../org/apache/arrow/vector/TestTypeLayout.java| 2 +- .../org/apache/arrow/vector/TestUnionVector.java | 13 +- .../org/apache/arrow/vector/TestValueVector.java | 24 ++-- .../vector/complex/impl/TestPromotableWriter.java | 4 +- 17 files changed, 182 insertions(+), 195 deletions(-)
[arrow] branch master updated (6ada172 -> 32e1ab3)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 6ada172 ARROW-9283: [Python] Expose build info add 32e1ab3 ARROW-9276: [Dev] Enable ARROW_CUDA when generating API documentations No new revisions were added by this update. Summary of changes: ci/docker/linux-apt-docs.dockerfile | 1 + dev/release/post-09-docs.sh | 31 ++- docker-compose.yml | 29 ++--- 3 files changed, 21 insertions(+), 40 deletions(-)
[arrow] branch master updated (2fac048 -> 6ada172)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 2fac048 ARROW-9403: [Python] add Array.tolist as alias of .to_pylist add 6ada172 ARROW-9283: [Python] Expose build info No new revisions were added by this update. Summary of changes: cpp/src/arrow/util/config.h.cmake| 2 +- python/pyarrow/__init__.py | 21 +++- python/pyarrow/config.pxi| 49 python/pyarrow/includes/libarrow.pxd | 14 +++ python/pyarrow/lib.pyx | 3 +++ python/pyarrow/tests/test_misc.py| 10 python/setup.py | 12 - 7 files changed, 108 insertions(+), 3 deletions(-) create mode 100644 python/pyarrow/config.pxi
[arrow] branch master updated (16290e7 -> 2fac048)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 16290e7 ARROW-1567: [C++] Implement "fill_null" function that replaces null values with a scalar value add 2fac048 ARROW-9403: [Python] add Array.tolist as alias of .to_pylist No new revisions were added by this update. Summary of changes: python/pyarrow/array.pxi | 6 ++ 1 file changed, 6 insertions(+)
[arrow] branch master updated (b02095f -> 16290e7)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from b02095f ARROW-9415: [C++] Arrow does not compile on Power9 add 16290e7 ARROW-1567: [C++] Implement "fill_null" function that replaces null values with a scalar value No new revisions were added by this update. Summary of changes: cpp/src/arrow/CMakeLists.txt | 1 + cpp/src/arrow/compute/api_scalar.cc| 4 + cpp/src/arrow/compute/api_scalar.h | 15 ++ cpp/src/arrow/compute/kernels/CMakeLists.txt | 1 + cpp/src/arrow/compute/kernels/codegen_internal.h | 40 + cpp/src/arrow/compute/kernels/scalar_fill_null.cc | 168 + .../arrow/compute/kernels/scalar_fill_null_test.cc | 109 + cpp/src/arrow/compute/registry.cc | 1 + cpp/src/arrow/compute/registry_internal.h | 1 + 9 files changed, 340 insertions(+) create mode 100644 cpp/src/arrow/compute/kernels/scalar_fill_null.cc create mode 100644 cpp/src/arrow/compute/kernels/scalar_fill_null_test.cc
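The `fill_null` function added by ARROW-1567 replaces null slots with a scalar while passing valid slots through unchanged. Its behavior on a list with `None` as the null sentinel, sketched in Python:

```python
def fill_null(values, fill_value):
    """Replace null (None) slots with a scalar, like the new
    `fill_null` kernel; non-null slots pass through unchanged."""
    return [fill_value if v is None else v for v in values]

filled = fill_null([1, None, 3, None], 0)
```

The kernel version operates on validity bitmaps rather than sentinel values, but the element-wise contract is the same.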
[arrow] branch master updated (5e122c6 -> b02095f)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from 5e122c6 ARROW-9407: [Python] Recognize more pandas null sentinels in sequence type inference when converting to Arrow add b02095f ARROW-9415: [C++] Arrow does not compile on Power9 No new revisions were added by this update. Summary of changes: cpp/src/arrow/util/hashing.h | 7 +++ 1 file changed, 7 insertions(+)
[arrow] branch master updated (fe541e8 -> 5e122c6)
This is an automated email from the ASF dual-hosted git repository. wesm pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git. from fe541e8 ARROW-9362: [Java] Support reading/writing V5 MetadataVersion add 5e122c6 ARROW-9407: [Python] Recognize more pandas null sentinels in sequence type inference when converting to Arrow No new revisions were added by this update. Summary of changes: cpp/src/arrow/python/inference.cc | 8 +++- python/pyarrow/tests/test_pandas.py | 10 +++--- 2 files changed, 14 insertions(+), 4 deletions(-)
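ARROW-9407 teaches sequence type inference to skip more pandas null sentinels (such as NaN alongside None) when deciding an element type. A simplified stand-in for that inference in Python, treating `None` and `float('nan')` as nulls (pandas additionally has `NaT` and `pd.NA`, which are omitted here):

```python
import math

def infer_type(seq):
    """Infer an element type name from a sequence, skipping null
    sentinels -- a simplified sketch of pyarrow's inference."""
    non_null = [v for v in seq
                if v is not None
                and not (isinstance(v, float) and math.isnan(v))]
    if not non_null:
        return "null"          # all-sentinel sequences infer as null type
    names = {type(v).__name__ for v in non_null}
    return names.pop() if len(names) == 1 else "mixed"

t1 = infer_type([1, None, float("nan"), 2])
t2 = infer_type([None, float("nan")])
```

Without sentinel handling, the NaN in the first sequence would force a float (or mixed) type instead of letting the integers win, which is the class of problem the commit fixes.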