[arrow] branch master updated: ARROW-4200: [C++/Python] Enable conda_env_python.yml to work on Windows, simplify python/development.rst
This is an automated email from the ASF dual-hosted git repository. wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 090a8c0 ARROW-4200: [C++/Python] Enable conda_env_python.yml to work on Windows, simplify python/development.rst 090a8c0 is described below commit 090a8c020611b2f75ec0e36d765cc6d48adbe9a7 Author: Wes McKinney AuthorDate: Tue Jan 8 22:59:00 2019 -0600 ARROW-4200: [C++/Python] Enable conda_env_python.yml to work on Windows, simplify python/development.rst I also removed nomkl from conda_env_python.yml. It's sort of a developer decision whether or not they want to install the MKL -- we shouldn't force them to _not_ have it Author: Wes McKinney Closes #3353 from wesm/ARROW-4200 and squashes the following commits: 4849a326d Accept bkietz suggestions 576e63b27 Also add nomkl to python/Dockerfile 9b39e8300 Get conda env files working on Windows, small cleaning to Python development instructions --- ci/conda_env_python.yml| 2 -- ci/conda_env_unix.yml | 1 + ci/travis_script_python.sh | 1 + docs/source/python/development.rst | 23 +++ python/Dockerfile | 1 + 5 files changed, 10 insertions(+), 18 deletions(-) diff --git a/ci/conda_env_python.yml b/ci/conda_env_python.yml index d3756cb..b51f5c3 100644 --- a/ci/conda_env_python.yml +++ b/ci/conda_env_python.yml @@ -18,10 +18,8 @@ cython cloudpickle hypothesis -nomkl numpy pandas pytest -rsync setuptools setuptools_scm diff --git a/ci/conda_env_unix.yml b/ci/conda_env_unix.yml index eeb90e4..9ecf549 100644 --- a/ci/conda_env_unix.yml +++ b/ci/conda_env_unix.yml @@ -18,3 +18,4 @@ # conda package dependencies specific to Unix-like environments (Linux and macOS) autoconf +rsync diff --git a/ci/travis_script_python.sh b/ci/travis_script_python.sh index 69e115a..e9a1122 100755 --- a/ci/travis_script_python.sh +++ b/ci/travis_script_python.sh @@ -47,6 +47,7 @@ fi conda create -y -q -p $CONDA_ENV_DIR 
\ --file $TRAVIS_BUILD_DIR/ci/conda_env_python.yml \ + nomkl \ cmake \ pip \ numpy=1.13.1 \ diff --git a/docs/source/python/development.rst b/docs/source/python/development.rst index 0bc1c62..d855371 100644 --- a/docs/source/python/development.rst +++ b/docs/source/python/development.rst @@ -86,18 +86,9 @@ On Linux and OSX: --file arrow/ci/conda_env_python.yml \ python=3.6 - source activate pyarrow-dev + conda activate pyarrow-dev -On Windows: - -.. code-block:: shell - -conda create -y -n pyarrow-dev -c conda-forge ^ ---file arrow\ci\conda_env_cpp.yml ^ ---file arrow\ci\conda_env_python.yml ^ -python=3.6 - - activate pyarrow-dev +For Windows, see the `Developing on Windows`_ section below. We need to set some environment variables to let Arrow's build system know about our build toolchain: @@ -310,11 +301,11 @@ First, starting from fresh clones of Apache Arrow: .. code-block:: shell - conda create -y -q -n pyarrow-dev ^ - python=3.6 numpy six setuptools cython pandas pytest ^ - cmake flatbuffers rapidjson boost-cpp thrift-cpp snappy zlib ^ - gflags brotli lz4-c zstd -c conda-forge - activate pyarrow-dev +conda create -y -n pyarrow-dev -c conda-forge ^ +--file arrow\ci\conda_env_cpp.yml ^ +--file arrow\ci\conda_env_python.yml ^ +python=3.7 + conda activate pyarrow-dev Now, we build and install Arrow C++ libraries diff --git a/python/Dockerfile b/python/Dockerfile index a99a420..ecabc94 100644 --- a/python/Dockerfile +++ b/python/Dockerfile @@ -21,6 +21,7 @@ FROM arrow:cpp ARG PYTHON_VERSION=3.6 ADD ci/conda_env_python.yml /arrow/ci/ RUN conda install -c conda-forge \ +nomkl \ --file arrow/ci/conda_env_python.yml \ python=$PYTHON_VERSION && \ conda clean --all
[arrow] branch master updated: ARROW-4199: [GLib] Add garrow_seekable_input_stream_peek()
This is an automated email from the ASF dual-hosted git repository. kou pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new cec7541 ARROW-4199: [GLib] Add garrow_seekable_input_stream_peek() cec7541 is described below commit cec75410b78b70b30bd57908d920c006d9101b72 Author: Yosuke Shiro AuthorDate: Wed Jan 9 13:35:05 2019 +0900 ARROW-4199: [GLib] Add garrow_seekable_input_stream_peek() Author: Yosuke Shiro Author: Kouhei Sutou Closes #3351 from shiro615/glib-support-peek and squashes the following commits: 1f445764 Improve document a5f0fdfd Add GARROW_AVAILABLE_IN_0_12 b27c0a04 Use g_bytes_new_static to avoid copying the data f9d9f237 Add support for Peek to InputStream --- c_glib/arrow-glib/input-stream.cpp | 24 c_glib/arrow-glib/input-stream.h| 3 +++ c_glib/test/test-buffer-input-stream.rb | 8 3 files changed, 35 insertions(+) diff --git a/c_glib/arrow-glib/input-stream.cpp b/c_glib/arrow-glib/input-stream.cpp index cb36e49..cb1fb3b 100644 --- a/c_glib/arrow-glib/input-stream.cpp +++ b/c_glib/arrow-glib/input-stream.cpp @@ -325,6 +325,30 @@ garrow_seekable_input_stream_read_at(GArrowSeekableInputStream *input_stream, } +/** + * garrow_seekable_input_stream_peek: + * @input_stream: A #GArrowSeekableInputStream. + * @n_bytes: The number of bytes to be peeked. + * + * Returns: (transfer full): The data of the buffer, up to the + * indicated number. The data becomes invalid after any operation on + * the stream. If the stream is unbuffered, the data is empty. + * + * It should be freed with g_bytes_unref() when no longer needed. 
+ * + * Since: 0.12.0 + */ +GBytes * +garrow_seekable_input_stream_peek(GArrowSeekableInputStream *input_stream, + gint64 n_bytes) +{ + auto arrow_random_access_file = +garrow_seekable_input_stream_get_raw(input_stream); + auto string_view = arrow_random_access_file->Peek(n_bytes); + return g_bytes_new_static(string_view.data(), string_view.size()); +} + + typedef struct GArrowBufferInputStreamPrivate_ { GArrowBuffer *buffer; } GArrowBufferInputStreamPrivate; diff --git a/c_glib/arrow-glib/input-stream.h b/c_glib/arrow-glib/input-stream.h index 9deebd7..745b912 100644 --- a/c_glib/arrow-glib/input-stream.h +++ b/c_glib/arrow-glib/input-stream.h @@ -66,6 +66,9 @@ GArrowBuffer *garrow_seekable_input_stream_read_at(GArrowSeekableInputStream *in gint64 position, gint64 n_bytes, GError **error); +GARROW_AVAILABLE_IN_0_12 +GBytes *garrow_seekable_input_stream_peek(GArrowSeekableInputStream *input_stream, + gint64 n_bytes); #define GARROW_TYPE_BUFFER_INPUT_STREAM \ diff --git a/c_glib/test/test-buffer-input-stream.rb b/c_glib/test/test-buffer-input-stream.rb index f5a0132..cb6a667 100644 --- a/c_glib/test/test-buffer-input-stream.rb +++ b/c_glib/test/test-buffer-input-stream.rb @@ -39,4 +39,12 @@ class TestBufferInputStream < Test::Unit::TestCase read_buffer = buffer_input_stream.read(3) assert_equal("rld", read_buffer.data.to_s) end + + def test_peek +buffer = Arrow::Buffer.new("Hello World") +buffer_input_stream = Arrow::BufferInputStream.new(buffer) +peeked_data = buffer_input_stream.peek(5) +assert_equal(buffer_input_stream.read(5).data.to_s, + peeked_data.to_s) + end end
[arrow] branch master updated: ARROW-4147: [Java] reduce heap usage for varwidth vectors (#3298)
This is an automated email from the ASF dual-hosted git repository. siddteotia pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new bfe6865 ARROW-4147: [Java] reduce heap usage for varwidth vectors (#3298) bfe6865 is described below commit bfe6865ba8087a46bd7665679e48af3a77987cef Author: Pindikura Ravindra AuthorDate: Wed Jan 9 09:11:01 2019 +0530 ARROW-4147: [Java] reduce heap usage for varwidth vectors (#3298) * ARROW-4147: reduce heap usage for varwidth vectors - some code reorg to avoid duplication - changed the default initial alloc from 4096 to 3970 * ARROW-4147: [Java] Address review comments * ARROW-4147: remove check on width to be <= 16: * ARROW-4147: allow initial valueCount to be 0. * ARROW-4147: Fix incorrect comment on initial alloc --- .../apache/arrow/vector/BaseFixedWidthVector.java | 127 ++--- .../org/apache/arrow/vector/BaseValueVector.java | 99 +++- .../arrow/vector/BaseVariableWidthVector.java | 165 +++--- .../java/org/apache/arrow/vector/BitVector.java| 5 +- .../arrow/vector/TestBufferOwnershipTransfer.java | 9 +- .../java/org/apache/arrow/vector/TestCopyFrom.java | 569 +++-- .../org/apache/arrow/vector/TestValueVector.java | 435 +--- .../org/apache/arrow/vector/TestVectorReAlloc.java | 23 +- .../vector/complex/writer/TestComplexWriter.java | 15 +- 9 files changed, 799 insertions(+), 648 deletions(-) diff --git a/java/vector/src/main/java/org/apache/arrow/vector/BaseFixedWidthVector.java b/java/vector/src/main/java/org/apache/arrow/vector/BaseFixedWidthVector.java index f69a9d1..f3c2837 100644 --- a/java/vector/src/main/java/org/apache/arrow/vector/BaseFixedWidthVector.java +++ b/java/vector/src/main/java/org/apache/arrow/vector/BaseFixedWidthVector.java @@ -22,7 +22,6 @@ import java.util.ArrayList; import java.util.Collections; import java.util.List; -import org.apache.arrow.memory.BaseAllocator; import 
org.apache.arrow.memory.BufferAllocator; import org.apache.arrow.memory.OutOfMemoryException; import org.apache.arrow.vector.ipc.message.ArrowFieldNode; @@ -43,8 +42,7 @@ public abstract class BaseFixedWidthVector extends BaseValueVector implements FixedWidthVector, FieldVector, VectorDefinitionSetter { private final int typeWidth; - protected int valueAllocationSizeInBytes; - protected int validityAllocationSizeInBytes; + protected int initialValueAllocation; protected final Field field; private int allocationMonitor; @@ -61,14 +59,7 @@ public abstract class BaseFixedWidthVector extends BaseValueVector allocationMonitor = 0; validityBuffer = allocator.getEmpty(); valueBuffer = allocator.getEmpty(); -if (typeWidth > 0) { - valueAllocationSizeInBytes = INITIAL_VALUE_ALLOCATION * typeWidth; - validityAllocationSizeInBytes = getValidityBufferSizeFromCount(INITIAL_VALUE_ALLOCATION); -} else { - /* specialized handling for BitVector */ - valueAllocationSizeInBytes = getValidityBufferSizeFromCount(INITIAL_VALUE_ALLOCATION); - validityAllocationSizeInBytes = valueAllocationSizeInBytes; -} +initialValueAllocation = INITIAL_VALUE_ALLOCATION; } @@ -159,12 +150,8 @@ public abstract class BaseFixedWidthVector extends BaseValueVector */ @Override public void setInitialCapacity(int valueCount) { -final long size = (long) valueCount * typeWidth; -if (size > MAX_ALLOCATION_SIZE) { - throw new OversizedAllocationException("Requested amount of memory is more than max allowed"); -} -valueAllocationSizeInBytes = (int) size; -validityAllocationSizeInBytes = getValidityBufferSizeFromCount(valueCount); +computeAndCheckBufferSize(valueCount); +initialValueAllocation = valueCount; } /** @@ -267,18 +254,13 @@ public abstract class BaseFixedWidthVector extends BaseValueVector */ @Override public boolean allocateNewSafe() { -long curAllocationSizeValue = valueAllocationSizeInBytes; -long curAllocationSizeValidity = validityAllocationSizeInBytes; - -if (align(curAllocationSizeValue) + 
curAllocationSizeValidity > MAX_ALLOCATION_SIZE) { - throw new OversizedAllocationException("Requested amount of memory exceeds limit"); -} +computeAndCheckBufferSize(initialValueAllocation); /* we are doing a new allocation -- release the current buffers */ clear(); try { - allocateBytes(curAllocationSizeValue, curAllocationSizeValidity); + allocateBytes(initialValueAllocation); } catch (Exception e) { clear(); return false; @@ -295,22 +277,13 @@ public abstract class BaseFixedWidthVector extends BaseValueVector * @throws org.apache.arrow.memory.OutOfMemoryException on error */ public void allocateNew(int valueCount) { -long
[arrow] branch master updated: ARROW-4175: [GLib] Add support for decimal compare operators
This is an automated email from the ASF dual-hosted git repository. kou pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 420c949 ARROW-4175: [GLib] Add support for decimal compare operators 420c949 is described below commit 420c949fd4e593fb0303954092b3d8a46a7aa864 Author: Yosuke Shiro AuthorDate: Wed Jan 9 09:28:03 2019 +0900 ARROW-4175: [GLib] Add support for decimal compare operators Author: Yosuke Shiro Author: Kouhei Sutou Closes #3346 from shiro615/glib-add-support-for-decimal-compare-operators and squashes the following commits: 28871fd6 Fix documents e81d4146 Unify test case comparisons 0791c4f1 Use rubyish method name 54f46039 Add a test for equal 943c2364 Rename 'more than' to 'greater than' 181e0544 Add support for decimal compare operators --- c_glib/arrow-glib/decimal128.cpp | 98 +++- c_glib/arrow-glib/decimal128.h | 15 ++ c_glib/test/test-decimal128.rb | 97 +++ 3 files changed, 209 insertions(+), 1 deletion(-) diff --git a/c_glib/arrow-glib/decimal128.cpp b/c_glib/arrow-glib/decimal128.cpp index d87a501..a49dba5 100644 --- a/c_glib/arrow-glib/decimal128.cpp +++ b/c_glib/arrow-glib/decimal128.cpp @@ -141,7 +141,8 @@ garrow_decimal128_new_integer(const gint64 data) * @decimal: A #GArrowDecimal128. * @other_decimal: A #GArrowDecimal128 to be compared. * - * Returns: %TRUE if both of them is the same value, %FALSE otherwise. + * Returns: %TRUE if the decimal is equal to the other decimal, %FALSE + * otherwise. * * Since: 0.12.0 */ @@ -155,6 +156,101 @@ garrow_decimal128_equal(GArrowDecimal128 *decimal, } /** + * garrow_decimal128_not_equal: + * @decimal: A #GArrowDecimal128. + * @other_decimal: A #GArrowDecimal128 to be compared. + * + * Returns: %TRUE if the decimal isn't equal to the other decimal, + * %FALSE otherwise. 
+ * + * Since: 0.12.0 + */ +gboolean +garrow_decimal128_not_equal(GArrowDecimal128 *decimal, +GArrowDecimal128 *other_decimal) +{ + const auto arrow_decimal = garrow_decimal128_get_raw(decimal); + const auto arrow_other_decimal = garrow_decimal128_get_raw(other_decimal); + return *arrow_decimal != *arrow_other_decimal; +} + +/** + * garrow_decimal128_less_than: + * @decimal: A #GArrowDecimal128. + * @other_decimal: A #GArrowDecimal128 to be compared. + * + * Returns: %TRUE if the decimal is less than the other decimal, + * %FALSE otherwise. + * + * Since: 0.12.0 + */ +gboolean +garrow_decimal128_less_than(GArrowDecimal128 *decimal, +GArrowDecimal128 *other_decimal) +{ + const auto arrow_decimal = garrow_decimal128_get_raw(decimal); + const auto arrow_other_decimal = garrow_decimal128_get_raw(other_decimal); + return *arrow_decimal < *arrow_other_decimal; +} + +/** + * garrow_decimal128_less_than_or_equal: + * @decimal: A #GArrowDecimal128. + * @other_decimal: A #GArrowDecimal128 to be compared. + * + * Returns: %TRUE if the decimal is less than the other decimal + * or equal to the other decimal, %FALSE otherwise. + * + * Since: 0.12.0 + */ +gboolean +garrow_decimal128_less_than_or_equal(GArrowDecimal128 *decimal, + GArrowDecimal128 *other_decimal) +{ + const auto arrow_decimal = garrow_decimal128_get_raw(decimal); + const auto arrow_other_decimal = garrow_decimal128_get_raw(other_decimal); + return *arrow_decimal <= *arrow_other_decimal; +} + +/** + * garrow_decimal128_greater_than: + * @decimal: A #GArrowDecimal128. + * @other_decimal: A #GArrowDecimal128 to be compared. + * + * Returns: %TRUE if the decimal is greater than the other decimal, + * %FALSE otherwise. 
+ * + * Since: 0.12.0 + */ +gboolean +garrow_decimal128_greater_than(GArrowDecimal128 *decimal, + GArrowDecimal128 *other_decimal) +{ + const auto arrow_decimal = garrow_decimal128_get_raw(decimal); + const auto arrow_other_decimal = garrow_decimal128_get_raw(other_decimal); + return *arrow_decimal > *arrow_other_decimal; +} + +/** + * garrow_decimal128_greater_than_or_equal: + * @decimal: A #GArrowDecimal128. + * @other_decimal: A #GArrowDecimal128 to be compared. + * + * Returns: %TRUE if the decimal is greater than the other decimal + * or equal to the other decimal, %FALSE otherwise. + * + * Since: 0.12.0 + */ +gboolean +garrow_decimal128_greater_than_or_equal(GArrowDecimal128 *decimal, +GArrowDecimal128 *other_decimal) +{ + const auto arrow_decimal = garrow_decimal128_get_raw(decimal); + const auto arrow_other_decimal = garrow_decimal128_get_raw(other_decimal); + return *arrow_decimal >= *arrow_other_decimal; +} + +/** * garrow_decimal128_to_string_scale: * @decimal: A #GArrowDecimal128. * @scale: The scale of the decimal. diff
[arrow] branch master updated: ARROW-4184: [Ruby] Add Arrow::RecordBatch#to_table
This is an automated email from the ASF dual-hosted git repository. shiro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new a3aed3b ARROW-4184: [Ruby] Add Arrow::RecordBatch#to_table a3aed3b is described below commit a3aed3b60bd61c55d7402c4484e480f1998b99f1 Author: Kouhei Sutou AuthorDate: Wed Jan 9 09:17:46 2019 +0900 ARROW-4184: [Ruby] Add Arrow::RecordBatch#to_table Author: Kouhei Sutou Closes #3339 from kou/ruby-record-batch-to-table and squashes the following commits: a6fab35f Require gobject-introspection gem 3.3.1 or later 4a1f3564 Add Arrow::RecordBatch#to_table --- ruby/red-arrow/lib/arrow/record-batch.rb | 9 + ruby/red-arrow/red-arrow.gemspec | 2 +- ruby/red-arrow/test/test-record-batch.rb | 23 ++- 3 files changed, 24 insertions(+), 10 deletions(-) diff --git a/ruby/red-arrow/lib/arrow/record-batch.rb b/ruby/red-arrow/lib/arrow/record-batch.rb index f5f8ea2..6d9c35b 100644 --- a/ruby/red-arrow/lib/arrow/record-batch.rb +++ b/ruby/red-arrow/lib/arrow/record-batch.rb @@ -29,6 +29,15 @@ module Arrow @columns ||= columns_raw end +# Converts the record batch to {Arrow::Table}. 
+# +# @return [Arrow::Table] +# +# @since 0.12.0 +def to_table + Table.new(schema, [self]) +end + def respond_to_missing?(name, include_private) return true if find_column(name) super diff --git a/ruby/red-arrow/red-arrow.gemspec b/ruby/red-arrow/red-arrow.gemspec index 8e79c75..2d417f0 100644 --- a/ruby/red-arrow/red-arrow.gemspec +++ b/ruby/red-arrow/red-arrow.gemspec @@ -45,7 +45,7 @@ Gem::Specification.new do |spec| spec.test_files += Dir.glob("test/**/*") spec.extensions = ["dependency-check/Rakefile"] - spec.add_runtime_dependency("gobject-introspection", ">= 3.1.1") + spec.add_runtime_dependency("gobject-introspection", ">= 3.3.1") spec.add_runtime_dependency("pkg-config") spec.add_runtime_dependency("native-package-installer") diff --git a/ruby/red-arrow/test/test-record-batch.rb b/ruby/red-arrow/test/test-record-batch.rb index 994b16d..4dac085 100644 --- a/ruby/red-arrow/test/test-record-batch.rb +++ b/ruby/red-arrow/test/test-record-batch.rb @@ -16,16 +16,16 @@ # under the License. class RecordBatchTest < Test::Unit::TestCase - sub_test_case(".each") do -setup do - fields = [ -Arrow::Field.new("count", :uint32), - ] - @schema = Arrow::Schema.new(fields) - @counts = Arrow::UInt32Array.new([1, 2, 4, 8]) - @record_batch = Arrow::RecordBatch.new(@schema, @counts.length, [@counts]) -end + setup do +fields = [ + Arrow::Field.new("count", :uint32), +] +@schema = Arrow::Schema.new(fields) +@counts = Arrow::UInt32Array.new([1, 2, 4, 8]) +@record_batch = Arrow::RecordBatch.new(@schema, @counts.length, [@counts]) + end + sub_test_case(".each") do test("default") do records = [] @record_batch.each do |record| @@ -54,4 +54,9 @@ class RecordBatchTest < Test::Unit::TestCase records.collect {|record, i| [record.index, i]}) end end + + test("#to_table") do +assert_equal(Arrow::Table.new(@schema, [@counts]), + @record_batch.to_table) + end end
[arrow] branch master updated: ARROW-4172: [Rust] more consistent naming in array builders
This is an automated email from the ASF dual-hosted git repository. agrove pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new bcca04a ARROW-4172: [Rust] more consistent naming in array builders bcca04a is described below commit bcca04aabd804263c555945463f5cf4a2ab6216f Author: Chao Sun AuthorDate: Tue Jan 8 16:56:31 2019 -0700 ARROW-4172: [Rust] more consistent naming in array builders This is to make the namings in `builder.rs` more consistent: 1. Changes `PrimitiveArrayBuilder` to `PrimitiveBuilder`, similarly for `ListArrayBuilder`, `BinaryArrayBuilder` and `StructArrayBuilder`. The `Array` seems redundant. 2. Currently we use both `push` and `append`, which is a bit confusing. This unifies them by using `append`. Author: Chao Sun Closes #3345 from sunchao/ARROW-4172 and squashes the following commits: 3472d12 ARROW-4172: more consistent naming in array builders --- rust/arrow/examples/builders.rs | 12 +- rust/arrow/src/array.rs | 4 +- rust/arrow/src/array_ops.rs | 22 +-- rust/arrow/src/builder.rs | 368 rust/arrow/src/csv/reader.rs| 10 +- rust/arrow/src/tensor.rs| 12 +- 6 files changed, 214 insertions(+), 214 deletions(-) diff --git a/rust/arrow/examples/builders.rs b/rust/arrow/examples/builders.rs index 92f45ce..f9ba297 100644 --- a/rust/arrow/examples/builders.rs +++ b/rust/arrow/examples/builders.rs @@ -29,14 +29,14 @@ fn main() { // Create a new builder with a capacity of 100 let mut primitive_array_builder = Int32Builder::new(100); -// Push an individual primitive value -primitive_array_builder.push(55).unwrap(); +// Append an individual primitive value +primitive_array_builder.append_value(55).unwrap(); -// Push a null value -primitive_array_builder.push_null().unwrap(); +// Append a null value +primitive_array_builder.append_null().unwrap(); -// Push a slice of primitive values -primitive_array_builder.push_slice(&[39, 89, 12]).unwrap(); +// 
Append a slice of primitive values +primitive_array_builder.append_slice(&[39, 89, 12]).unwrap(); // Build the `PrimitiveArray` let _primitive_array = primitive_array_builder.finish(); diff --git a/rust/arrow/src/array.rs b/rust/arrow/src/array.rs index f8272eb..78910d5 100644 --- a/rust/arrow/src/array.rs +++ b/rust/arrow/src/array.rs @@ -201,8 +201,8 @@ impl PrimitiveArray { } // Returns a new primitive array builder -pub fn builder(capacity: usize) -> PrimitiveArrayBuilder { -PrimitiveArrayBuildernew(capacity) +pub fn builder(capacity: usize) -> PrimitiveBuilder { +PrimitiveBuildernew(capacity) } } diff --git a/rust/arrow/src/array_ops.rs b/rust/arrow/src/array_ops.rs index 6963709..f41740a 100644 --- a/rust/arrow/src/array_ops.rs +++ b/rust/arrow/src/array_ops.rs @@ -22,7 +22,7 @@ use std::ops::{Add, Div, Mul, Sub}; use num::Zero; use crate::array::{Array, BooleanArray, PrimitiveArray}; -use crate::builder::PrimitiveArrayBuilder; +use crate::builder::PrimitiveBuilder; use crate::datatypes; use crate::datatypes::ArrowNumericType; use crate::error::{ArrowError, Result}; @@ -102,13 +102,13 @@ where "Cannot perform math operation on arrays of different length".to_string(), )); } -let mut b = PrimitiveArrayBuildernew(left.len()); +let mut b = PrimitiveBuildernew(left.len()); for i in 0..left.len() { let index = i; if left.is_null(i) || right.is_null(i) { -b.push_null()?; +b.append_null()?; } else { -b.push(op(left.value(index), right.value(index))?)?; +b.append_value(op(left.value(index), right.value(index))?)?; } } Ok(b.finish()) @@ -276,7 +276,7 @@ where } else { Some(right.value(index)) }; -b.push(op(l, r))?; +b.append_value(op(l, r))?; } Ok(b.finish()) } @@ -291,9 +291,9 @@ pub fn and(left: , right: ) -> Result { let mut b = BooleanArray::builder(left.len()); for i in 0..left.len() { if left.is_null(i) || right.is_null(i) { -b.push_null()?; +b.append_null()?; } else { -b.push(left.value(i) && right.value(i))?; +b.append_value(left.value(i) && right.value(i))?; } 
} Ok(b.finish()) @@ -309,9 +309,9 @@ pub fn or(left: , right: ) -> Result { let mut b = BooleanArray::builder(left.len()); for i in 0..left.len() { if left.is_null(i) || right.is_null(i) { -b.push_null()?; +b.append_null()?; } else { -b.push(left.value(i) || right.value(i))?; +b.append_value(left.value(i) ||
[arrow] branch master updated: ARROW-3839: [Rust] Add ability to infer schema in CSV reader
This is an automated email from the ASF dual-hosted git repository. agrove pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new ac45f32 ARROW-3839: [Rust] Add ability to infer schema in CSV reader ac45f32 is described below commit ac45f3210a194049ef35f49847dbc4ff5e70d48f Author: Neville Dipale AuthorDate: Tue Jan 8 16:49:12 2019 -0700 ARROW-3839: [Rust] Add ability to infer schema in CSV reader Resubmission of #3128 Author: Neville Dipale Closes #3349 from nevi-me/rust/infer-csv-schema and squashes the following commits: 0838199 ARROW-3839: Add ability to infer schema in CSV reader --- ci/rust-build-main.bat | 1 + ci/travis_script_rust.sh| 1 + rust/arrow/Cargo.toml | 2 + rust/arrow/examples/read_csv_infer_schema.rs| 66 + rust/arrow/src/csv/mod.rs | 1 + rust/arrow/src/csv/reader.rs| 373 +++- rust/arrow/src/datatypes.rs | 4 +- rust/arrow/src/error.rs | 37 +++ rust/arrow/test/data/uk_cities_with_headers.csv | 38 +++ rust/arrow/test/data/various_types.csv | 6 + 10 files changed, 524 insertions(+), 5 deletions(-) diff --git a/ci/rust-build-main.bat b/ci/rust-build-main.bat index ac5c9e7..b36a97a 100644 --- a/ci/rust-build-main.bat +++ b/ci/rust-build-main.bat @@ -40,5 +40,6 @@ cd arrow cargo run --example builders --target %TARGET% --release || exit /B cargo run --example dynamic_types --target %TARGET% --release || exit /B cargo run --example read_csv --target %TARGET% --release || exit /B +cargo run --example read_csv_infer_schema --target %TARGET% --release || exit /B popd diff --git a/ci/travis_script_rust.sh b/ci/travis_script_rust.sh index 8e3c8c3..c25d64e 100755 --- a/ci/travis_script_rust.sh +++ b/ci/travis_script_rust.sh @@ -39,5 +39,6 @@ cd arrow cargo run --example builders cargo run --example dynamic_types cargo run --example read_csv +cargo run --example read_csv_infer_schema popd diff --git a/rust/arrow/Cargo.toml b/rust/arrow/Cargo.toml index 
77e8d53..38e7e5e 100644 --- a/rust/arrow/Cargo.toml +++ b/rust/arrow/Cargo.toml @@ -43,6 +43,8 @@ serde_json = "1.0.13" rand = "0.5" csv = "1.0.0" num = "0.2" +regex = "1.1" +lazy_static = "1.2" [dev-dependencies] criterion = "0.2" diff --git a/rust/arrow/examples/read_csv_infer_schema.rs b/rust/arrow/examples/read_csv_infer_schema.rs new file mode 100644 index 000..9dd2d2a --- /dev/null +++ b/rust/arrow/examples/read_csv_infer_schema.rs @@ -0,0 +1,66 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. 
+ +extern crate arrow; + +use arrow::array::{BinaryArray, Float64Array}; +use arrow::csv; +use std::fs::File; + +fn main() { +let file = File::open("test/data/uk_cities_with_headers.csv").unwrap(); +let builder = csv::ReaderBuilder::new() +.has_headers(true) +.infer_schema(Some(100)); +let mut csv = builder.build(file).unwrap(); +let batch = csv.next().unwrap().unwrap(); + +println!( +"Loaded {} rows containing {} columns", +batch.num_rows(), +batch.num_columns() +); + +println!("Inferred schema: {:?}", batch.schema()); + +let city = batch +.column(0) +.as_any() +.downcast_ref::() +.unwrap(); +let lat = batch +.column(1) +.as_any() +.downcast_ref::() +.unwrap(); +let lng = batch +.column(2) +.as_any() +.downcast_ref::() +.unwrap(); + +for i in 0..batch.num_rows() { +let city_name: String = String::from_utf8(city.value(i).to_vec()).unwrap(); + +println!( +"City: {}, Latitude: {}, Longitude: {}", +city_name, +lat.value(i), +lng.value(i) +); +} +} diff --git a/rust/arrow/src/csv/mod.rs b/rust/arrow/src/csv/mod.rs index 9f2bd1d..6521b19 100644 --- a/rust/arrow/src/csv/mod.rs +++ b/rust/arrow/src/csv/mod.rs @@ -18,3 +18,4 @@ pub mod reader;
[arrow] branch master updated: ARROW-4186: [C++] BitmapWriter shouldn't clobber data when length == 0
This is an automated email from the ASF dual-hosted git repository. wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 326015c ARROW-4186: [C++] BitmapWriter shouldn't clobber data when length == 0 326015c is described below commit 326015cfc66e1f657cdd6811620137e9e277b43d Author: Antoine Pitrou AuthorDate: Tue Jan 8 10:17:54 2019 -0600 ARROW-4186: [C++] BitmapWriter shouldn't clobber data when length == 0 Author: Antoine Pitrou Closes #3348 from pitrou/ARROW-4186-bitmap-writer-zero-length and squashes the following commits: 2299b0906 ARROW-4186: BitmapWriter shouldn't clobber data when length == 0 --- cpp/src/arrow/util/bit-util-test.cc | 79 ++--- cpp/src/arrow/util/bit-util.h | 4 +- 2 files changed, 50 insertions(+), 33 deletions(-) diff --git a/cpp/src/arrow/util/bit-util-test.cc b/cpp/src/arrow/util/bit-util-test.cc index b12e2ec..174e6d0 100644 --- a/cpp/src/arrow/util/bit-util-test.cc +++ b/cpp/src/arrow/util/bit-util-test.cc @@ -21,7 +21,6 @@ #include #include #include -#include #include #include @@ -167,33 +166,40 @@ TEST(BitmapReader, DoesNotReadOutOfBounds) { } TEST(BitmapWriter, NormalOperation) { - { -uint8_t bitmap[] = {0, 0, 0, 0}; -auto writer = internal::BitmapWriter(bitmap, 0, 12); -WriteVectorToWriter(writer, {0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1}); -// {0b00110110, 0b1010, 0, 0} -ASSERT_BYTES_EQ(bitmap, {0x36, 0x0a, 0, 0}); - } - { -uint8_t bitmap[] = {0xff, 0xff, 0xff, 0xff}; -auto writer = internal::BitmapWriter(bitmap, 0, 12); -WriteVectorToWriter(writer, {0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1}); -// {0b00110110, 0b1010, 0xff, 0xff} -ASSERT_BYTES_EQ(bitmap, {0x36, 0xfa, 0xff, 0xff}); - } - { -uint8_t bitmap[] = {0, 0, 0, 0}; -auto writer = internal::BitmapWriter(bitmap, 3, 12); -WriteVectorToWriter(writer, {0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1}); -// {0b1011, 0b01010001, 0, 0} -ASSERT_BYTES_EQ(bitmap, {0xb0, 0x51, 0, 0}); - } - { 
-uint8_t bitmap[] = {0, 0, 0, 0}; -auto writer = internal::BitmapWriter(bitmap, 20, 12); -WriteVectorToWriter(writer, {0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1}); -// {0, 0, 0b0110, 0b10100011} -ASSERT_BYTES_EQ(bitmap, {0, 0, 0x60, 0xa3}); + for (const auto fill_byte_int : {0x00, 0xff}) { +const uint8_t fill_byte = static_cast(fill_byte_int); +{ + uint8_t bitmap[] = {fill_byte, fill_byte, fill_byte, fill_byte}; + auto writer = internal::BitmapWriter(bitmap, 0, 12); + WriteVectorToWriter(writer, {0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1}); + // {0b00110110, 0b1010, , } + ASSERT_BYTES_EQ(bitmap, {0x36, static_cast(0x0a | (fill_byte & 0xf0)), + fill_byte, fill_byte}); +} +{ + uint8_t bitmap[] = {fill_byte, fill_byte, fill_byte, fill_byte}; + auto writer = internal::BitmapWriter(bitmap, 3, 12); + WriteVectorToWriter(writer, {0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1}); + // {0b10110..., 0b.1010001, , } + ASSERT_BYTES_EQ(bitmap, {static_cast(0xb0 | (fill_byte & 0x07)), + static_cast(0x51 | (fill_byte & 0x80)), fill_byte, + fill_byte}); +} +{ + uint8_t bitmap[] = {fill_byte, fill_byte, fill_byte, fill_byte}; + auto writer = internal::BitmapWriter(bitmap, 20, 12); + WriteVectorToWriter(writer, {0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1}); + // {, , 0b0110, 0b10100011} + ASSERT_BYTES_EQ(bitmap, {fill_byte, fill_byte, + static_cast(0x60 | (fill_byte & 0x0f)), 0xa3}); +} +// 0-length writes +for (int64_t pos = 0; pos < 32; ++pos) { + uint8_t bitmap[] = {fill_byte, fill_byte, fill_byte, fill_byte}; + auto writer = internal::BitmapWriter(bitmap, pos, 0); + WriteVectorToWriter(writer, {}); + ASSERT_BYTES_EQ(bitmap, {fill_byte, fill_byte, fill_byte, fill_byte}); +} } } @@ -267,6 +273,10 @@ TEST(FirstTimeBitmapWriter, NormalOperation) { { uint8_t bitmap[] = {fill_byte, fill_byte, fill_byte, fill_byte}; { +auto writer = internal::FirstTimeBitmapWriter(bitmap, 4, 0); +WriteVectorToWriter(writer, {}); + } + { auto writer = internal::FirstTimeBitmapWriter(bitmap, 4, 6); WriteVectorToWriter(writer, {0, 1, 1, 
0, 1, 1}); } @@ -275,6 +285,10 @@ TEST(FirstTimeBitmapWriter, NormalOperation) { WriteVectorToWriter(writer, {0, 0, 0}); } { +auto writer = internal::FirstTimeBitmapWriter(bitmap, 13, 0); +WriteVectorToWriter(writer,
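The test cases above all check one invariant: a writer positioned anywhere in the bitmap must leave every byte untouched when it writes zero bits, and must preserve the fill bits outside the range it actually writes. A minimal Python sketch of that invariant — the `write_bits` helper is a hypothetical model, not Arrow's API:

```python
def write_bits(bitmap: bytearray, start: int, values: list) -> None:
    """Write 0/1 `values` into `bitmap` beginning at bit `start`
    (LSB-first within each byte), touching only the bits written."""
    for i, v in enumerate(values):
        pos = start + i
        byte, bit = pos // 8, pos % 8
        if v:
            bitmap[byte] |= 1 << bit
        else:
            bitmap[byte] &= ~(1 << bit) & 0xFF

# A zero-length write at any position must not clobber existing data,
# whatever the fill pattern.
for fill in (0x00, 0xFF):
    for pos in range(32):
        bitmap = bytearray([fill] * 4)
        write_bits(bitmap, pos, [])
        assert bitmap == bytearray([fill] * 4)
```

With a non-empty write, only the written range changes: writing the 12-bit pattern from the test at offset 0 over an all-ones bitmap leaves the high nibble of byte 1 and all of bytes 2–3 set, matching the `0x36, 0xfa, 0xff, 0xff` expectation.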
[arrow] branch master updated: ARROW-4191: [C++] Use same CC and AR for jemalloc as for the main sources
This is an automated email from the ASF dual-hosted git repository. wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new ccec638 ARROW-4191: [C++] Use same CC and AR for jemalloc as for the main sources ccec638 is described below commit ccec63847e7709317a18036931ef3e3fbeab1f05 Author: Korn, Uwe AuthorDate: Tue Jan 8 10:14:53 2019 -0600 ARROW-4191: [C++] Use same CC and AR for jemalloc as for the main sources Author: Korn, Uwe Closes #3347 from xhochy/ARROW-4191 and squashes the following commits: 44df02a23 ARROW-4191: Use same CC and AR for jemalloc as for the main sources --- cpp/cmake_modules/ThirdpartyToolchain.cmake | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake index d8b3486..5a8c28f 100644 --- a/cpp/cmake_modules/ThirdpartyToolchain.cmake +++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake @@ -772,7 +772,7 @@ if (ARROW_JEMALLOC) ExternalProject_Add(jemalloc_ep URL ${CMAKE_CURRENT_SOURCE_DIR}/thirdparty/jemalloc/${JEMALLOC_VERSION}.tar.gz PATCH_COMMAND touch doc/jemalloc.3 doc/jemalloc.html -CONFIGURE_COMMAND ./autogen.sh "--prefix=${JEMALLOC_PREFIX}" "--with-jemalloc-prefix=je_arrow_" "--with-private-namespace=je_arrow_private_" "--disable-tls" +CONFIGURE_COMMAND ./autogen.sh "AR=${CMAKE_AR}" "CC=${CMAKE_C_COMPILER}" "--prefix=${JEMALLOC_PREFIX}" "--with-jemalloc-prefix=je_arrow_" "--with-private-namespace=je_arrow_private_" "--disable-tls" ${EP_LOG_OPTIONS} BUILD_IN_SOURCE 1 BUILD_COMMAND ${MAKE} ${MAKE_BUILD_ARGS}
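The change forwards the parent build's archiver and compiler into jemalloc's configure step, so the vendored library is built with the same toolchain as the main sources. Outside of CMake, the new `CONFIGURE_COMMAND` corresponds roughly to the following invocation from the jemalloc source tree (the toolchain paths and prefix here are hypothetical placeholders for the CMake variables):

```shell
# Hypothetical expansion of the new CONFIGURE_COMMAND: AR and CC are
# whatever CMAKE_AR and CMAKE_C_COMPILER resolve to in the main build.
./autogen.sh "AR=/usr/bin/ar" "CC=/usr/bin/cc" \
  "--prefix=$HOME/arrow-jemalloc-prefix" \
  "--with-jemalloc-prefix=je_arrow_" \
  "--with-private-namespace=je_arrow_private_" \
  "--disable-tls"
```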
[arrow] branch master updated: ARROW-4060: [Rust] Add parquet arrow converter.
This is an automated email from the ASF dual-hosted git repository. agrove pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new af07f75 ARROW-4060: [Rust] Add parquet arrow converter. af07f75 is described below commit af07f75c1f692d1ed4cea93d358ff1acda6a1771 Author: Renjie Liu AuthorDate: Tue Jan 8 06:45:13 2019 -0700 ARROW-4060: [Rust] Add parquet arrow converter. This is the first step of adding an arrow reader and writer for parquet-rs. This commit contains a converter which converts parquet schema to arrow schema. Copied from this pr https://github.com/sunchao/parquet-rs/pull/185. Author: Renjie Liu Closes #3279 from liurenjie1024/rust-arrow-schema-converter and squashes the following commits: 1bfa00f Resolve conflict 8806b16 Add parquet arrow converter --- rust/parquet/src/errors.rs | 6 + rust/parquet/src/lib.rs| 1 + rust/parquet/src/{lib.rs => reader/mod.rs} | 28 +- rust/parquet/src/reader/schema.rs | 779 + rust/parquet/src/schema/types.rs | 14 +- 5 files changed, 805 insertions(+), 23 deletions(-) diff --git a/rust/parquet/src/errors.rs b/rust/parquet/src/errors.rs index a5532c1..abfbda9 100644 --- a/rust/parquet/src/errors.rs +++ b/rust/parquet/src/errors.rs @@ -50,6 +50,12 @@ quick_error! { display("EOF: {}", message) description(message) } + /// Arrow error. + /// Returned when reading into arrow or writing from arrow. 
+ ArrowError(message: String) { + display("Arrow: {}", message) + description(message) + } } } diff --git a/rust/parquet/src/lib.rs b/rust/parquet/src/lib.rs index 75c56f5..cad85ec 100644 --- a/rust/parquet/src/lib.rs +++ b/rust/parquet/src/lib.rs @@ -37,5 +37,6 @@ pub mod column; pub mod compression; mod encodings; pub mod file; +pub mod reader; pub mod record; pub mod schema; diff --git a/rust/parquet/src/lib.rs b/rust/parquet/src/reader/mod.rs similarity index 64% copy from rust/parquet/src/lib.rs copy to rust/parquet/src/reader/mod.rs index 75c56f5..fe580c5 100644 --- a/rust/parquet/src/lib.rs +++ b/rust/parquet/src/reader/mod.rs @@ -15,27 +15,11 @@ // specific language governing permissions and limitations // under the License. -#![feature(type_ascription)] -#![feature(rustc_private)] -#![feature(specialization)] -#![feature(try_from)] -#![allow(dead_code)] -#![allow(non_camel_case_types)] +//! [Apache Arrow](http://arrow.apache.org/) is a cross-language development platform for +//! in-memory data. +//! +//! This mod provides API for converting between arrow and parquet. -#[macro_use] -pub mod errors; -pub mod basic; -pub mod data_type; - -// Exported for external use, such as benchmarks -pub use self::encodings::{decoding, encoding}; -pub use self::util::memory; - -#[macro_use] -mod util; -pub mod column; -pub mod compression; -mod encodings; -pub mod file; -pub mod record; pub mod schema; + +pub use self::schema::{parquet_to_arrow_schema, parquet_to_arrow_schema_by_columns}; diff --git a/rust/parquet/src/reader/schema.rs b/rust/parquet/src/reader/schema.rs new file mode 100644 index 000..68fd867 --- /dev/null +++ b/rust/parquet/src/reader/schema.rs @@ -0,0 +1,779 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. 
The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +//! Provides API for converting parquet schema to arrow schema and vice versa. +//! +//! The main interfaces for converting parquet schema to arrow schema are +//! `parquet_to_arrow_schema` and `parquet_to_arrow_schema_by_columns`. +//! +//! The interfaces for converting arrow schema to parquet schema is coming. + +use std::{collections::HashSet, rc::Rc}; + +use crate::basic::{LogicalType, Repetition, Type as PhysicalType}; +use crate::errors::{ParquetError::ArrowError, Result}; +use crate::schema::types::{SchemaDescPtr, Type, TypePtr}; + +use arrow::datatypes::{DataType, Field, Schema}; + +/// Convert parquet schema to arrow schema. +pub fn parquet_to_arrow_schema(parquet_schema: SchemaDescPtr) -> Result<Schema> {
+
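The converter walks the parquet schema and maps each column to an Arrow field. A rough Python sketch of the idea for flat schemas — the type-name table here is illustrative and deliberately partial, not the crate's actual mapping:

```python
# Hypothetical, partial mapping from parquet physical types to Arrow
# data types, sketching what parquet_to_arrow_schema does per column.
PARQUET_TO_ARROW = {
    "BOOLEAN": "bool",
    "INT32": "int32",
    "INT64": "int64",
    "FLOAT": "float32",
    "DOUBLE": "float64",
    "BYTE_ARRAY": "utf8",  # assuming a UTF8 logical annotation
}

def parquet_to_arrow_schema(columns):
    """columns: list of (name, physical_type, required) tuples."""
    fields = []
    for name, physical, required in columns:
        if physical not in PARQUET_TO_ARROW:
            # Mirrors returning an ArrowError for unsupported types.
            raise ValueError(f"unsupported parquet type: {physical}")
        fields.append({"name": name,
                       "type": PARQUET_TO_ARROW[physical],
                       "nullable": not required})
    return {"fields": fields}

schema = parquet_to_arrow_schema([("id", "INT64", True),
                                  ("name", "BYTE_ARRAY", False)])
```

The real converter also handles repetition levels and nested group types, which this flat sketch leaves out.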
[arrow] branch master updated: ARROW-4183: [Ruby] Add Arrow::Struct as an element of Arrow::StructArray
This is an automated email from the ASF dual-hosted git repository. shiro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 8704f8b ARROW-4183: [Ruby] Add Arrow::Struct as an element of Arrow::StructArray 8704f8b is described below commit 8704f8bd98f1edcf1f9ecc51d6fb3b4b5b4ecb88 Author: Kouhei Sutou AuthorDate: Tue Jan 8 22:32:13 2019 +0900 ARROW-4183: [Ruby] Add Arrow::Struct as an element of Arrow::StructArray Returning Arrow::Array by Arrow::StructArray#[] is deprecated. It'll return Arrow::Struct in the next release. It's for consistency. All Arrow::Array#[] implementations should return an element. Author: Kouhei Sutou Closes #3338 from kou/ruby-struct and squashes the following commits: a0561954 Add Arrow::Struct as an element of Arrow::StructArray --- ruby/red-arrow/lib/arrow/struct-array-builder.rb | 9 ++- ruby/red-arrow/lib/arrow/struct-array.rb | 34 ++ ruby/red-arrow/lib/arrow/struct.rb | 68 ruby/red-arrow/test/test-struct-array-builder.rb | 47 +- ruby/red-arrow/test/test-struct-array.rb | 58 - ruby/red-arrow/test/test-struct.rb | 81 6 files changed, 263 insertions(+), 34 deletions(-) diff --git a/ruby/red-arrow/lib/arrow/struct-array-builder.rb b/ruby/red-arrow/lib/arrow/struct-array-builder.rb index 883ce84..52f75aa 100644 --- a/ruby/red-arrow/lib/arrow/struct-array-builder.rb +++ b/ruby/red-arrow/lib/arrow/struct-array-builder.rb @@ -73,13 +73,20 @@ module Arrow value.each_with_index do |sub_value, i| self[i].append_value(sub_value) end +when Arrow::Struct + append_value_raw + value.values.each_with_index do |sub_value, i| +self[i].append_value(sub_value) + end when Hash append_value_raw value.each do |name, sub_value| self[name].append_value(sub_value) end else - message = "struct value must be nil, Array or Hash: #{value.inspect}" + message = +"struct value must be nil, Array, " + +"Arrow::Struct or Hash: #{value.inspect}" raise 
ArgumentError, message end else diff --git a/ruby/red-arrow/lib/arrow/struct-array.rb b/ruby/red-arrow/lib/arrow/struct-array.rb index 4f9834c..e55a507 100644 --- a/ruby/red-arrow/lib/arrow/struct-array.rb +++ b/ruby/red-arrow/lib/arrow/struct-array.rb @@ -15,10 +15,44 @@ # specific language governing permissions and limitations # under the License. +require "arrow/struct" + module Arrow class StructArray def [](i) + warn("Use #{self.class}\#find_field instead. " + + "This will returns Arrow::Struct instead of Arrow::Array " + + "since 0.13.0.") get_field(i) end + +def get_value(i) + Struct.new(self, i) +end + +def find_field(index_or_name) + case index_or_name + when String, Symbol +name = index_or_name +(@name_to_field ||= build_name_to_field)[name.to_s] + else +index = index_or_name +cached_fields[index] + end +end + +private +def cached_fields + @fields ||= fields +end + +def build_name_to_field + name_to_field = {} + field_arrays = cached_fields + value_data_type.fields.each_with_index do |field, i| +name_to_field[field.name] = field_arrays[i] + end + name_to_field +end end end diff --git a/ruby/red-arrow/lib/arrow/struct.rb b/ruby/red-arrow/lib/arrow/struct.rb new file mode 100644 index 000..4ae12b8 --- /dev/null +++ b/ruby/red-arrow/lib/arrow/struct.rb @@ -0,0 +1,68 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. 
See the License for the +# specific language governing permissions and limitations +# under the License. + +module Arrow + class Struct +attr_accessor :index +def initialize(array, index) + @array = array + @index = index +end + +def [](field_name_or_field_index) + field = @array.find_field(field_name_or_field_index) + return nil if field.nil? + field[@index]
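The new Arrow::Struct is a lightweight row view: it holds a reference to the array plus a row index, and resolves fields lazily by name or position. A hypothetical Python model of the same shape, using a plain dict-of-lists in place of a real StructArray:

```python
class StructRow:
    """Hypothetical model of Arrow::Struct: a row view over columnar
    struct data, indexable by field name or field position."""
    def __init__(self, columns, index):
        self._columns = columns          # dict: field name -> column values
        self._names = list(columns)      # cached field-name order
        self._index = index

    def __getitem__(self, key):
        if isinstance(key, int):         # positional lookup
            key = self._names[key]
        column = self._columns.get(key)  # name lookup; None if unknown
        if column is None:
            return None
        return column[self._index]

columns = {"count": [1, 2, 3], "visible": [True, False, True]}
row = StructRow(columns, 1)
```

As in the Ruby code, an unknown field name yields nil (`None` here) rather than raising, and the row itself copies no column data.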
[arrow] branch master updated: ARROW-4104: [Java] fix a race condition in AllocationManager (#3246)
This is an automated email from the ASF dual-hosted git repository. siddteotia pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 55848a3 ARROW-4104: [Java] fix a race condition in AllocationManager (#3246) 55848a3 is described below commit 55848a36edb5ea5e0765068ef5f09d07d09d4898 Author: Pindikura Ravindra AuthorDate: Tue Jan 8 16:13:18 2019 +0530 ARROW-4104: [Java] fix a race condition in AllocationManager (#3246) --- .../src/main/java/org/apache/arrow/memory/AllocationManager.java| 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/java/memory/src/main/java/org/apache/arrow/memory/AllocationManager.java b/java/memory/src/main/java/org/apache/arrow/memory/AllocationManager.java index 687674f..c10d246 100644 --- a/java/memory/src/main/java/org/apache/arrow/memory/AllocationManager.java +++ b/java/memory/src/main/java/org/apache/arrow/memory/AllocationManager.java @@ -230,7 +230,7 @@ public class AllocationManager { // since two balance transfers out from the allocator manager could cause incorrect // accounting, we need to ensure // that this won't happen by synchronizing on the allocator manager instance. - synchronized (this) { + synchronized (AllocationManager.this) { if (owningLedger != this) { return true; } @@ -310,7 +310,7 @@ public class AllocationManager { allocator.assertOpen(); final int outcome; - synchronized (this) { + synchronized (AllocationManager.this) { outcome = bufRefCnt.addAndGet(-decrement); if (outcome == 0) { lDestructionTime = System.nanoTime(); @@ -411,7 +411,7 @@ public class AllocationManager { * @return Amount of accounted(owned) memory associated with this ledger. */ public int getAccountedSize() { - synchronized (this) { + synchronized (AllocationManager.this) { if (owningLedger == this) { return size; } else {
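The fix replaces `synchronized (this)` — which inside the inner ledger class locks the *ledger*, not the manager — with `synchronized (AllocationManager.this)`, so all ledgers that mutate shared manager state serialize on the one manager lock. A hypothetical Python model of why the shared lock matters (names are illustrative, not the Java API):

```python
import threading

class AllocationManagerSketch:
    """Hypothetical model: the manager owns a single lock that every
    ledger uses when touching shared state such as owning_ledger."""
    def __init__(self):
        self._lock = threading.Lock()  # plays the role of AllocationManager.this
        self.owning_ledger = None

    def new_ledger(self):
        ledger = LedgerSketch(self)
        if self.owning_ledger is None:
            self.owning_ledger = ledger
        return ledger

class LedgerSketch:
    def __init__(self, manager):
        self._manager = manager

    def transfer_ownership(self, other):
        # Locking `self` here (the pre-fix behavior) would let two
        # ledgers race on owning_ledger; the shared manager lock
        # serializes all balance transfers.
        with self._manager._lock:
            if self._manager.owning_ledger is not self:
                return False
            self._manager.owning_ledger = other
            return True

manager = AllocationManagerSketch()
first = manager.new_ledger()
second = manager.new_ledger()
```

With the shared lock, a second transfer attempt from a ledger that no longer owns the memory observes the updated `owning_ledger` and bails out instead of double-transferring.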
[arrow] branch master updated: ARROW-4188: [Rust] Move Rust README to top level rust directory
This is an automated email from the ASF dual-hosted git repository. kszucs pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 2057859 ARROW-4188: [Rust] Move Rust README to top level rust directory 2057859 is described below commit 2057859744cb2ada93fc97838e09eb954963dc00 Author: Andy Grove AuthorDate: Tue Jan 8 11:03:17 2019 +0100 ARROW-4188: [Rust] Move Rust README to top level rust directory Author: Andy Grove Closes #3342 from andygrove/ARROW-4188 and squashes the following commits: fedcd7bc split README between top level and arrow level b68f77cb Merge branch 'master' into ARROW-4188 e6dbd87f add badges back f2ee7e05 Move Rust README to top level rust directory --- rust/README.md | 50 ++ rust/arrow/README.md | 22 -- 2 files changed, 50 insertions(+), 22 deletions(-) diff --git a/rust/README.md b/rust/README.md new file mode 100644 index 000..8fe7885 --- /dev/null +++ b/rust/README.md @@ -0,0 +1,50 @@ + + +# Native Rust implementation of Apache Arrow + +## The Rust implementation of Arrow consists of the following crates + +- Arrow [(README)](arrow/README.md) +- Parquet [(README)](parquet/README.md) + +## Run Tests + +Parquet support in Arrow requires data to test against, this data is in a +git submodule. To pull down this data run the following: + +```bash +git submodule update --init +``` + +The data can then be found in `cpp/submodules/parquet_testing/data`. +Create a new environment variable called `PARQUET_TEST_DATA` to point +to this location and then `cargo test` as usual. + +## Code Formatting + +Our CI uses `rustfmt` to check code formatting. Although the project is +built and tested against nightly rust we use the stable version of +`rustfmt`. 
So before submitting a PR be sure to run the following +and check for lint issues: + +```bash +cargo +stable fmt --all -- --check +``` + diff --git a/rust/arrow/README.md b/rust/arrow/README.md index cbfd4dd..9df2dd2 100644 --- a/rust/arrow/README.md +++ b/rust/arrow/README.md @@ -57,28 +57,6 @@ cargo run --example dynamic_types cargo run --example read_csv ``` -## Run Tests - -Parquet support in Arrow requires data to test against, this data is in a -git submodule. To pull down this data run the following: - -```bash -git submodule update --init -``` - -The data can then be found in `cpp/submodules/parquet_testing/data`. -Create a new environment variable called `PARQUET_TEST_DATA` to point -to this location and then `cargo test` as usual. - -Our CI uses `rustfmt` to check code formatting. Although the project is -built and tested against nightly rust we use the stable version of -`rustfmt`. So before submitting a PR be sure to run the following -and check for lint issues: - -```bash -cargo +stable fmt --all -- --check -``` - # Publishing to crates.io An Arrow committer can publish this crate after an official project release has