[arrow] branch master updated: ARROW-4200: [C++/Python] Enable conda_env_python.yml to work on Windows, simplify python/development.rst

2019-01-08 Thread wesm
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 090a8c0  ARROW-4200: [C++/Python] Enable conda_env_python.yml to work 
on Windows, simplify python/development.rst
090a8c0 is described below

commit 090a8c020611b2f75ec0e36d765cc6d48adbe9a7
Author: Wes McKinney 
AuthorDate: Tue Jan 8 22:59:00 2019 -0600

ARROW-4200: [C++/Python] Enable conda_env_python.yml to work on Windows, 
simplify python/development.rst

I also removed nomkl from conda_env_python.yml. Whether or not to install the
MKL is a developer's decision -- we shouldn't force them to _not_ have it

Author: Wes McKinney 

Closes #3353 from wesm/ARROW-4200 and squashes the following commits:

4849a326d  Accept bkietz suggestions
576e63b27  Also add nomkl to python/Dockerfile
9b39e8300  Get conda env files working on Windows, small 
cleaning to Python development instructions
---
 ci/conda_env_python.yml|  2 --
 ci/conda_env_unix.yml  |  1 +
 ci/travis_script_python.sh |  1 +
 docs/source/python/development.rst | 23 +++
 python/Dockerfile  |  1 +
 5 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/ci/conda_env_python.yml b/ci/conda_env_python.yml
index d3756cb..b51f5c3 100644
--- a/ci/conda_env_python.yml
+++ b/ci/conda_env_python.yml
@@ -18,10 +18,8 @@
 cython
 cloudpickle
 hypothesis
-nomkl
 numpy
 pandas
 pytest
-rsync
 setuptools
 setuptools_scm
diff --git a/ci/conda_env_unix.yml b/ci/conda_env_unix.yml
index eeb90e4..9ecf549 100644
--- a/ci/conda_env_unix.yml
+++ b/ci/conda_env_unix.yml
@@ -18,3 +18,4 @@
 # conda package dependencies specific to Unix-like environments (Linux and macOS)
 
 autoconf
+rsync
diff --git a/ci/travis_script_python.sh b/ci/travis_script_python.sh
index 69e115a..e9a1122 100755
--- a/ci/travis_script_python.sh
+++ b/ci/travis_script_python.sh
@@ -47,6 +47,7 @@ fi
 
 conda create -y -q -p $CONDA_ENV_DIR \
   --file $TRAVIS_BUILD_DIR/ci/conda_env_python.yml \
+  nomkl \
   cmake \
   pip \
   numpy=1.13.1 \
diff --git a/docs/source/python/development.rst 
b/docs/source/python/development.rst
index 0bc1c62..d855371 100644
--- a/docs/source/python/development.rst
+++ b/docs/source/python/development.rst
@@ -86,18 +86,9 @@ On Linux and OSX:
 --file arrow/ci/conda_env_python.yml \
 python=3.6
 
-   source activate pyarrow-dev
+   conda activate pyarrow-dev
 
-On Windows:
-
-.. code-block:: shell
-
-conda create -y -n pyarrow-dev -c conda-forge ^
---file arrow\ci\conda_env_cpp.yml ^
---file arrow\ci\conda_env_python.yml ^
-python=3.6
-
-   activate pyarrow-dev
+For Windows, see the `Developing on Windows`_ section below.
 
 We need to set some environment variables to let Arrow's build system know
 about our build toolchain:
@@ -310,11 +301,11 @@ First, starting from fresh clones of Apache Arrow:
 
 .. code-block:: shell
 
-   conda create -y -q -n pyarrow-dev ^
- python=3.6 numpy six setuptools cython pandas pytest ^
- cmake flatbuffers rapidjson boost-cpp thrift-cpp snappy zlib ^
- gflags brotli lz4-c zstd -c conda-forge
-   activate pyarrow-dev
+conda create -y -n pyarrow-dev -c conda-forge ^
+--file arrow\ci\conda_env_cpp.yml ^
+--file arrow\ci\conda_env_python.yml ^
+python=3.7
+   conda activate pyarrow-dev
 
 Now, we build and install Arrow C++ libraries
 
diff --git a/python/Dockerfile b/python/Dockerfile
index a99a420..ecabc94 100644
--- a/python/Dockerfile
+++ b/python/Dockerfile
@@ -21,6 +21,7 @@ FROM arrow:cpp
 ARG PYTHON_VERSION=3.6
 ADD ci/conda_env_python.yml /arrow/ci/
 RUN conda install -c conda-forge \
+nomkl \
 --file arrow/ci/conda_env_python.yml \
 python=$PYTHON_VERSION && \
 conda clean --all



[arrow] branch master updated: ARROW-4199: [GLib] Add garrow_seekable_input_stream_peek()

2019-01-08 Thread kou

kou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new cec7541  ARROW-4199: [GLib] Add garrow_seekable_input_stream_peek()
cec7541 is described below

commit cec75410b78b70b30bd57908d920c006d9101b72
Author: Yosuke Shiro 
AuthorDate: Wed Jan 9 13:35:05 2019 +0900

ARROW-4199: [GLib] Add garrow_seekable_input_stream_peek()

Author: Yosuke Shiro 
Author: Kouhei Sutou 

Closes #3351 from shiro615/glib-support-peek and squashes the following 
commits:

1f445764  Improve document
a5f0fdfd  Add GARROW_AVAILABLE_IN_0_12
b27c0a04  Use g_bytes_new_static to avoid copying the data
f9d9f237   Add support for Peek to InputStream
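The peek contract added here has a familiar analogue outside Arrow. As an illustration only (Python's standard library, not Arrow code), `io.BufferedReader.peek` likewise returns upcoming bytes without advancing the stream position:

```python
import io

# Not Arrow code: io.BufferedReader models the same peek-then-read contract
# that garrow_seekable_input_stream_peek() exposes over a seekable stream.
stream = io.BufferedReader(io.BytesIO(b"Hello World"))

# peek() may return more bytes than requested, so slice to the first 5.
peeked = stream.peek(5)[:5]
# A subsequent read() starts at the same position and returns the same bytes.
read = stream.read(5)

assert peeked == read == b"Hello"
```

This mirrors the Ruby test added below, which asserts that peeked and read bytes agree.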
---
 c_glib/arrow-glib/input-stream.cpp  | 24 
 c_glib/arrow-glib/input-stream.h|  3 +++
 c_glib/test/test-buffer-input-stream.rb |  8 
 3 files changed, 35 insertions(+)

diff --git a/c_glib/arrow-glib/input-stream.cpp b/c_glib/arrow-glib/input-stream.cpp
index cb36e49..cb1fb3b 100644
--- a/c_glib/arrow-glib/input-stream.cpp
+++ b/c_glib/arrow-glib/input-stream.cpp
@@ -325,6 +325,30 @@ garrow_seekable_input_stream_read_at(GArrowSeekableInputStream *input_stream,
 }
 
 
+/**
+ * garrow_seekable_input_stream_peek:
+ * @input_stream: A #GArrowSeekableInputStream.
+ * @n_bytes: The number of bytes to be peeked.
+ *
+ * Returns: (transfer full): The data of the buffer, up to the
+ *   indicated number. The data becomes invalid after any operation on
+ *   the stream. If the stream is unbuffered, the data is empty.
+ *
+ *   It should be freed with g_bytes_unref() when no longer needed.
+ *
+ * Since: 0.12.0
+ */
+GBytes *
+garrow_seekable_input_stream_peek(GArrowSeekableInputStream *input_stream,
+  gint64 n_bytes)
+{
+  auto arrow_random_access_file =
+garrow_seekable_input_stream_get_raw(input_stream);
+  auto string_view = arrow_random_access_file->Peek(n_bytes);
+  return g_bytes_new_static(string_view.data(), string_view.size());
+}
+
+
 typedef struct GArrowBufferInputStreamPrivate_ {
   GArrowBuffer *buffer;
 } GArrowBufferInputStreamPrivate;
diff --git a/c_glib/arrow-glib/input-stream.h b/c_glib/arrow-glib/input-stream.h
index 9deebd7..745b912 100644
--- a/c_glib/arrow-glib/input-stream.h
+++ b/c_glib/arrow-glib/input-stream.h
@@ -66,6 +66,9 @@ GArrowBuffer *garrow_seekable_input_stream_read_at(GArrowSeekableInputStream *in
gint64 position,
gint64 n_bytes,
GError **error);
+GARROW_AVAILABLE_IN_0_12
+GBytes *garrow_seekable_input_stream_peek(GArrowSeekableInputStream *input_stream,
+  gint64 n_bytes);
 
 
 #define GARROW_TYPE_BUFFER_INPUT_STREAM \
diff --git a/c_glib/test/test-buffer-input-stream.rb b/c_glib/test/test-buffer-input-stream.rb
index f5a0132..cb6a667 100644
--- a/c_glib/test/test-buffer-input-stream.rb
+++ b/c_glib/test/test-buffer-input-stream.rb
@@ -39,4 +39,12 @@ class TestBufferInputStream < Test::Unit::TestCase
 read_buffer = buffer_input_stream.read(3)
 assert_equal("rld", read_buffer.data.to_s)
   end
+
+  def test_peek
+buffer = Arrow::Buffer.new("Hello World")
+buffer_input_stream = Arrow::BufferInputStream.new(buffer)
+peeked_data = buffer_input_stream.peek(5)
+assert_equal(buffer_input_stream.read(5).data.to_s,
+ peeked_data.to_s)
+  end
 end



[arrow] branch master updated: ARROW-4147: [Java] reduce heap usage for varwidth vectors (#3298)

2019-01-08 Thread siddteotia

siddteotia pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new bfe6865  ARROW-4147: [Java] reduce heap usage for varwidth vectors 
(#3298)
bfe6865 is described below

commit bfe6865ba8087a46bd7665679e48af3a77987cef
Author: Pindikura Ravindra 
AuthorDate: Wed Jan 9 09:11:01 2019 +0530

ARROW-4147: [Java] reduce heap usage for varwidth vectors (#3298)

* ARROW-4147: reduce heap usage for varwidth vectors

- some code reorg to avoid duplication
- changed the default initial alloc from 4096 to 3970

* ARROW-4147: [Java] Address review comments

* ARROW-4147: remove check on width to be <= 16:

* ARROW-4147: allow initial valueCount to be 0.

* ARROW-4147: Fix incorrect comment on initial alloc
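The buffer-size bookkeeping these commits touch can be sketched in a few lines. This is a hypothetical Python model: the helper names mirror `getValidityBufferSizeFromCount` and `computeAndCheckBufferSize`, but the limit constant is illustrative, not the Java value (the real maximum is configurable):

```python
MAX_ALLOCATION_SIZE = 1 << 31  # illustrative cap; the real limit is configurable

def validity_buffer_size_from_count(value_count: int) -> int:
    # The validity buffer holds one bit per value, rounded up to whole bytes.
    return (value_count + 7) // 8

def compute_and_check_buffer_size(value_count: int, type_width: int):
    # Data buffer plus validity bitmap must stay under the allocation cap.
    data_size = value_count * type_width
    validity_size = validity_buffer_size_from_count(value_count)
    if data_size + validity_size > MAX_ALLOCATION_SIZE:
        raise MemoryError("Requested amount of memory is more than max allowed")
    return data_size, validity_size

assert validity_buffer_size_from_count(4096) == 512
assert compute_and_check_buffer_size(3970, 4) == (15880, 497)
```

Deferring this computation to allocation time (rather than caching two size fields per vector) is what lets the patch keep only a single `initialValueAllocation` int on the heap.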
---
 .../apache/arrow/vector/BaseFixedWidthVector.java  | 127 ++---
 .../org/apache/arrow/vector/BaseValueVector.java   |  99 +++-
 .../arrow/vector/BaseVariableWidthVector.java  | 165 +++---
 .../java/org/apache/arrow/vector/BitVector.java|   5 +-
 .../arrow/vector/TestBufferOwnershipTransfer.java  |   9 +-
 .../java/org/apache/arrow/vector/TestCopyFrom.java | 569 +++--
 .../org/apache/arrow/vector/TestValueVector.java   | 435 +---
 .../org/apache/arrow/vector/TestVectorReAlloc.java |  23 +-
 .../vector/complex/writer/TestComplexWriter.java   |  15 +-
 9 files changed, 799 insertions(+), 648 deletions(-)

diff --git a/java/vector/src/main/java/org/apache/arrow/vector/BaseFixedWidthVector.java b/java/vector/src/main/java/org/apache/arrow/vector/BaseFixedWidthVector.java
index f69a9d1..f3c2837 100644
--- a/java/vector/src/main/java/org/apache/arrow/vector/BaseFixedWidthVector.java
+++ b/java/vector/src/main/java/org/apache/arrow/vector/BaseFixedWidthVector.java
@@ -22,7 +22,6 @@ import java.util.ArrayList;
 import java.util.Collections;
 import java.util.List;
 
-import org.apache.arrow.memory.BaseAllocator;
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.OutOfMemoryException;
 import org.apache.arrow.vector.ipc.message.ArrowFieldNode;
@@ -43,8 +42,7 @@ public abstract class BaseFixedWidthVector extends BaseValueVector
 implements FixedWidthVector, FieldVector, VectorDefinitionSetter {
   private final int typeWidth;
 
-  protected int valueAllocationSizeInBytes;
-  protected int validityAllocationSizeInBytes;
+  protected int initialValueAllocation;
 
   protected final Field field;
   private int allocationMonitor;
@@ -61,14 +59,7 @@ public abstract class BaseFixedWidthVector extends BaseValueVector
 allocationMonitor = 0;
 validityBuffer = allocator.getEmpty();
 valueBuffer = allocator.getEmpty();
-if (typeWidth > 0) {
-  valueAllocationSizeInBytes = INITIAL_VALUE_ALLOCATION * typeWidth;
-validityAllocationSizeInBytes = getValidityBufferSizeFromCount(INITIAL_VALUE_ALLOCATION);
-} else {
-  /* specialized handling for BitVector */
-  valueAllocationSizeInBytes = getValidityBufferSizeFromCount(INITIAL_VALUE_ALLOCATION);
-  validityAllocationSizeInBytes = valueAllocationSizeInBytes;
-}
+initialValueAllocation = INITIAL_VALUE_ALLOCATION;
   }
 
 
@@ -159,12 +150,8 @@ public abstract class BaseFixedWidthVector extends BaseValueVector
*/
   @Override
   public void setInitialCapacity(int valueCount) {
-final long size = (long) valueCount * typeWidth;
-if (size > MAX_ALLOCATION_SIZE) {
-  throw new OversizedAllocationException("Requested amount of memory is more than max allowed");
-}
-valueAllocationSizeInBytes = (int) size;
-validityAllocationSizeInBytes = getValidityBufferSizeFromCount(valueCount);
+computeAndCheckBufferSize(valueCount);
+initialValueAllocation = valueCount;
   }
 
   /**
@@ -267,18 +254,13 @@ public abstract class BaseFixedWidthVector extends BaseValueVector
*/
   @Override
   public boolean allocateNewSafe() {
-long curAllocationSizeValue = valueAllocationSizeInBytes;
-long curAllocationSizeValidity = validityAllocationSizeInBytes;
-
-if (align(curAllocationSizeValue) + curAllocationSizeValidity > MAX_ALLOCATION_SIZE) {
-  throw new OversizedAllocationException("Requested amount of memory exceeds limit");
-}
+computeAndCheckBufferSize(initialValueAllocation);
 
 /* we are doing a new allocation -- release the current buffers */
 clear();
 
 try {
-  allocateBytes(curAllocationSizeValue, curAllocationSizeValidity);
+  allocateBytes(initialValueAllocation);
 } catch (Exception e) {
   clear();
   return false;
@@ -295,22 +277,13 @@ public abstract class BaseFixedWidthVector extends BaseValueVector
* @throws org.apache.arrow.memory.OutOfMemoryException on error
*/
   public void allocateNew(int valueCount) {
-long 

[arrow] branch master updated: ARROW-4175: [GLib] Add support for decimal compare operators

2019-01-08 Thread kou

kou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 420c949  ARROW-4175: [GLib] Add support for decimal compare operators
420c949 is described below

commit 420c949fd4e593fb0303954092b3d8a46a7aa864
Author: Yosuke Shiro 
AuthorDate: Wed Jan 9 09:28:03 2019 +0900

ARROW-4175: [GLib] Add support for decimal compare operators

Author: Yosuke Shiro 
Author: Kouhei Sutou 

Closes #3346 from shiro615/glib-add-support-for-decimal-compare-operators 
and squashes the following commits:

28871fd6  Fix documents
e81d4146  Unify test case comparisons
0791c4f1  Use rubyish method name
54f46039  Add a test for equal
943c2364  Rename 'more than' to 'greater than'
181e0544   Add support for decimal compare operators
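The new entry points wrap the C++ `arrow::Decimal128` comparison operators one-to-one. As a non-Arrow illustration of the same six relations, using Python's `decimal.Decimal` rather than the GLib bindings:

```python
from decimal import Decimal

a = Decimal("10.5")
b = Decimal("12.3")

# Each assertion names the GLib function exposing the same relation.
assert a == Decimal("10.5")  # garrow_decimal128_equal
assert a != b                # garrow_decimal128_not_equal
assert a < b                 # garrow_decimal128_less_than
assert a <= b                # garrow_decimal128_less_than_or_equal
assert b > a                 # garrow_decimal128_greater_than
assert b >= a                # garrow_decimal128_greater_than_or_equal
```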
---
 c_glib/arrow-glib/decimal128.cpp | 98 +++-
 c_glib/arrow-glib/decimal128.h   | 15 ++
 c_glib/test/test-decimal128.rb   | 97 +++
 3 files changed, 209 insertions(+), 1 deletion(-)

diff --git a/c_glib/arrow-glib/decimal128.cpp b/c_glib/arrow-glib/decimal128.cpp
index d87a501..a49dba5 100644
--- a/c_glib/arrow-glib/decimal128.cpp
+++ b/c_glib/arrow-glib/decimal128.cpp
@@ -141,7 +141,8 @@ garrow_decimal128_new_integer(const gint64 data)
  * @decimal: A #GArrowDecimal128.
  * @other_decimal: A #GArrowDecimal128 to be compared.
  *
- * Returns: %TRUE if both of them is the same value, %FALSE otherwise.
+ * Returns: %TRUE if the decimal is equal to the other decimal, %FALSE
+ *   otherwise.
  *
  * Since: 0.12.0
  */
@@ -155,6 +156,101 @@ garrow_decimal128_equal(GArrowDecimal128 *decimal,
 }
 
 /**
+ * garrow_decimal128_not_equal:
+ * @decimal: A #GArrowDecimal128.
+ * @other_decimal: A #GArrowDecimal128 to be compared.
+ *
+ * Returns: %TRUE if the decimal isn't equal to the other decimal,
+ *   %FALSE otherwise.
+ *
+ * Since: 0.12.0
+ */
+gboolean
+garrow_decimal128_not_equal(GArrowDecimal128 *decimal,
+GArrowDecimal128 *other_decimal)
+{
+  const auto arrow_decimal = garrow_decimal128_get_raw(decimal);
+  const auto arrow_other_decimal = garrow_decimal128_get_raw(other_decimal);
+  return *arrow_decimal != *arrow_other_decimal;
+}
+
+/**
+ * garrow_decimal128_less_than:
+ * @decimal: A #GArrowDecimal128.
+ * @other_decimal: A #GArrowDecimal128 to be compared.
+ *
+ * Returns: %TRUE if the decimal is less than the other decimal,
+ *   %FALSE otherwise.
+ *
+ * Since: 0.12.0
+ */
+gboolean
+garrow_decimal128_less_than(GArrowDecimal128 *decimal,
+GArrowDecimal128 *other_decimal)
+{
+  const auto arrow_decimal = garrow_decimal128_get_raw(decimal);
+  const auto arrow_other_decimal = garrow_decimal128_get_raw(other_decimal);
+  return *arrow_decimal < *arrow_other_decimal;
+}
+
+/**
+ * garrow_decimal128_less_than_or_equal:
+ * @decimal: A #GArrowDecimal128.
+ * @other_decimal: A #GArrowDecimal128 to be compared.
+ *
+ * Returns: %TRUE if the decimal is less than the other decimal
+ *   or equal to the other decimal, %FALSE otherwise.
+ *
+ * Since: 0.12.0
+ */
+gboolean
+garrow_decimal128_less_than_or_equal(GArrowDecimal128 *decimal,
+ GArrowDecimal128 *other_decimal)
+{
+  const auto arrow_decimal = garrow_decimal128_get_raw(decimal);
+  const auto arrow_other_decimal = garrow_decimal128_get_raw(other_decimal);
+  return *arrow_decimal <= *arrow_other_decimal;
+}
+
+/**
+ * garrow_decimal128_greater_than:
+ * @decimal: A #GArrowDecimal128.
+ * @other_decimal: A #GArrowDecimal128 to be compared.
+ *
+ * Returns: %TRUE if the decimal is greater than the other decimal,
+ *   %FALSE otherwise.
+ *
+ * Since: 0.12.0
+ */
+gboolean
+garrow_decimal128_greater_than(GArrowDecimal128 *decimal,
+   GArrowDecimal128 *other_decimal)
+{
+  const auto arrow_decimal = garrow_decimal128_get_raw(decimal);
+  const auto arrow_other_decimal = garrow_decimal128_get_raw(other_decimal);
+  return *arrow_decimal > *arrow_other_decimal;
+}
+
+/**
+ * garrow_decimal128_greater_than_or_equal:
+ * @decimal: A #GArrowDecimal128.
+ * @other_decimal: A #GArrowDecimal128 to be compared.
+ *
+ * Returns: %TRUE if the decimal is greater than the other decimal
+ *   or equal to the other decimal, %FALSE otherwise.
+ *
+ * Since: 0.12.0
+ */
+gboolean
+garrow_decimal128_greater_than_or_equal(GArrowDecimal128 *decimal,
+GArrowDecimal128 *other_decimal)
+{
+  const auto arrow_decimal = garrow_decimal128_get_raw(decimal);
+  const auto arrow_other_decimal = garrow_decimal128_get_raw(other_decimal);
+  return *arrow_decimal >= *arrow_other_decimal;
+}
+
+/**
  * garrow_decimal128_to_string_scale:
  * @decimal: A #GArrowDecimal128.
  * @scale: The scale of the decimal.
diff 

[arrow] branch master updated: ARROW-4184: [Ruby] Add Arrow::RecordBatch#to_table

2019-01-08 Thread shiro

shiro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new a3aed3b  ARROW-4184: [Ruby] Add Arrow::RecordBatch#to_table
a3aed3b is described below

commit a3aed3b60bd61c55d7402c4484e480f1998b99f1
Author: Kouhei Sutou 
AuthorDate: Wed Jan 9 09:17:46 2019 +0900

ARROW-4184: [Ruby] Add Arrow::RecordBatch#to_table

Author: Kouhei Sutou 

Closes #3339 from kou/ruby-record-batch-to-table and squashes the following 
commits:

a6fab35f  Require gobject-introspection gem 3.3.1 or later
4a1f3564   Add Arrow::RecordBatch#to_table
---
 ruby/red-arrow/lib/arrow/record-batch.rb |  9 +
 ruby/red-arrow/red-arrow.gemspec |  2 +-
 ruby/red-arrow/test/test-record-batch.rb | 23 ++-
 3 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/ruby/red-arrow/lib/arrow/record-batch.rb b/ruby/red-arrow/lib/arrow/record-batch.rb
index f5f8ea2..6d9c35b 100644
--- a/ruby/red-arrow/lib/arrow/record-batch.rb
+++ b/ruby/red-arrow/lib/arrow/record-batch.rb
@@ -29,6 +29,15 @@ module Arrow
   @columns ||= columns_raw
 end
 
+# Converts the record batch to {Arrow::Table}.
+#
+# @return [Arrow::Table]
+#
+# @since 0.12.0
+def to_table
+  Table.new(schema, [self])
+end
+
 def respond_to_missing?(name, include_private)
   return true if find_column(name)
   super
diff --git a/ruby/red-arrow/red-arrow.gemspec b/ruby/red-arrow/red-arrow.gemspec
index 8e79c75..2d417f0 100644
--- a/ruby/red-arrow/red-arrow.gemspec
+++ b/ruby/red-arrow/red-arrow.gemspec
@@ -45,7 +45,7 @@ Gem::Specification.new do |spec|
   spec.test_files += Dir.glob("test/**/*")
   spec.extensions = ["dependency-check/Rakefile"]
 
-  spec.add_runtime_dependency("gobject-introspection", ">= 3.1.1")
+  spec.add_runtime_dependency("gobject-introspection", ">= 3.3.1")
   spec.add_runtime_dependency("pkg-config")
   spec.add_runtime_dependency("native-package-installer")
 
diff --git a/ruby/red-arrow/test/test-record-batch.rb b/ruby/red-arrow/test/test-record-batch.rb
index 994b16d..4dac085 100644
--- a/ruby/red-arrow/test/test-record-batch.rb
+++ b/ruby/red-arrow/test/test-record-batch.rb
@@ -16,16 +16,16 @@
 # under the License.
 
 class RecordBatchTest < Test::Unit::TestCase
-  sub_test_case(".each") do
-setup do
-  fields = [
-Arrow::Field.new("count", :uint32),
-  ]
-  @schema = Arrow::Schema.new(fields)
-  @counts = Arrow::UInt32Array.new([1, 2, 4, 8])
-  @record_batch = Arrow::RecordBatch.new(@schema, @counts.length, [@counts])
-end
+  setup do
+fields = [
+  Arrow::Field.new("count", :uint32),
+]
+@schema = Arrow::Schema.new(fields)
+@counts = Arrow::UInt32Array.new([1, 2, 4, 8])
+@record_batch = Arrow::RecordBatch.new(@schema, @counts.length, [@counts])
+  end
 
+  sub_test_case(".each") do
 test("default") do
   records = []
   @record_batch.each do |record|
@@ -54,4 +54,9 @@ class RecordBatchTest < Test::Unit::TestCase
records.collect {|record, i| [record.index, i]})
 end
   end
+
+  test("#to_table") do
+assert_equal(Arrow::Table.new(@schema, [@counts]),
+ @record_batch.to_table)
+  end
 end



[arrow] branch master updated: ARROW-4172: [Rust] more consistent naming in array builders

2019-01-08 Thread agrove

agrove pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new bcca04a  ARROW-4172: [Rust] more consistent naming in array builders
bcca04a is described below

commit bcca04aabd804263c555945463f5cf4a2ab6216f
Author: Chao Sun 
AuthorDate: Tue Jan 8 16:56:31 2019 -0700

ARROW-4172: [Rust] more consistent naming in array builders

This makes the naming in `builder.rs` more consistent:

1. Changes `PrimitiveArrayBuilder` to `PrimitiveBuilder`, similarly for
   `ListArrayBuilder`, `BinaryArrayBuilder` and `StructArrayBuilder`.
The `Array` seems redundant.
2. Currently we use both `push` and `append`, which is a bit confusing.
   This unifies them by using `append`.
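The resulting surface can be sketched as a toy Python model (hypothetical, just to show the unified `append_*`/`finish` shape, not the Rust implementation, which is typed per primitive):

```python
class PrimitiveBuilder:
    """Toy model of the renamed builder: append_value / append_null /
    append_slice / finish, mirroring the unified `append` naming."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.values = []

    def append_value(self, v):
        self.values.append(v)

    def append_null(self):
        self.values.append(None)  # None stands in for a null slot

    def append_slice(self, vs):
        self.values.extend(vs)

    def finish(self):
        # Hand back the accumulated values and reset the builder.
        out, self.values = self.values, []
        return out

b = PrimitiveBuilder(100)
b.append_value(55)
b.append_null()
b.append_slice([39, 89, 12])
assert b.finish() == [55, None, 39, 89, 12]
```

This is the same call sequence the updated `builders.rs` example below performs against `Int32Builder`.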

Author: Chao Sun 

Closes #3345 from sunchao/ARROW-4172 and squashes the following commits:

3472d12  ARROW-4172:  more consistent naming in array builders
---
 rust/arrow/examples/builders.rs |  12 +-
 rust/arrow/src/array.rs |   4 +-
 rust/arrow/src/array_ops.rs |  22 +--
 rust/arrow/src/builder.rs   | 368 
 rust/arrow/src/csv/reader.rs|  10 +-
 rust/arrow/src/tensor.rs|  12 +-
 6 files changed, 214 insertions(+), 214 deletions(-)

diff --git a/rust/arrow/examples/builders.rs b/rust/arrow/examples/builders.rs
index 92f45ce..f9ba297 100644
--- a/rust/arrow/examples/builders.rs
+++ b/rust/arrow/examples/builders.rs
@@ -29,14 +29,14 @@ fn main() {
 // Create a new builder with a capacity of 100
 let mut primitive_array_builder = Int32Builder::new(100);
 
-// Push an individual primitive value
-primitive_array_builder.push(55).unwrap();
+// Append an individual primitive value
+primitive_array_builder.append_value(55).unwrap();
 
-// Push a null value
-primitive_array_builder.push_null().unwrap();
+// Append a null value
+primitive_array_builder.append_null().unwrap();
 
-// Push a slice of primitive values
-primitive_array_builder.push_slice(&[39, 89, 12]).unwrap();
+// Append a slice of primitive values
+primitive_array_builder.append_slice(&[39, 89, 12]).unwrap();
 
 // Build the `PrimitiveArray`
 let _primitive_array = primitive_array_builder.finish();
diff --git a/rust/arrow/src/array.rs b/rust/arrow/src/array.rs
index f8272eb..78910d5 100644
--- a/rust/arrow/src/array.rs
+++ b/rust/arrow/src/array.rs
@@ -201,8 +201,8 @@ impl PrimitiveArray {
 }
 
 // Returns a new primitive array builder
-pub fn builder(capacity: usize) -> PrimitiveArrayBuilder<T> {
-PrimitiveArrayBuilder::<T>::new(capacity)
+pub fn builder(capacity: usize) -> PrimitiveBuilder<T> {
+PrimitiveBuilder::<T>::new(capacity)
 }
 }
 
diff --git a/rust/arrow/src/array_ops.rs b/rust/arrow/src/array_ops.rs
index 6963709..f41740a 100644
--- a/rust/arrow/src/array_ops.rs
+++ b/rust/arrow/src/array_ops.rs
@@ -22,7 +22,7 @@ use std::ops::{Add, Div, Mul, Sub};
 use num::Zero;
 
 use crate::array::{Array, BooleanArray, PrimitiveArray};
-use crate::builder::PrimitiveArrayBuilder;
+use crate::builder::PrimitiveBuilder;
 use crate::datatypes;
 use crate::datatypes::ArrowNumericType;
 use crate::error::{ArrowError, Result};
@@ -102,13 +102,13 @@ where
 "Cannot perform math operation on arrays of different length".to_string(),
 ));
 }
-let mut b = PrimitiveArrayBuilder::<T>::new(left.len());
+let mut b = PrimitiveBuilder::<T>::new(left.len());
 for i in 0..left.len() {
 let index = i;
 if left.is_null(i) || right.is_null(i) {
-b.push_null()?;
+b.append_null()?;
 } else {
-b.push(op(left.value(index), right.value(index))?)?;
+b.append_value(op(left.value(index), right.value(index))?)?;
 }
 }
 Ok(b.finish())
@@ -276,7 +276,7 @@ where
 } else {
 Some(right.value(index))
 };
-b.push(op(l, r))?;
+b.append_value(op(l, r))?;
 }
 Ok(b.finish())
 }
@@ -291,9 +291,9 @@ pub fn and(left: &BooleanArray, right: &BooleanArray) -> Result<BooleanArray> {
 let mut b = BooleanArray::builder(left.len());
 for i in 0..left.len() {
 if left.is_null(i) || right.is_null(i) {
-b.push_null()?;
+b.append_null()?;
 } else {
-b.push(left.value(i) && right.value(i))?;
+b.append_value(left.value(i) && right.value(i))?;
 }
 }
 Ok(b.finish())
@@ -309,9 +309,9 @@ pub fn or(left: &BooleanArray, right: &BooleanArray) -> Result<BooleanArray> {
 let mut b = BooleanArray::builder(left.len());
 for i in 0..left.len() {
 if left.is_null(i) || right.is_null(i) {
-b.push_null()?;
+b.append_null()?;
 } else {
-b.push(left.value(i) || right.value(i))?;
+b.append_value(left.value(i) || 

[arrow] branch master updated: ARROW-3839: [Rust] Add ability to infer schema in CSV reader

2019-01-08 Thread agrove

agrove pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new ac45f32  ARROW-3839: [Rust] Add ability to infer schema in CSV reader
ac45f32 is described below

commit ac45f3210a194049ef35f49847dbc4ff5e70d48f
Author: Neville Dipale 
AuthorDate: Tue Jan 8 16:49:12 2019 -0700

ARROW-3839: [Rust] Add ability to infer schema in CSV reader

Resubmission of #3128

Author: Neville Dipale 

Closes #3349 from nevi-me/rust/infer-csv-schema and squashes the following 
commits:

0838199  ARROW-3839:  Add ability to infer schema in CSV 
reader
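Schema inference of this kind samples up to a bounded number of records and widens each column's type as the samples demand. A rough Python sketch of the idea (a hypothetical helper with a reduced type lattice of int64 → float64 → utf8, not the Rust reader):

```python
import csv
import io

def _is_int(s):
    try:
        int(s)
        return True
    except ValueError:
        return False

def _is_float(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

def infer_schema(csv_text, max_records=100):
    """Infer a {column name: type name} mapping from the first max_records rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))[:max_records]
    rank = {"int64": 0, "float64": 1, "utf8": 2}
    schema = {}
    for row in rows:
        for name, value in row.items():
            t = ("int64" if _is_int(value)
                 else "float64" if _is_float(value)
                 else "utf8")
            # Only ever widen a column's type, never narrow it.
            if rank[t] > rank[schema.get(name, "int64")]:
                schema[name] = t
            else:
                schema.setdefault(name, t)
    return schema

schema = infer_schema("city,lat,lng\nLondon,51.5,-0.1\nLeeds,53.8,-1.5\n")
assert schema == {"city": "utf8", "lat": "float64", "lng": "float64"}
```

Bounding the sample (`infer_schema(Some(100))` in the Rust API) keeps inference cheap on large files at the cost of possibly mistyping a column whose first divergent value appears later.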
---
 ci/rust-build-main.bat  |   1 +
 ci/travis_script_rust.sh|   1 +
 rust/arrow/Cargo.toml   |   2 +
 rust/arrow/examples/read_csv_infer_schema.rs|  66 +
 rust/arrow/src/csv/mod.rs   |   1 +
 rust/arrow/src/csv/reader.rs| 373 +++-
 rust/arrow/src/datatypes.rs |   4 +-
 rust/arrow/src/error.rs |  37 +++
 rust/arrow/test/data/uk_cities_with_headers.csv |  38 +++
 rust/arrow/test/data/various_types.csv  |   6 +
 10 files changed, 524 insertions(+), 5 deletions(-)

diff --git a/ci/rust-build-main.bat b/ci/rust-build-main.bat
index ac5c9e7..b36a97a 100644
--- a/ci/rust-build-main.bat
+++ b/ci/rust-build-main.bat
@@ -40,5 +40,6 @@ cd arrow
 cargo run --example builders --target %TARGET% --release || exit /B
 cargo run --example dynamic_types --target %TARGET% --release || exit /B
 cargo run --example read_csv --target %TARGET% --release || exit /B
+cargo run --example read_csv_infer_schema --target %TARGET% --release || exit /B
 
 popd
diff --git a/ci/travis_script_rust.sh b/ci/travis_script_rust.sh
index 8e3c8c3..c25d64e 100755
--- a/ci/travis_script_rust.sh
+++ b/ci/travis_script_rust.sh
@@ -39,5 +39,6 @@ cd arrow
 cargo run --example builders
 cargo run --example dynamic_types
 cargo run --example read_csv
+cargo run --example read_csv_infer_schema
 
 popd
diff --git a/rust/arrow/Cargo.toml b/rust/arrow/Cargo.toml
index 77e8d53..38e7e5e 100644
--- a/rust/arrow/Cargo.toml
+++ b/rust/arrow/Cargo.toml
@@ -43,6 +43,8 @@ serde_json = "1.0.13"
 rand = "0.5"
 csv = "1.0.0"
 num = "0.2"
+regex = "1.1"
+lazy_static = "1.2"
 
 [dev-dependencies]
 criterion = "0.2"
diff --git a/rust/arrow/examples/read_csv_infer_schema.rs b/rust/arrow/examples/read_csv_infer_schema.rs
new file mode 100644
index 000..9dd2d2a
--- /dev/null
+++ b/rust/arrow/examples/read_csv_infer_schema.rs
@@ -0,0 +1,66 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+extern crate arrow;
+
+use arrow::array::{BinaryArray, Float64Array};
+use arrow::csv;
+use std::fs::File;
+
+fn main() {
+let file = File::open("test/data/uk_cities_with_headers.csv").unwrap();
+let builder = csv::ReaderBuilder::new()
+.has_headers(true)
+.infer_schema(Some(100));
+let mut csv = builder.build(file).unwrap();
+let batch = csv.next().unwrap().unwrap();
+
+println!(
+"Loaded {} rows containing {} columns",
+batch.num_rows(),
+batch.num_columns()
+);
+
+println!("Inferred schema: {:?}", batch.schema());
+
+let city = batch
+.column(0)
+.as_any()
+.downcast_ref::<BinaryArray>()
+.unwrap();
+let lat = batch
+.column(1)
+.as_any()
+.downcast_ref::<Float64Array>()
+.unwrap();
+let lng = batch
+.column(2)
+.as_any()
+.downcast_ref::<Float64Array>()
+.unwrap();
+
+for i in 0..batch.num_rows() {
+let city_name: String = String::from_utf8(city.value(i).to_vec()).unwrap();
+
+println!(
+"City: {}, Latitude: {}, Longitude: {}",
+city_name,
+lat.value(i),
+lng.value(i)
+);
+}
+}
diff --git a/rust/arrow/src/csv/mod.rs b/rust/arrow/src/csv/mod.rs
index 9f2bd1d..6521b19 100644
--- a/rust/arrow/src/csv/mod.rs
+++ b/rust/arrow/src/csv/mod.rs
@@ -18,3 +18,4 @@
 pub mod reader;
 
 

[arrow] branch master updated: ARROW-4186: [C++] BitmapWriter shouldn't clobber data when length == 0

2019-01-08 Thread wesm

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 326015c  ARROW-4186: [C++] BitmapWriter shouldn't clobber data when 
length == 0
326015c is described below

commit 326015cfc66e1f657cdd6811620137e9e277b43d
Author: Antoine Pitrou 
AuthorDate: Tue Jan 8 10:17:54 2019 -0600

ARROW-4186: [C++] BitmapWriter shouldn't clobber data when length == 0

Author: Antoine Pitrou 

Closes #3348 from pitrou/ARROW-4186-bitmap-writer-zero-length and squashes 
the following commits:

2299b0906  ARROW-4186:  BitmapWriter shouldn't clobber data 
when length == 0
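The invariant being fixed is that a writer must only modify the bits it actually sets, so a zero-length write (including its final flush) leaves every byte untouched. A minimal Python model of that invariant (the real `BitmapWriter` buffers a byte and flushes it in `Finish()`; this sketch skips the buffering and writes bit by bit):

```python
def write_bits(bitmap: bytearray, start: int, values) -> None:
    """Set/clear individual bits (LSB-first within each byte) without
    clobbering neighbouring bits. A zero-length write is a no-op."""
    for i, v in enumerate(values):
        byte, bit = divmod(start + i, 8)
        if v:
            bitmap[byte] |= 1 << bit
        else:
            bitmap[byte] &= ~(1 << bit)

# Zero-length writes at any offset must not touch the fill bytes.
for fill in (0x00, 0xff):
    for pos in range(32):
        bitmap = bytearray([fill] * 4)
        write_bits(bitmap, pos, [])
        assert bitmap == bytearray([fill] * 4)

# A real write only alters the targeted bits; untouched bits keep the fill.
bitmap = bytearray([0xff] * 4)
write_bits(bitmap, 0, [0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1])
assert bitmap == bytearray([0x36, 0xfa, 0xff, 0xff])
```

The patched test below exercises exactly this: both 0x00 and 0xff fill bytes, writes at various offsets, and zero-length writes at every position.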
---
 cpp/src/arrow/util/bit-util-test.cc | 79 ++---
 cpp/src/arrow/util/bit-util.h   |  4 +-
 2 files changed, 50 insertions(+), 33 deletions(-)

diff --git a/cpp/src/arrow/util/bit-util-test.cc b/cpp/src/arrow/util/bit-util-test.cc
index b12e2ec..174e6d0 100644
--- a/cpp/src/arrow/util/bit-util-test.cc
+++ b/cpp/src/arrow/util/bit-util-test.cc
@@ -21,7 +21,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 #include 
@@ -167,33 +166,40 @@ TEST(BitmapReader, DoesNotReadOutOfBounds) {
 }
 
 TEST(BitmapWriter, NormalOperation) {
-  {
-uint8_t bitmap[] = {0, 0, 0, 0};
-auto writer = internal::BitmapWriter(bitmap, 0, 12);
-WriteVectorToWriter(writer, {0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1});
-//  {0b00110110, 0b00001010, 0, 0}
-ASSERT_BYTES_EQ(bitmap, {0x36, 0x0a, 0, 0});
-  }
-  {
-uint8_t bitmap[] = {0xff, 0xff, 0xff, 0xff};
-auto writer = internal::BitmapWriter(bitmap, 0, 12);
-WriteVectorToWriter(writer, {0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1});
-//  {0b00110110, 0b11111010, 0xff, 0xff}
-ASSERT_BYTES_EQ(bitmap, {0x36, 0xfa, 0xff, 0xff});
-  }
-  {
-uint8_t bitmap[] = {0, 0, 0, 0};
-auto writer = internal::BitmapWriter(bitmap, 3, 12);
-WriteVectorToWriter(writer, {0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1});
-//  {0b10110000, 0b01010001, 0, 0}
-ASSERT_BYTES_EQ(bitmap, {0xb0, 0x51, 0, 0});
-  }
-  {
-uint8_t bitmap[] = {0, 0, 0, 0};
-auto writer = internal::BitmapWriter(bitmap, 20, 12);
-WriteVectorToWriter(writer, {0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1});
-//  {0, 0, 0b01100000, 0b10100011}
-ASSERT_BYTES_EQ(bitmap, {0, 0, 0x60, 0xa3});
+  for (const auto fill_byte_int : {0x00, 0xff}) {
+const uint8_t fill_byte = static_cast<uint8_t>(fill_byte_int);
+{
+  uint8_t bitmap[] = {fill_byte, fill_byte, fill_byte, fill_byte};
+  auto writer = internal::BitmapWriter(bitmap, 0, 12);
+  WriteVectorToWriter(writer, {0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1});
+  //  {0b00110110, 0b1010, , }
+  ASSERT_BYTES_EQ(bitmap, {0x36, static_cast(0x0a | (fill_byte & 
0xf0)),
+   fill_byte, fill_byte});
+}
+{
+  uint8_t bitmap[] = {fill_byte, fill_byte, fill_byte, fill_byte};
+  auto writer = internal::BitmapWriter(bitmap, 3, 12);
+  WriteVectorToWriter(writer, {0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1});
+  //  {0b10110..., 0b.1010001, , }
+  ASSERT_BYTES_EQ(bitmap, {static_cast(0xb0 | (fill_byte & 0x07)),
+   static_cast(0x51 | (fill_byte & 
0x80)), fill_byte,
+   fill_byte});
+}
+{
+  uint8_t bitmap[] = {fill_byte, fill_byte, fill_byte, fill_byte};
+  auto writer = internal::BitmapWriter(bitmap, 20, 12);
+  WriteVectorToWriter(writer, {0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1});
+  //  {, , 0b0110, 0b10100011}
+  ASSERT_BYTES_EQ(bitmap, {fill_byte, fill_byte,
+   static_cast(0x60 | (fill_byte & 
0x0f)), 0xa3});
+}
+// 0-length writes
+for (int64_t pos = 0; pos < 32; ++pos) {
+  uint8_t bitmap[] = {fill_byte, fill_byte, fill_byte, fill_byte};
+  auto writer = internal::BitmapWriter(bitmap, pos, 0);
+  WriteVectorToWriter(writer, {});
+  ASSERT_BYTES_EQ(bitmap, {fill_byte, fill_byte, fill_byte, fill_byte});
+}
   }
 }
 
@@ -267,6 +273,10 @@ TEST(FirstTimeBitmapWriter, NormalOperation) {
 {
   uint8_t bitmap[] = {fill_byte, fill_byte, fill_byte, fill_byte};
   {
+auto writer = internal::FirstTimeBitmapWriter(bitmap, 4, 0);
+WriteVectorToWriter(writer, {});
+  }
+  {
 auto writer = internal::FirstTimeBitmapWriter(bitmap, 4, 6);
 WriteVectorToWriter(writer, {0, 1, 1, 0, 1, 1});
   }
@@ -275,6 +285,10 @@ TEST(FirstTimeBitmapWriter, NormalOperation) {
 WriteVectorToWriter(writer, {0, 0, 0});
   }
   {
+auto writer = internal::FirstTimeBitmapWriter(bitmap, 13, 0);
+WriteVectorToWriter(writer, 
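To make the zero-length invariant concrete outside the diff: Arrow's BitmapWriter caches the byte it is currently filling and flushes it in Finish(), so without explicit length guards a writer constructed with length == 0 would read a byte it will never modify and write it back, clobbering prefilled data. A hedged Python sketch of the guarded design (illustrative only, not the actual C++ implementation in bit-util.h):

```python
class BitmapWriter:
    """Buffered LSB-first bitmap writer (illustrative sketch, not Arrow's code)."""

    def __init__(self, bitmap, start, length):
        self.bitmap = bitmap              # bytearray being written into
        self.length = length
        self.position = 0
        self.byte_offset = start // 8
        self.bit_mask = 1 << (start % 8)
        # Guard 1: only load the starting byte if there is anything to write.
        self.current_byte = bitmap[self.byte_offset] if length > 0 else 0

    def set(self):
        self.current_byte |= self.bit_mask

    def clear(self):
        self.current_byte &= ~self.bit_mask & 0xFF

    def next(self):
        self.position += 1
        if self.bit_mask == 0x80:
            # Crossed a byte boundary: flush and reload the next byte,
            # but only if there are bits left to write.
            self.bit_mask = 1
            self.bitmap[self.byte_offset] = self.current_byte
            self.byte_offset += 1
            if self.position < self.length:
                self.current_byte = self.bitmap[self.byte_offset]
        else:
            self.bit_mask <<= 1

    def finish(self):
        # Guard 2: a zero-length writer must not touch the buffer at all.
        if self.length > 0 and (self.bit_mask != 0x01 or self.position < self.length):
            self.bitmap[self.byte_offset] = self.current_byte


def write_bits(bitmap, start, bits):
    """Write `bits` into `bitmap` starting at bit position `start`."""
    writer = BitmapWriter(bitmap, start, len(bits))
    for bit in bits:
        writer.set() if bit else writer.clear()
        writer.next()
    writer.finish()
```

With these guards, `write_bits(buf, pos, [])` leaves `buf` byte-for-byte untouched for every position, which is what the new `// 0-length writes` loop in the test asserts; the 12-bit cases reproduce expected bytes such as `{0x36, 0xfa, 0xff, 0xff}` above.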

[arrow] branch master updated: ARROW-4191: [C++] Use same CC and AR for jemalloc as for the main sources

2019-01-08 Thread wesm
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new ccec638  ARROW-4191: [C++] Use same CC and AR for jemalloc as for the 
main sources
ccec638 is described below

commit ccec63847e7709317a18036931ef3e3fbeab1f05
Author: Korn, Uwe 
AuthorDate: Tue Jan 8 10:14:53 2019 -0600

ARROW-4191: [C++] Use same CC and AR for jemalloc as for the main sources

Author: Korn, Uwe 

Closes #3347 from xhochy/ARROW-4191 and squashes the following commits:

44df02a23  ARROW-4191:  Use same CC and AR for jemalloc as for 
the main sources
---
 cpp/cmake_modules/ThirdpartyToolchain.cmake | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake 
b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index d8b3486..5a8c28f 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -772,7 +772,7 @@ if (ARROW_JEMALLOC)
   ExternalProject_Add(jemalloc_ep
 URL 
${CMAKE_CURRENT_SOURCE_DIR}/thirdparty/jemalloc/${JEMALLOC_VERSION}.tar.gz
 PATCH_COMMAND touch doc/jemalloc.3 doc/jemalloc.html
-CONFIGURE_COMMAND ./autogen.sh "--prefix=${JEMALLOC_PREFIX}" 
"--with-jemalloc-prefix=je_arrow_" "--with-private-namespace=je_arrow_private_" 
"--disable-tls"
+CONFIGURE_COMMAND ./autogen.sh "AR=${CMAKE_AR}" "CC=${CMAKE_C_COMPILER}" 
"--prefix=${JEMALLOC_PREFIX}" "--with-jemalloc-prefix=je_arrow_" 
"--with-private-namespace=je_arrow_private_" "--disable-tls"
 ${EP_LOG_OPTIONS}
 BUILD_IN_SOURCE 1
 BUILD_COMMAND ${MAKE} ${MAKE_BUILD_ARGS}



[arrow] branch master updated: ARROW-4060: [Rust] Add parquet arrow converter.

2019-01-08 Thread agrove

agrove pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new af07f75  ARROW-4060: [Rust] Add parquet arrow converter.
af07f75 is described below

commit af07f75c1f692d1ed4cea93d358ff1acda6a1771
Author: Renjie Liu 
AuthorDate: Tue Jan 8 06:45:13 2019 -0700

ARROW-4060: [Rust] Add parquet arrow converter.

This is the first step of adding an arrow reader and writer for parquet-rs.
This commit contains a converter that converts a parquet schema to an arrow
schema. Copied from this PR: https://github.com/sunchao/parquet-rs/pull/185.

Author: Renjie Liu 

Closes #3279 from liurenjie1024/rust-arrow-schema-converter and squashes 
the following commits:

1bfa00f  Resolve conflict
8806b16  Add parquet arrow converter
---
 rust/parquet/src/errors.rs |   6 +
 rust/parquet/src/lib.rs|   1 +
 rust/parquet/src/{lib.rs => reader/mod.rs} |  28 +-
 rust/parquet/src/reader/schema.rs  | 779 +
 rust/parquet/src/schema/types.rs   |  14 +-
 5 files changed, 805 insertions(+), 23 deletions(-)

diff --git a/rust/parquet/src/errors.rs b/rust/parquet/src/errors.rs
index a5532c1..abfbda9 100644
--- a/rust/parquet/src/errors.rs
+++ b/rust/parquet/src/errors.rs
@@ -50,6 +50,12 @@ quick_error! {
   display("EOF: {}", message)
   description(message)
   }
+  /// Arrow error.
+  /// Returned when reading into arrow or writing from arrow.
+  ArrowError(message:  String) {
+  display("Arrow: {}", message)
+  description(message)
+  }
   }
 }
 
diff --git a/rust/parquet/src/lib.rs b/rust/parquet/src/lib.rs
index 75c56f5..cad85ec 100644
--- a/rust/parquet/src/lib.rs
+++ b/rust/parquet/src/lib.rs
@@ -37,5 +37,6 @@ pub mod column;
 pub mod compression;
 mod encodings;
 pub mod file;
+pub mod reader;
 pub mod record;
 pub mod schema;
diff --git a/rust/parquet/src/lib.rs b/rust/parquet/src/reader/mod.rs
similarity index 64%
copy from rust/parquet/src/lib.rs
copy to rust/parquet/src/reader/mod.rs
index 75c56f5..fe580c5 100644
--- a/rust/parquet/src/lib.rs
+++ b/rust/parquet/src/reader/mod.rs
@@ -15,27 +15,11 @@
 // specific language governing permissions and limitations
 // under the License.
 
-#![feature(type_ascription)]
-#![feature(rustc_private)]
-#![feature(specialization)]
-#![feature(try_from)]
-#![allow(dead_code)]
-#![allow(non_camel_case_types)]
+//! [Apache Arrow](http://arrow.apache.org/) is a cross-language development 
platform for
+//! in-memory data.
+//!
+//! This mod provides API for converting between arrow and parquet.
 
-#[macro_use]
-pub mod errors;
-pub mod basic;
-pub mod data_type;
-
-// Exported for external use, such as benchmarks
-pub use self::encodings::{decoding, encoding};
-pub use self::util::memory;
-
-#[macro_use]
-mod util;
-pub mod column;
-pub mod compression;
-mod encodings;
-pub mod file;
-pub mod record;
 pub mod schema;
+
+pub use self::schema::{parquet_to_arrow_schema, 
parquet_to_arrow_schema_by_columns};
diff --git a/rust/parquet/src/reader/schema.rs 
b/rust/parquet/src/reader/schema.rs
new file mode 100644
index 000..68fd867
--- /dev/null
+++ b/rust/parquet/src/reader/schema.rs
@@ -0,0 +1,779 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Provides API for converting parquet schema to arrow schema and vice versa.
+//!
+//! The main interfaces for converting parquet schema to arrow schema are
+//! `parquet_to_arrow_schema` and `parquet_to_arrow_schema_by_columns`.
+//!
+//! The interfaces for converting arrow schema to parquet schema are coming.
+
+use std::{collections::HashSet, rc::Rc};
+
+use crate::basic::{LogicalType, Repetition, Type as PhysicalType};
+use crate::errors::{ParquetError::ArrowError, Result};
+use crate::schema::types::{SchemaDescPtr, Type, TypePtr};
+
+use arrow::datatypes::{DataType, Field, Schema};
+
+/// Convert parquet schema to arrow schema.
+pub fn parquet_to_arrow_schema(parquet_schema: SchemaDescPtr) -> 
Result {
+
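The diff above is truncated at the converter's entry point, but its core mechanic is a mapping from parquet (physical type, logical type) pairs to arrow data types, with the new `ArrowError` variant raised for unsupported combinations. A small Python sketch of that idea (hypothetical names and a deliberately tiny type map; the real Rust code walks `SchemaDescPtr` and handles nested group types):

```python
# Illustrative subset only; the real converter covers many more combinations.
_PRIMITIVE_MAP = {
    ("BOOLEAN", None): "bool",
    ("INT32", None): "int32",
    ("INT32", "INT_16"): "int16",
    ("INT64", None): "int64",
    ("FLOAT", None): "float32",
    ("DOUBLE", None): "float64",
    ("BYTE_ARRAY", "UTF8"): "utf8",
}


def parquet_to_arrow_field(name, physical_type, logical_type=None,
                           repetition="REQUIRED"):
    try:
        arrow_type = _PRIMITIVE_MAP[(physical_type, logical_type)]
    except KeyError:
        # Plays the role of the ParquetError::ArrowError variant added in errors.rs.
        raise ValueError(
            f"Arrow: unsupported parquet type {physical_type}/{logical_type}")
    # OPTIONAL parquet fields become nullable arrow fields.
    return {"name": name, "type": arrow_type,
            "nullable": repetition == "OPTIONAL"}


def parquet_to_arrow_schema(columns):
    """columns: iterable of (name, physical, logical, repetition) tuples."""
    return [parquet_to_arrow_field(*col) for col in columns]
```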

[arrow] branch master updated: ARROW-4183: [Ruby] Add Arrow::Struct as an element of Arrow::StructArray

2019-01-08 Thread shiro

shiro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 8704f8b  ARROW-4183: [Ruby] Add Arrow::Struct as an element of 
Arrow::StructArray
8704f8b is described below

commit 8704f8bd98f1edcf1f9ecc51d6fb3b4b5b4ecb88
Author: Kouhei Sutou 
AuthorDate: Tue Jan 8 22:32:13 2019 +0900

ARROW-4183: [Ruby] Add Arrow::Struct as an element of Arrow::StructArray

Returning Arrow::Array from Arrow::StructArray#[] is deprecated; it will
return Arrow::Struct in the next release. This is for consistency: all
Arrow::Array#[] implementations should return an element.

Author: Kouhei Sutou 

Closes #3338 from kou/ruby-struct and squashes the following commits:

a0561954   Add Arrow::Struct as an element of 
Arrow::StructArray
---
 ruby/red-arrow/lib/arrow/struct-array-builder.rb |  9 ++-
 ruby/red-arrow/lib/arrow/struct-array.rb | 34 ++
 ruby/red-arrow/lib/arrow/struct.rb   | 68 
 ruby/red-arrow/test/test-struct-array-builder.rb | 47 +-
 ruby/red-arrow/test/test-struct-array.rb | 58 -
 ruby/red-arrow/test/test-struct.rb   | 81 
 6 files changed, 263 insertions(+), 34 deletions(-)

diff --git a/ruby/red-arrow/lib/arrow/struct-array-builder.rb 
b/ruby/red-arrow/lib/arrow/struct-array-builder.rb
index 883ce84..52f75aa 100644
--- a/ruby/red-arrow/lib/arrow/struct-array-builder.rb
+++ b/ruby/red-arrow/lib/arrow/struct-array-builder.rb
@@ -73,13 +73,20 @@ module Arrow
   value.each_with_index do |sub_value, i|
 self[i].append_value(sub_value)
   end
+when Arrow::Struct
+  append_value_raw
+  value.values.each_with_index do |sub_value, i|
+self[i].append_value(sub_value)
+  end
 when Hash
   append_value_raw
   value.each do |name, sub_value|
 self[name].append_value(sub_value)
   end
 else
-  message = "struct value must be nil, Array or Hash: #{value.inspect}"
+  message =
+"struct value must be nil, Array, " +
+"Arrow::Struct or Hash: #{value.inspect}"
   raise ArgumentError, message
 end
   else
diff --git a/ruby/red-arrow/lib/arrow/struct-array.rb 
b/ruby/red-arrow/lib/arrow/struct-array.rb
index 4f9834c..e55a507 100644
--- a/ruby/red-arrow/lib/arrow/struct-array.rb
+++ b/ruby/red-arrow/lib/arrow/struct-array.rb
@@ -15,10 +15,44 @@
 # specific language governing permissions and limitations
 # under the License.
 
+require "arrow/struct"
+
 module Arrow
   class StructArray
 def [](i)
+  warn("Use #{self.class}\#find_field instead. " +
+   "This will return Arrow::Struct instead of Arrow::Array " +
+   "since 0.13.0.")
   get_field(i)
 end
+
+def get_value(i)
+  Struct.new(self, i)
+end
+
+def find_field(index_or_name)
+  case index_or_name
+  when String, Symbol
+name = index_or_name
+(@name_to_field ||= build_name_to_field)[name.to_s]
+  else
+index = index_or_name
+cached_fields[index]
+  end
+end
+
+private
+def cached_fields
+  @fields ||= fields
+end
+
+def build_name_to_field
+  name_to_field = {}
+  field_arrays = cached_fields
+  value_data_type.fields.each_with_index do |field, i|
+name_to_field[field.name] = field_arrays[i]
+  end
+  name_to_field
+end
   end
 end
diff --git a/ruby/red-arrow/lib/arrow/struct.rb 
b/ruby/red-arrow/lib/arrow/struct.rb
new file mode 100644
index 000..4ae12b8
--- /dev/null
+++ b/ruby/red-arrow/lib/arrow/struct.rb
@@ -0,0 +1,68 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+module Arrow
+  class Struct
+attr_accessor :index
+def initialize(array, index)
+  @array = array
+  @index = index
+end
+
+def [](field_name_or_field_index)
+  field = @array.find_field(field_name_or_field_index)
+  return nil if field.nil?
+  field[@index]

[arrow] branch master updated: ARROW-4104: [Java] fix a race condition in AllocationManager (#3246)

2019-01-08 Thread siddteotia

siddteotia pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 55848a3  ARROW-4104: [Java] fix a race condition in AllocationManager 
(#3246)
55848a3 is described below

commit 55848a36edb5ea5e0765068ef5f09d07d09d4898
Author: Pindikura Ravindra 
AuthorDate: Tue Jan 8 16:13:18 2019 +0530

ARROW-4104: [Java] fix a race condition in AllocationManager (#3246)
---
 .../src/main/java/org/apache/arrow/memory/AllocationManager.java| 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git 
a/java/memory/src/main/java/org/apache/arrow/memory/AllocationManager.java 
b/java/memory/src/main/java/org/apache/arrow/memory/AllocationManager.java
index 687674f..c10d246 100644
--- a/java/memory/src/main/java/org/apache/arrow/memory/AllocationManager.java
+++ b/java/memory/src/main/java/org/apache/arrow/memory/AllocationManager.java
@@ -230,7 +230,7 @@ public class AllocationManager {
   // since two balance transfers out from the allocator manager could 
cause incorrect
   // accounting, we need to ensure
   // that this won't happen by synchronizing on the allocator manager 
instance.
-  synchronized (this) {
+  synchronized (AllocationManager.this) {
 if (owningLedger != this) {
   return true;
 }
@@ -310,7 +310,7 @@ public class AllocationManager {
   allocator.assertOpen();
 
   final int outcome;
-  synchronized (this) {
+  synchronized (AllocationManager.this) {
 outcome = bufRefCnt.addAndGet(-decrement);
 if (outcome == 0) {
   lDestructionTime = System.nanoTime();
@@ -411,7 +411,7 @@ public class AllocationManager {
  * @return Amount of accounted(owned) memory associated with this ledger.
  */
 public int getAccountedSize() {
-  synchronized (this) {
+  synchronized (AllocationManager.this) {
 if (owningLedger == this) {
   return size;
 } else {
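Because `BufferLedger` is an inner class, `synchronized (this)` locked each ledger on itself, so two ledgers of the same `AllocationManager` could interleave their check-then-transfer steps; synchronizing on `AllocationManager.this` gives all ledgers one shared monitor. A hedged Python sketch of that pattern (illustrative names, not the Arrow Java API):

```python
import threading


class AllocationManager:
    """Owns the memory; all ownership changes synchronize on this one lock."""

    def __init__(self, size):
        self.size = size
        # Plays the role of the AllocationManager.this monitor in the fix.
        self.lock = threading.Lock()
        self.owning_ledger = None


class BufferLedger:
    def __init__(self, manager):
        self.manager = manager

    def transfer_balance(self, target):
        # Locking the shared manager (not `self`) makes the ownership check
        # and the swap one atomic step across all ledgers of this manager.
        with self.manager.lock:
            if self.manager.owning_ledger is not self:
                return True  # nothing to do: this ledger no longer owns it
            self.manager.owning_ledger = target
            return False

    def accounted_size(self):
        with self.manager.lock:
            if self.manager.owning_ledger is self:
                return self.manager.size
            return 0
```

Had each ledger used its own lock (the `synchronized (this)` behavior), two concurrent `transfer_balance` calls on different ledgers could both pass the ownership check before either swap ran, double-counting the balance.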



[arrow] branch master updated: ARROW-4188: [Rust] Move Rust README to top level rust directory

2019-01-08 Thread kszucs

kszucs pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 2057859  ARROW-4188: [Rust] Move Rust README to top level rust 
directory
2057859 is described below

commit 2057859744cb2ada93fc97838e09eb954963dc00
Author: Andy Grove 
AuthorDate: Tue Jan 8 11:03:17 2019 +0100

ARROW-4188: [Rust] Move Rust README to top level rust directory

Author: Andy Grove 

Closes #3342 from andygrove/ARROW-4188 and squashes the following commits:

fedcd7bc  split README between top level and arrow level
b68f77cb  Merge branch 'master' into ARROW-4188
e6dbd87f  add badges back
f2ee7e05  Move Rust README to top level rust directory
---
 rust/README.md   | 50 ++
 rust/arrow/README.md | 22 --
 2 files changed, 50 insertions(+), 22 deletions(-)

diff --git a/rust/README.md b/rust/README.md
new file mode 100644
index 000..8fe7885
--- /dev/null
+++ b/rust/README.md
@@ -0,0 +1,50 @@
+
+
+# Native Rust implementation of Apache Arrow
+
+## The Rust implementation of Arrow consists of the following crates
+
+- Arrow [(README)](arrow/README.md)
+- Parquet [(README)](parquet/README.md)
+
+## Run Tests
+
+Parquet support in Arrow requires data to test against; this data is in a
+git submodule.  To pull down this data, run the following:
+
+```bash
+git submodule update --init
+```
+
+The data can then be found in `cpp/submodules/parquet_testing/data`.
+Create a new environment variable called `PARQUET_TEST_DATA` to point
+to this location and then `cargo test` as usual.
+
+## Code Formatting
+
+Our CI uses `rustfmt` to check code formatting.  Although the project is
+built and tested against nightly rust, we use the stable version of
+`rustfmt`.  So before submitting a PR, be sure to run the following
+and check for lint issues:
+
+```bash
+cargo +stable fmt --all -- --check
+```
+
diff --git a/rust/arrow/README.md b/rust/arrow/README.md
index cbfd4dd..9df2dd2 100644
--- a/rust/arrow/README.md
+++ b/rust/arrow/README.md
@@ -57,28 +57,6 @@ cargo run --example dynamic_types
 cargo run --example read_csv
 ```
 
-## Run Tests
-
-Parquet support in Arrow requires data to test against, this data is in a
-git submodule.  To pull down this data run the following:
-
-```bash
-git submodule update --init
-```
-
-The data can then be found in `cpp/submodules/parquet_testing/data`.
-Create a new environment variable called `PARQUET_TEST_DATA` to point
-to this location and then `cargo test` as usual.
-
-Our CI uses `rustfmt` to check code formatting.  Although the project is
-built and tested against nightly rust we use the stable version of
-`rustfmt`.  So before submitting a PR be sure to run the following
-and check for lint issues:
-
-```bash
-cargo +stable fmt --all -- --check
-```
-
 # Publishing to crates.io
 
 An Arrow committer can publish this crate after an official project release has