[arrow] branch master updated (3cc12ab -> 4e51f98)

2019-08-15 Thread shiro
This is an automated email from the ASF dual-hosted git repository.

shiro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 3cc12ab  ARROW-6172 [Java] Provide benchmarks to set IntVector with 
different methods
 add 4e51f98  ARROW-6240: [Ruby] Arrow::Decimal128Array#get_value returns 
BigDecimal

No new revisions were added by this update.

Summary of changes:
 .../lib/arrow/{tensor.rb => decimal128-array.rb}   |  6 +++---
 ruby/red-arrow/lib/arrow/loader.rb |  6 +-
 .../test/test-decimal128-array-builder.rb  | 22 +++---
 ruby/red-arrow/test/test-decimal128-array.rb   |  8 
 4 files changed, 23 insertions(+), 19 deletions(-)
 copy ruby/red-arrow/lib/arrow/{tensor.rb => decimal128-array.rb} (91%)



[arrow-site] branch master updated: ARROW-6246: [Website] Add link to R documentation site

2019-08-15 Thread wesm
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-site.git


The following commit(s) were added to refs/heads/master by this push:
 new 41d02ac  ARROW-6246: [Website] Add link to R documentation site
41d02ac is described below

commit 41d02ac5e96fafd3dc7663d5214cdc7cd0dedb26
Author: Neal Richardson 
AuthorDate: Thu Aug 15 06:59:43 2019 -0700

ARROW-6246: [Website] Add link to R documentation site
---
 _includes/header.html | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/_includes/header.html b/_includes/header.html
index 7c02533..4174bae 100644
--- a/_includes/header.html
+++ b/_includes/header.html
@@ -52,9 +52,10 @@
 Project 
Docs
 Python
 C++
-Java 
API
-C 
GLib API
-Javascript API
+Java
+C 
GLib
+JavaScript
+R
   
 
 



[GitHub] [arrow-site] wesm merged pull request #11: ARROW-6246: [Website] Add link to R documentation site

2019-08-15 Thread GitBox
wesm merged pull request #11: ARROW-6246: [Website] Add link to R documentation 
site
URL: https://github.com/apache/arrow-site/pull/11
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [arrow-site] wesm commented on issue #11: ARROW-6246: [Website] Add link to R documentation site

2019-08-15 Thread GitBox
wesm commented on issue #11: ARROW-6246: [Website] Add link to R documentation 
site
URL: https://github.com/apache/arrow-site/pull/11#issuecomment-521651223
 
 
   LGTM, thanks




[arrow] branch master updated: ARROW-6180: [C++][Parquet] Add RandomAccessFile::GetStream that returns InputStream that reads a file segment independent of the file's state, fix concurrent buffered Parquet column reads

2019-08-15 Thread wesm
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 2c808a2  ARROW-6180: [C++][Parquet] Add RandomAccessFile::GetStream 
that returns InputStream that reads a file segment independent of the file's 
state, fix concurrent buffered Parquet column reads
2c808a2 is described below

commit 2c808a2cbd62300a36d682ebd7bd25ad8b6cd500
Author: Wes McKinney 
AuthorDate: Thu Aug 15 11:45:24 2019 -0500

ARROW-6180: [C++][Parquet] Add RandomAccessFile::GetStream that returns 
InputStream that reads a file segment independent of the file's state, fix 
concurrent buffered Parquet column reads

This enables different functions to read portions of a `RandomAccessFile` 
as an InputStream without interfering with each other.

This also addresses PARQUET-1636 and adds a unit test for buffered column 
chunk reads. In the refactor to use the Arrow IO interfaces, I broke this by 
allowing the raw RandomAccessFile to be passed into multiple 
`BufferedInputStream` at once, so the file position was being manipulated by 
different column readers. We didn't catch the problem because we didn't have 
any unit tests, so this patch addresses that deficiency.
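
A minimal usage sketch (not part of this patch; the file path and segment sizes are made up) showing how two readers can consume segments of the same `RandomAccessFile` without interfering:

    #include <memory>
    #include <string>

    #include "arrow/buffer.h"
    #include "arrow/io/file.h"
    #include "arrow/io/interfaces.h"
    #include "arrow/status.h"

    arrow::Status ReadTwoSegments(const std::string& path) {
      std::shared_ptr<arrow::io::ReadableFile> file;
      RETURN_NOT_OK(arrow::io::ReadableFile::Open(path, &file));

      // Independent streams over [0, 128) and [128, 256); each tracks its own position.
      auto first = arrow::io::RandomAccessFile::GetStream(file, /*file_offset=*/0, /*nbytes=*/128);
      auto second = arrow::io::RandomAccessFile::GetStream(file, /*file_offset=*/128, /*nbytes=*/128);

      std::shared_ptr<arrow::Buffer> a, b;
      RETURN_NOT_OK(first->Read(64, &a));   // does not move the file's own cursor
      RETURN_NOT_OK(second->Read(64, &b));  // nor does this read
      return arrow::Status::OK();
    }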

Closes #5085 from wesm/ARROW-6180 and squashes the following commits:

e4ad370d5  Code review comments
2645bec64  Add unit test that exhibits PARQUET-1636
76dc71c4f  stub
3eb0136d1  Finish basic unit tests
4fd3d610a  Start implementation

Authored-by: Wes McKinney 
Signed-off-by: Wes McKinney 
---
 cpp/src/arrow/io/interfaces.cc  | 66 
 cpp/src/arrow/io/interfaces.h   | 10 +
 cpp/src/arrow/io/memory-test.cc | 67 
 cpp/src/arrow/testing/random.h  | 33 +++---
 cpp/src/parquet/properties.cc   |  7 ++-
 cpp/src/parquet/properties.h|  2 +-
 cpp/src/parquet/reader-test.cc  | 96 +
 7 files changed, 262 insertions(+), 19 deletions(-)

diff --git a/cpp/src/arrow/io/interfaces.cc b/cpp/src/arrow/io/interfaces.cc
index 06acb99..8c4f480 100644
--- a/cpp/src/arrow/io/interfaces.cc
+++ b/cpp/src/arrow/io/interfaces.cc
@@ -17,11 +17,15 @@
 
 #include "arrow/io/interfaces.h"
 
+#include 
 #include 
 #include 
 #include 
+#include 
 
+#include "arrow/buffer.h"
 #include "arrow/status.h"
+#include "arrow/util/logging.h"
 #include "arrow/util/string_view.h"
 
 namespace arrow {
@@ -70,5 +74,67 @@ Status Writable::Write(const std::string& data) {
 
 Status Writable::Flush() { return Status::OK(); }
 
+class FileSegmentReader : public InputStream {
+ public:
+  FileSegmentReader(std::shared_ptr<RandomAccessFile> file, int64_t file_offset,
+                    int64_t nbytes)
+  : file_(std::move(file)),
+closed_(false),
+position_(0),
+file_offset_(file_offset),
+nbytes_(nbytes) {
+FileInterface::set_mode(FileMode::READ);
+  }
+
+  Status CheckOpen() const {
+if (closed_) {
+  return Status::IOError("Stream is closed");
+}
+return Status::OK();
+  }
+
+  Status Close() override {
+closed_ = true;
+return Status::OK();
+  }
+
+  Status Tell(int64_t* position) const override {
+RETURN_NOT_OK(CheckOpen());
+*position = position_;
+return Status::OK();
+  }
+
+  bool closed() const override { return closed_; }
+
+  Status Read(int64_t nbytes, int64_t* bytes_read, void* out) override {
+RETURN_NOT_OK(CheckOpen());
+int64_t bytes_to_read = std::min(nbytes, nbytes_ - position_);
+RETURN_NOT_OK(
+file_->ReadAt(file_offset_ + position_, bytes_to_read, bytes_read, 
out));
+position_ += *bytes_read;
+return Status::OK();
+  }
+
+  Status Read(int64_t nbytes, std::shared_ptr<Buffer>* out) override {
+RETURN_NOT_OK(CheckOpen());
+int64_t bytes_to_read = std::min(nbytes, nbytes_ - position_);
+RETURN_NOT_OK(file_->ReadAt(file_offset_ + position_, bytes_to_read, out));
+position_ += (*out)->size();
+return Status::OK();
+  }
+
+ private:
+  std::shared_ptr<RandomAccessFile> file_;
+  bool closed_;
+  int64_t position_;
+  int64_t file_offset_;
+  int64_t nbytes_;
+};
+
+std::shared_ptr<InputStream> RandomAccessFile::GetStream(
+    std::shared_ptr<RandomAccessFile> file, int64_t file_offset, int64_t nbytes) {
+  return std::make_shared<FileSegmentReader>(std::move(file), file_offset, nbytes);
+}
+
 }  // namespace io
 }  // namespace arrow
diff --git a/cpp/src/arrow/io/interfaces.h b/cpp/src/arrow/io/interfaces.h
index 678366b..95022e3 100644
--- a/cpp/src/arrow/io/interfaces.h
+++ b/cpp/src/arrow/io/interfaces.h
@@ -144,6 +144,16 @@ class ARROW_EXPORT RandomAccessFile : public InputStream, 
public Seekable {
   /// Necessary because we hold a std::unique_ptr
   ~RandomAccessFile() override;
 
+  /// \brief Create an isolated InputStream that reads a segment of a
+  /// RandomAccessFile. Multiple such stream can be 

[arrow] branch master updated: ARROW-6259: [C++] Add -Wno-extra-semi-stmt when compiling with clang 8 to work around Flatbuffers bug, suppress other new LLVM 8 warnings

2019-08-15 Thread wesm
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new fb8cb89  ARROW-6259: [C++] Add -Wno-extra-semi-stmt when compiling 
with clang 8 to work around Flatbuffers bug, suppress other new LLVM 8 warnings
fb8cb89 is described below

commit fb8cb8968fa28c3b3e943cb86dbe5c57d97ea422
Author: Wes McKinney 
AuthorDate: Thu Aug 15 19:09:24 2019 -0500

ARROW-6259: [C++] Add -Wno-extra-semi-stmt when compiling with clang 8 to 
work around Flatbuffers bug, suppress other new LLVM 8 warnings

LLVM 8 introduces `-Wextra-semi-stmt` and Flatbuffers generates code with 
superfluous semicolons (upstream bug report 
https://github.com/google/flatbuffers/issues/5482). This has been breaking our macOS 
builds for the last few hours because conda-forge upgraded its compiler 
toolchain from Apple clang 4.0.1 to clang 8.0.0 this afternoon.

Closes #5096 from wesm/ARROW-6259 and squashes the following commits:

96cbba9e8  Suppress -Wshadow-field and -Wc++2a-compat also
686339caf  Add -Wno-extra-semi-stmt when compiling with clang 
8 to work around Flatbuffers bug

Authored-by: Wes McKinney 
Signed-off-by: Wes McKinney 
---
 cpp/cmake_modules/SetupCxxFlags.cmake | 9 +
 1 file changed, 9 insertions(+)

diff --git a/cpp/cmake_modules/SetupCxxFlags.cmake 
b/cpp/cmake_modules/SetupCxxFlags.cmake
index 9eba9e8..09d5bf2 100644
--- a/cpp/cmake_modules/SetupCxxFlags.cmake
+++ b/cpp/cmake_modules/SetupCxxFlags.cmake
@@ -168,6 +168,15 @@ if("${BUILD_WARNING_LEVEL}" STREQUAL "CHECKIN")
 if("${COMPILER_VERSION}" VERSION_GREATER "3.9")
   set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} 
-Wno-zero-as-null-pointer-constant")
 endif()
+
+if("${COMPILER_VERSION}" VERSION_GREATER "7.0")
+  # ARROW-6259: Flatbuffers generates code with superfluous semicolons, so
+  # we suppress this warning for now. See upstream bug report
+  # https://github.com/google/flatbuffers/issues/5482
+  set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -Wno-extra-semi-stmt \
+-Wno-shadow-field -Wno-c++2a-compat")
+endif()
+
 set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -Wno-unknown-warning-option")
   elseif("${COMPILER_FAMILY}" STREQUAL "gcc")
 set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} -Wall \



[arrow] branch master updated: ARROW-6204: [GLib] Add garrow_array_is_in_chunked_array()

2019-08-15 Thread kou
This is an automated email from the ASF dual-hosted git repository.

kou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 9a6c82e  ARROW-6204: [GLib] Add garrow_array_is_in_chunked_array()
9a6c82e is described below

commit 9a6c82e9799cfb213f8103dfacaf36f5a30f4be8
Author: Yosuke Shiro 
AuthorDate: Fri Aug 16 06:34:52 2019 +0900

ARROW-6204: [GLib] Add garrow_array_is_in_chunked_array()

This is follow-up of 
https://github.com/apache/arrow/pull/5047#issuecomment-520103706.
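
A short usage sketch from the GLib side (not part of this patch; the input arrays and error handling are assumed to be set up elsewhere):

    GError *error = NULL;
    /* For each element of left_array, check whether it appears anywhere in
       right_chunked_array. */
    GArrowBooleanArray *contained =
      garrow_array_is_in_chunked_array(left_array, right_chunked_array, &error);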

Closes #5086 from shiro615/glib-isin-chunked-array and squashes the 
following commits:

6724dfdc4  Simplify
6d5105a73  Fix documents
798b6ed85  Fix test cases for Arrow::Array#is_in_chunked_array
ad98fd972  Add garrow_array_is_in_chunked_array()

Authored-by: Yosuke Shiro 
Signed-off-by: Sutou Kouhei 
---
 c_glib/arrow-glib/compute.cpp | 39 +-
 c_glib/arrow-glib/compute.h   |  6 +++
 c_glib/test/test-is-in.rb | 92 ---
 3 files changed, 114 insertions(+), 23 deletions(-)

diff --git a/c_glib/arrow-glib/compute.cpp b/c_glib/arrow-glib/compute.cpp
index b489913..fb33e72 100644
--- a/c_glib/arrow-glib/compute.cpp
+++ b/c_glib/arrow-glib/compute.cpp
@@ -25,6 +25,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1440,7 +1441,43 @@ garrow_array_is_in(GArrowArray *left,
  arrow_left_datum,
  arrow_right_datum,
                                     &arrow_datum);
-  if (garrow_error_check(error, status, "[array][isin]")) {
+  if (garrow_error_check(error, status, "[array][is-in]")) {
+    auto arrow_array = arrow_datum.make_array();
+    return GARROW_BOOLEAN_ARRAY(garrow_array_new_raw(&arrow_array));
+  } else {
+return NULL;
+  }
+}
+
+/**
+ * garrow_array_is_in_chunked_array:
+ * @left: A left hand side #GArrowArray.
+ * @right: A right hand side #GArrowChunkedArray.
+ * @error: (nullable): Return location for a #GError or %NULL.
+ *
+ * Returns: (nullable) (transfer full): The #GArrowBooleanArray
+ *   showing whether each element in the left array is contained
+ *   in right chunked array.
+ *
+ * Since: 0.15.0
+ */
+GArrowBooleanArray *
+garrow_array_is_in_chunked_array(GArrowArray *left,
+ GArrowChunkedArray *right,
+ GError **error)
+{
+  auto arrow_left = garrow_array_get_raw(left);
+  auto arrow_left_datum = arrow::compute::Datum(arrow_left);
+  auto arrow_right = garrow_chunked_array_get_raw(right);
+  auto arrow_right_datum = arrow::compute::Datum(arrow_right);
+  auto memory_pool = arrow::default_memory_pool();
+  arrow::compute::FunctionContext context(memory_pool);
+  arrow::compute::Datum arrow_datum;
+  auto status = arrow::compute::IsIn(&context,
+                                     arrow_left_datum,
+                                     arrow_right_datum,
+                                     &arrow_datum);
+  if (garrow_error_check(error, status, "[array][is-in-chunked-array]")) {
+    auto arrow_array = arrow_datum.make_array();
+    return GARROW_BOOLEAN_ARRAY(garrow_array_new_raw(&arrow_array));
   } else {
diff --git a/c_glib/arrow-glib/compute.h b/c_glib/arrow-glib/compute.h
index 3a0b3a8..79e43e8 100644
--- a/c_glib/arrow-glib/compute.h
+++ b/c_glib/arrow-glib/compute.h
@@ -20,6 +20,7 @@
 #pragma once
 
 #include 
+#include 
 
 G_BEGIN_DECLS
 
@@ -258,5 +259,10 @@ GArrowBooleanArray *
 garrow_array_is_in(GArrowArray *left,
GArrowArray *right,
GError **error);
+GARROW_AVAILABLE_IN_0_15
+GArrowBooleanArray *
+garrow_array_is_in_chunked_array(GArrowArray *left,
+ GArrowChunkedArray *right,
+ GError **error);
 
 G_END_DECLS
diff --git a/c_glib/test/test-is-in.rb b/c_glib/test/test-is-in.rb
index 1af6ac0..5b1b360 100644
--- a/c_glib/test/test-is-in.rb
+++ b/c_glib/test/test-is-in.rb
@@ -18,31 +18,79 @@
 class TestIsIn < Test::Unit::TestCase
   include Helper::Buildable
 
-  def test_no_null
-left_array = build_int16_array([1, 0, 1, 2])
-right_array = build_int16_array([2, 0])
-assert_equal(build_boolean_array([false, true, false, true]),
- left_array.is_in(right_array))
-  end
+  sub_test_case("Array") do
+def test_no_null
+  left = build_int16_array([1, 0, 1, 2])
+  right = build_int16_array([2, 0])
+  assert_equal(build_boolean_array([false, true, false, true]),
+   left.is_in(right))
+end
 
-  def test_null_in_left_array
-left_array = build_int16_array([1, 0, nil, 2])
-right_array = build_int16_array([2, 0, 3])
-assert_equal(build_boolean_array([false, true, nil, true]),
- left_array.is_in(right_array))
-  end
+def test_null_in_left
+  left = 

[arrow] branch master updated: ARROW-6170: [R] Faster docker-compose build

2019-08-15 Thread kou
This is an automated email from the ASF dual-hosted git repository.

kou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new ea91067  ARROW-6170: [R] Faster docker-compose build
ea91067 is described below

commit ea9106798993c9b54127c1a6f1b13a6aa394f9de
Author: Antoine Pitrou 
AuthorDate: Fri Aug 16 07:08:24 2019 +0900

ARROW-6170: [R] Faster docker-compose build

Use parallel package compilation and installation.

Closes #5039 from pitrou/ARROW-6170-faster-build-r and squashes the 
following commits:

5ef5f06df  Hopefully appease lint thing
c40eca821  ARROW-6170:  Faster docker-compose build

Authored-by: Antoine Pitrou 
Signed-off-by: Sutou Kouhei 
---
 .dockerignore |  3 +++
 r/Dockerfile  | 11 ---
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/.dockerignore b/.dockerignore
index 16bdebb..64e3890 100644
--- a/.dockerignore
+++ b/.dockerignore
@@ -38,6 +38,9 @@ cpp/.idea
 cpp/build
 cpp/*-build
 cpp/*_build
+cpp/build-debug
+cpp/build-release
+cpp/build-test
 cpp/Testing
 cpp/thirdparty
 !cpp/thirdparty/jemalloc
diff --git a/r/Dockerfile b/r/Dockerfile
index a43ac20..01262bf 100644
--- a/r/Dockerfile
+++ b/r/Dockerfile
@@ -60,9 +60,14 @@ ENV ARROW_R_DEV=TRUE
 ENV 
PKG_CONFIG_PATH=${PKG_CONFIG_PATH}:/build/cpp/src/arrow:/opt/conda/lib/pkgconfig
 ENV LD_LIBRARY_PATH=/opt/conda/lib/:/build/cpp/src/arrow:/arrow/r/src
 
-RUN Rscript -e "install.packages('devtools', repos = 
'http://cran.rstudio.com')" && \
-Rscript -e "devtools::install_github('romainfrancois/decor')" && \
-Rscript -e "install.packages(c( \
+# Ensure parallel R package installation
+RUN printf "options(Ncpus = parallel::detectCores())\n" >> /etc/R/Rprofile.site
+# Also ensure parallel compilation of each individual package
+RUN printf "MAKEFLAGS=-j8\n" >> /usr/lib/R/etc/Makeconf
+
+RUN Rscript -e "install.packages('devtools', repos = 
'http://cran.rstudio.com')"
+RUN Rscript -e "devtools::install_github('romainfrancois/decor')"
+RUN Rscript -e "install.packages(c( \
 'Rcpp', 'dplyr', 'stringr', 'glue', 'vctrs', \
 'purrr', \
 'assertthat', \



[arrow] branch master updated: ARROW-6186: [Packaging][deb] Add missing headers to libplasma-dev for Ubuntu 16.04

2019-08-15 Thread kou
This is an automated email from the ASF dual-hosted git repository.

kou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new be95f47  ARROW-6186: [Packaging][deb] Add missing headers to 
libplasma-dev for Ubuntu 16.04
be95f47 is described below

commit be95f4725d72205058a0e732a49163ee82305868
Author: Sutou Kouhei 
AuthorDate: Fri Aug 16 06:30:21 2019 +0900

ARROW-6186: [Packaging][deb] Add missing headers to libplasma-dev for 
Ubuntu 16.04

Closes #5050 from 
kou/packages-linux-ubuntu-xenial-add-missing-plasma-headers and squashes the 
following commits:

bd4cba03e   Add missing headers to libplasma-dev for Ubuntu 
16.04

Authored-by: Sutou Kouhei 
Signed-off-by: Sutou Kouhei 
---
 dev/tasks/linux-packages/debian.ubuntu-xenial/libplasma-dev.install | 1 +
 1 file changed, 1 insertion(+)

diff --git 
a/dev/tasks/linux-packages/debian.ubuntu-xenial/libplasma-dev.install 
b/dev/tasks/linux-packages/debian.ubuntu-xenial/libplasma-dev.install
index d3538d2..fc5904e 100644
--- a/dev/tasks/linux-packages/debian.ubuntu-xenial/libplasma-dev.install
+++ b/dev/tasks/linux-packages/debian.ubuntu-xenial/libplasma-dev.install
@@ -1,3 +1,4 @@
+usr/include/plasma/
 usr/lib/*/libplasma.a
 usr/lib/*/libplasma.so
 usr/lib/*/pkgconfig/plasma.pc



[arrow] branch master updated: ARROW-6130: [Release] Use 0.15.0 as the next release

2019-08-15 Thread wesm
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 65b2286  ARROW-6130: [Release] Use 0.15.0 as the next release
65b2286 is described below

commit 65b2286e34f857d90245990978e56c5c7ecbb7fb
Author: Sutou Kouhei 
AuthorDate: Thu Aug 15 20:39:26 2019 -0500

ARROW-6130: [Release] Use 0.15.0 as the next release

See discussion on the mailing list:
[Discuss] Do a 0.15.0 release before 1.0.0?

https://lists.apache.org/thread.html/98b59e461c8937d33660214028dcd78a47f52fbb762217d996194941@%3Cdev.arrow.apache.org%3E

Closes #5007 from kou/release-use-0.15.0-as-the-next-release and squashes 
the following commits:

6833dd7e4   Change version to 0.15.0-SNAPSHOT by hand
66362f8bf   Remove duplicated section
dac30c581   Update .deb package names for 0.15.0
b39c0540d   Update versions for 0.15.0-SNAPSHOT

Authored-by: Sutou Kouhei 
Signed-off-by: Wes McKinney 
---
 c_glib/configure.ac|   2 +-
 c_glib/meson.build |   2 +-
 cpp/CMakeLists.txt |   2 +-
 csharp/Directory.Build.props   |   2 +-
 dev/release/rat_exclude_files.txt  |  50 ++---
 .../linux-packages/debian.ubuntu-xenial/control|  78 +++
 .../libarrow-cuda-glib15.install}  |   0
 .../libarrow-cuda15.install}   |   0
 .../libarrow-dataset15.install}|   0
 .../libarrow-glib15.install}   |   0
 .../libarrow-python15.install} |   0
 .../libarrow15.install}|   0
 .../libgandiva-glib15.install} |   0
 .../libgandiva15.install}  |   0
 .../libparquet-glib15.install} |   0
 .../libparquet15.install}  |   0
 .../libplasma-glib15.install}  |   0
 .../libplasma15.install}   |   0
 dev/tasks/linux-packages/debian/control|  84 +++
 .../libarrow-cuda-glib15.install}  |   0
 .../libarrow-cuda15.install}   |   0
 .../libarrow-dataset15.install}|   0
 ...-flight14.install => libarrow-flight15.install} |   0
 .../libarrow-glib15.install}   |   0
 .../libarrow-python15.install} |   0
 .../libarrow15.install}|   0
 .../libgandiva-glib15.install} |   0
 .../libgandiva15.install}  |   0
 .../libparquet-glib15.install} |   0
 .../libparquet15.install}  |   0
 .../libplasma-glib15.install}  |   0
 .../libplasma15.install}   |   0
 dev/tasks/tasks.yml| 248 ++---
 java/adapter/avro/pom.xml  |   2 +-
 java/adapter/jdbc/pom.xml  |   2 +-
 java/adapter/orc/pom.xml   |   2 +-
 java/algorithm/pom.xml |   2 +-
 java/flight/pom.xml|   2 +-
 java/format/pom.xml|   2 +-
 java/gandiva/pom.xml   |   2 +-
 java/memory/pom.xml|   2 +-
 java/performance/pom.xml   |   2 +-
 java/plasma/pom.xml|   2 +-
 java/pom.xml   |   2 +-
 java/tools/pom.xml |   2 +-
 java/vector/pom.xml|   2 +-
 js/package.json|   2 +-
 matlab/CMakeLists.txt  |   2 +-
 python/setup.py|   2 +-
 ruby/red-arrow-cuda/lib/arrow-cuda/version.rb  |   2 +-
 ruby/red-arrow/lib/arrow/version.rb|   2 +-
 ruby/red-gandiva/lib/gandiva/version.rb|   2 +-
 ruby/red-parquet/lib/parquet/version.rb|   2 +-
 ruby/red-plasma/lib/plasma/version.rb  |   2 +-
 rust/arrow/Cargo.toml  |   2 +-
 rust/datafusion/Cargo.toml |   6 +-
 rust/datafusion/README.md  |   2 +-
 rust/parquet/Cargo.toml|   4 +-
 rust/parquet/README.md |   4 +-
 59 files changed, 264 insertions(+), 264 deletions(-)

diff --git a/c_glib/configure.ac b/c_glib/configure.ac
index 66f88c0..e1eafd8 100644
--- a/c_glib/configure.ac
+++ b/c_glib/configure.ac
@@ -17,7 +17,7 @@
 
 AC_PREREQ(2.65)
 
-m4_define([arrow_glib_version], 1.0.0-SNAPSHOT)
+m4_define([arrow_glib_version], 

[arrow] branch master updated: ARROW-6249: [Java] Remove useless class ByteArrayWrapper

2019-08-15 Thread emkornfield
This is an automated email from the ASF dual-hosted git repository.

emkornfield pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new db6d5dd  ARROW-6249: [Java] Remove useless class ByteArrayWrapper
db6d5dd is described below

commit db6d5dd55492f91ee402c7cda9a2678556c8200e
Author: tianchen 
AuthorDate: Thu Aug 15 19:25:33 2019 -0700

ARROW-6249: [Java] Remove useless class ByteArrayWrapper

Related to [ARROW-6249](https://issues.apache.org/jira/browse/ARROW-6249).

This class was introduced in the encoding code to compare byte[] values for 
equality.

Since we now compare value/vector equality with the visitor API added in 
ARROW-6022 instead of comparing getObject results, this class is no longer needed.

Closes #5093 from tianchen92/ARROW-6249 and squashes the following commits:

ae7e61844  ARROW-6249:  Remove useless class ByteArrayWrapper

Authored-by: tianchen 
Signed-off-by: Micah Kornfield 
---
 .../arrow/vector/dictionary/ByteArrayWrapper.java  | 52 --
 1 file changed, 52 deletions(-)

diff --git 
a/java/vector/src/main/java/org/apache/arrow/vector/dictionary/ByteArrayWrapper.java
 
b/java/vector/src/main/java/org/apache/arrow/vector/dictionary/ByteArrayWrapper.java
deleted file mode 100644
index bcfac39..000
--- 
a/java/vector/src/main/java/org/apache/arrow/vector/dictionary/ByteArrayWrapper.java
+++ /dev/null
@@ -1,52 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.arrow.vector.dictionary;
-
-import java.util.Arrays;
-
-/**
- * Wrapper class for byte array.
- */
-public class ByteArrayWrapper {
-  private final byte[] data;
-
-  /**
-   * Constructs a new instance.
-   */
-  public ByteArrayWrapper(byte[] data) {
-if (data == null) {
-  throw new NullPointerException();
-}
-
-this.data = data;
-  }
-
-  @Override
-  public boolean equals(Object other) {
-if (!(other instanceof ByteArrayWrapper)) {
-  return false;
-}
-
-return Arrays.equals(data, ((ByteArrayWrapper)other).data);
-  }
-
-  @Override
-  public int hashCode() {
-return Arrays.hashCode(data);
-  }
-}



[arrow] branch master updated: ARROW-6212: [Java] Support vector rank operation

2019-08-15 Thread emkornfield
This is an automated email from the ASF dual-hosted git repository.

emkornfield pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 71b32b9  ARROW-6212: [Java] Support vector rank operation
71b32b9 is described below

commit 71b32b9b87fa9825d2112644c7ce15d6f71b9174
Author: liyafan82 
AuthorDate: Thu Aug 15 19:43:36 2019 -0700

ARROW-6212: [Java] Support vector rank operation

Given an unsorted vector, we want to get the index of the ith smallest 
element in the vector. This function is supported by the rank operation.

We provide an implementation that gets the index with the desired rank, 
without sorting the vector (the vector is left intact), and the implementation 
takes O(n) time, where n is the vector length.
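
A short usage sketch (not part of this patch; "vector", "comparator" and "allocator" are assumed to be created elsewhere):

    import org.apache.arrow.algorithm.rank.VectorRank;
    import org.apache.arrow.algorithm.sort.VectorValueComparator;
    import org.apache.arrow.memory.BufferAllocator;
    import org.apache.arrow.vector.IntVector;

    // Hypothetical helper: index of the median value of an unsorted IntVector.
    // The vector itself is left untouched by the rank computation.
    static int medianIndex(IntVector vector,
                           VectorValueComparator<IntVector> comparator,
                           BufferAllocator allocator) {
      VectorRank<IntVector> rank = new VectorRank<>(allocator);
      return rank.indexAtRank(vector, comparator, vector.getValueCount() / 2);
    }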

Closes #5066 from liyafan82/fly_0812_rank and squashes the following 
commits:

623b08531   Support vector rank operation

Authored-by: liyafan82 
Signed-off-by: Micah Kornfield 
---
 .../apache/arrow/algorithm/rank/VectorRank.java|  89 +
 .../apache/arrow/algorithm/sort/IndexSorter.java   |  16 ++-
 .../arrow/algorithm/rank/TestVectorRank.java   | 146 +
 3 files changed, 249 insertions(+), 2 deletions(-)

diff --git 
a/java/algorithm/src/main/java/org/apache/arrow/algorithm/rank/VectorRank.java 
b/java/algorithm/src/main/java/org/apache/arrow/algorithm/rank/VectorRank.java
new file mode 100644
index 000..43c9a5b
--- /dev/null
+++ 
b/java/algorithm/src/main/java/org/apache/arrow/algorithm/rank/VectorRank.java
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.algorithm.rank;
+
+import java.util.stream.IntStream;
+
+import org.apache.arrow.algorithm.sort.IndexSorter;
+import org.apache.arrow.algorithm.sort.VectorValueComparator;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.util.Preconditions;
+import org.apache.arrow.vector.IntVector;
+import org.apache.arrow.vector.ValueVector;
+
+/**
+ * Utility for calculating ranks of vector elements.
+ * @param <V> the vector type
+ */
+public class VectorRank<V extends ValueVector> {
+
+  private VectorValueComparator<V> comparator;
+
+  /**
+   * Vector indices.
+   */
+  private IntVector indices;
+
+  private final BufferAllocator allocator;
+
+  /**
+   * Constructs a vector rank utility.
+   * @param allocator the allocator to use.
+   */
+  public VectorRank(BufferAllocator allocator) {
+this.allocator = allocator;
+  }
+
+  /**
+   * Given a rank r, gets the index of the element that is the rth smallest in 
the vector.
+   * The operation is performed without changing the vector, and takes O(n) 
time,
+   * where n is the length of the vector.
+   * @param vector the vector from which to get the element index.
+   * @param comparator the criteria for vector element comparison.
+   * @param rank the rank to determine.
+   * @return the element index with the given rank.
+   */
+  public int indexAtRank(V vector, VectorValueComparator<V> comparator, int rank) {
+Preconditions.checkArgument(rank >= 0 && rank < vector.getValueCount());
+try {
+  indices = new IntVector("index vector", allocator);
+  indices.allocateNew(vector.getValueCount());
+  IntStream.range(0, vector.getValueCount()).forEach(i -> indices.set(i, 
i));
+
+  comparator.attachVector(vector);
+  this.comparator = comparator;
+
+  int pos = getRank(0, vector.getValueCount() - 1, rank);
+  return indices.get(pos);
+} finally {
+  indices.close();
+}
+  }
+
+  private int getRank(int low, int high, int rank) {
+int mid = IndexSorter.partition(low, high, indices, comparator);
+if (mid < rank) {
+  return getRank(mid + 1, high, rank);
+} else if (mid > rank) {
+  return getRank(low, mid - 1, rank);
+} else {
+  // mid == rank
+  return mid;
+}
+  }
+}
diff --git 
a/java/algorithm/src/main/java/org/apache/arrow/algorithm/sort/IndexSorter.java 
b/java/algorithm/src/main/java/org/apache/arrow/algorithm/sort/IndexSorter.java
index d85eb6f..0f03e5c 100644
--- 

[arrow] branch master updated: ARROW-6199: [Java] Avro adapter avoid potential resource leak.

2019-08-15 Thread emkornfield
This is an automated email from the ASF dual-hosted git repository.

emkornfield pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new dd4532a  ARROW-6199: [Java] Avro adapter avoid potential resource leak.
dd4532a is described below

commit dd4532a0cdaccf8e7811086bc5360b13ef9a6c36
Author: tianchen 
AuthorDate: Thu Aug 15 19:49:53 2019 -0700

ARROW-6199: [Java] Avro adapter avoid potential resource leak.

Related to [ARROW-6199](https://issues.apache.org/jira/browse/ARROW-6199).

Currently, the Avro consumer interface has no close API, which may cause 
resource leaks such as AvroBytesConsumer#cacheBuffer.
To resolve this, make Consumer extend AutoCloseable and create 
CompositeAvroConsumer to encapsulate the consume and close logic.
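
A minimal sketch of the resulting calling pattern (not part of this patch; "consumers", "decoder" and "root" are assumed to exist), using only the consume() and close() methods added here:

    CompositeAvroConsumer compositeConsumer = new CompositeAvroConsumer(consumers);
    try {
      compositeConsumer.consume(decoder, root);
    } finally {
      // Releases the underlying writers/buffers held by each child consumer.
      compositeConsumer.close();
    }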

Closes #5059 from tianchen92/ARROW-6199 and squashes the following commits:

d60d94c48  fix
42f22da7c  clear vectors in close
5b91da75f  fix comments
3ffc07600  ARROW-6199:  Avro adapter avoid potential resource 
leak.

Authored-by: tianchen 
Signed-off-by: Micah Kornfield 
---
 .../java/org/apache/arrow/AvroToArrowUtils.java| 22 +++
 .../arrow/consumers/AvroBooleanConsumer.java   |  5 ++
 .../apache/arrow/consumers/AvroBytesConsumer.java  |  5 ++
 .../apache/arrow/consumers/AvroDoubleConsumer.java |  5 ++
 .../apache/arrow/consumers/AvroFloatConsumer.java  |  5 ++
 .../apache/arrow/consumers/AvroIntConsumer.java|  5 ++
 .../apache/arrow/consumers/AvroLongConsumer.java   |  5 ++
 .../apache/arrow/consumers/AvroNullConsumer.java   |  5 ++
 .../apache/arrow/consumers/AvroStringConsumer.java |  5 ++
 .../apache/arrow/consumers/AvroUnionsConsumer.java | 16 +++--
 .../arrow/consumers/CompositeAvroConsumer.java | 69 ++
 .../java/org/apache/arrow/consumers/Consumer.java  |  7 ++-
 .../arrow/consumers/NullableTypeConsumer.java  |  5 ++
 13 files changed, 141 insertions(+), 18 deletions(-)

diff --git 
a/java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java 
b/java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java
index 25611a5..77f34df 100644
--- a/java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java
+++ b/java/adapter/avro/src/main/java/org/apache/arrow/AvroToArrowUtils.java
@@ -20,7 +20,6 @@ package org.apache.arrow;
 import static org.apache.arrow.vector.types.FloatingPointPrecision.DOUBLE;
 import static org.apache.arrow.vector.types.FloatingPointPrecision.SINGLE;
 
-import java.io.EOFException;
 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.HashMap;
@@ -37,6 +36,7 @@ import org.apache.arrow.consumers.AvroLongConsumer;
 import org.apache.arrow.consumers.AvroNullConsumer;
 import org.apache.arrow.consumers.AvroStringConsumer;
 import org.apache.arrow.consumers.AvroUnionsConsumer;
+import org.apache.arrow.consumers.CompositeAvroConsumer;
 import org.apache.arrow.consumers.Consumer;
 import org.apache.arrow.consumers.NullableTypeConsumer;
 import org.apache.arrow.memory.BufferAllocator;
@@ -246,19 +246,15 @@ public class AvroToArrowUtils {
 
 VectorSchemaRoot root = new VectorSchemaRoot(fields, vectors, 0);
 
-int valueCount = 0;
-while (true) {
-  try {
-for (Consumer consumer : consumers) {
-  consumer.consume(decoder);
-}
-valueCount++;
-//reach end will throw EOFException.
-  } catch (EOFException eofException) {
-root.setRowCount(valueCount);
-break;
-  }
+CompositeAvroConsumer compositeConsumer = null;
+try {
+  compositeConsumer = new CompositeAvroConsumer(consumers);
+  compositeConsumer.consume(decoder, root);
+} catch (Exception e) {
+  compositeConsumer.close();
+  throw new RuntimeException("Error occurs while consume process.", e);
 }
+
 return root;
   }
 }
diff --git 
a/java/adapter/avro/src/main/java/org/apache/arrow/consumers/AvroBooleanConsumer.java
 
b/java/adapter/avro/src/main/java/org/apache/arrow/consumers/AvroBooleanConsumer.java
index b2fe704..c2876f1 100644
--- 
a/java/adapter/avro/src/main/java/org/apache/arrow/consumers/AvroBooleanConsumer.java
+++ 
b/java/adapter/avro/src/main/java/org/apache/arrow/consumers/AvroBooleanConsumer.java
@@ -63,4 +63,9 @@ public class AvroBooleanConsumer implements Consumer {
 return this.vector;
   }
 
+  @Override
+  public void close() throws Exception {
+writer.close();
+  }
+
 }
diff --git 
a/java/adapter/avro/src/main/java/org/apache/arrow/consumers/AvroBytesConsumer.java
 
b/java/adapter/avro/src/main/java/org/apache/arrow/consumers/AvroBytesConsumer.java
index 2c649f9..c0cfaec 100644
--- 
a/java/adapter/avro/src/main/java/org/apache/arrow/consumers/AvroBytesConsumer.java
+++ 
b/java/adapter/avro/src/main/java/org/apache/arrow/consumers/AvroBytesConsumer.java
@@ -79,4 +79,9 @@ public class 

[arrow] branch master updated (91e33dc -> 09bb8b8)

2019-08-15 Thread emkornfield
This is an automated email from the ASF dual-hosted git repository.

emkornfield pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 91e33dc  ARROW-6038: [C++] Faster type equality
 add 09bb8b8  ARROW-6219: [Java] Add API for JDBC adapter that can convert 
less than the full result set at a time

No new revisions were added by this update.

Summary of changes:
 .../arrow/adapter/jdbc/ArrowVectorIterator.java| 159 +++
 .../org/apache/arrow/adapter/jdbc/JdbcToArrow.java |  65 -
 .../arrow/adapter/jdbc/JdbcToArrowConfig.java  |  26 +-
 .../adapter/jdbc/JdbcToArrowConfigBuilder.java |  10 +-
 .../arrow/adapter/jdbc/JdbcToArrowUtils.java   |  14 +-
 .../arrow/adapter/jdbc/consumer/ArrayConsumer.java |   7 +-
 .../adapter/jdbc/consumer/BigIntConsumer.java  |   9 +-
 .../adapter/jdbc/consumer/BinaryConsumer.java  |   9 +-
 .../arrow/adapter/jdbc/consumer/BitConsumer.java   |   9 +-
 .../arrow/adapter/jdbc/consumer/BlobConsumer.java  |   9 +-
 .../arrow/adapter/jdbc/consumer/ClobConsumer.java  |   9 +-
 .../jdbc/consumer/CompositeJdbcConsumer.java   |  22 +-
 .../arrow/adapter/jdbc/consumer/DateConsumer.java  |   9 +-
 .../adapter/jdbc/consumer/DecimalConsumer.java |   9 +-
 .../adapter/jdbc/consumer/DoubleConsumer.java  |   9 +-
 .../arrow/adapter/jdbc/consumer/FloatConsumer.java |   9 +-
 .../arrow/adapter/jdbc/consumer/IntConsumer.java   |   9 +-
 .../arrow/adapter/jdbc/consumer/JdbcConsumer.java  |  10 +-
 .../adapter/jdbc/consumer/SmallIntConsumer.java|   9 +-
 .../arrow/adapter/jdbc/consumer/TimeConsumer.java  |   9 +-
 .../adapter/jdbc/consumer/TimestampConsumer.java   |   9 +-
 .../adapter/jdbc/consumer/TinyIntConsumer.java |   9 +-
 .../adapter/jdbc/consumer/VarCharConsumer.java |   9 +-
 .../arrow/adapter/jdbc/JdbcToArrowConfigTest.java  |   6 +-
 .../arrow/adapter/jdbc/h2/JdbcToArrowTest.java |  34 +--
 .../jdbc/h2/JdbcToArrowVectorIteratorTest.java | 315 +
 .../test/resources/h2/test1_all_datatypes_h2.yml   |   2 +-
 .../jdbc/src/test/resources/h2/test1_int_h2.yml|   2 +-
 28 files changed, 730 insertions(+), 77 deletions(-)
 create mode 100644 
java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java
 create mode 100644 
java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/h2/JdbcToArrowVectorIteratorTest.java
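
A hedged usage sketch of the new iterator API (the exact method and builder option names below are assumptions based on the changed file names; they are not shown in this digest):

    import java.sql.ResultSet;
    import java.util.Calendar;

    import org.apache.arrow.adapter.jdbc.ArrowVectorIterator;
    import org.apache.arrow.adapter.jdbc.JdbcToArrow;
    import org.apache.arrow.adapter.jdbc.JdbcToArrowConfig;
    import org.apache.arrow.adapter.jdbc.JdbcToArrowConfigBuilder;
    import org.apache.arrow.memory.RootAllocator;
    import org.apache.arrow.vector.VectorSchemaRoot;

    // Convert a JDBC ResultSet in batches instead of materializing it all at once.
    static void consume(ResultSet resultSet) throws Exception {
      JdbcToArrowConfig config =
          new JdbcToArrowConfigBuilder(new RootAllocator(), Calendar.getInstance())
              .setTargetBatchSize(1024)  // rows per batch (assumed option name)
              .build();
      try (ArrowVectorIterator iterator = JdbcToArrow.sqlToArrowVectorIterator(resultSet, config)) {
        while (iterator.hasNext()) {
          try (VectorSchemaRoot root = iterator.next()) {
            // process one batch of at most 1024 rows
          }
        }
      }
    }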



[arrow] branch master updated: ARROW-5952: [Python] fix conversion of chunked dictionary array with 0 chunks

2019-08-15 Thread wesm
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 5479d30  ARROW-5952: [Python] fix conversion of chunked dictionary 
array with 0 chunks
5479d30 is described below

commit 5479d3047a23410de00f50687764a4f4300baba5
Author: Joris Van den Bossche 
AuthorDate: Thu Aug 15 21:47:38 2019 -0500

ARROW-5952: [Python] fix conversion of chunked dictionary array with 0 
chunks

https://issues.apache.org/jira/browse/ARROW-5952
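
A reproduction sketch (assumed, not taken from the patch): an empty table whose only column is dictionary-typed, i.e. backed by a ChunkedArray with zero chunks.

    import pyarrow as pa

    schema = pa.schema([("col", pa.dictionary(pa.int8(), pa.string()))])
    table = pa.Table.from_batches([], schema=schema)

    df = table.to_pandas()  # previously crashed; now yields an empty categorical column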

Closes #5081 from jorisvandenbossche/ARROW-5952-dictionary-zero-chunks and 
squashes the following commits:

2f11fb94d  Nits
742db0e34  create empty dictionary array of correct 
type
feb06d310  ARROW-5952:  fix conversion of chunked 
dictionary array with 0 chunks

Lead-authored-by: Joris Van den Bossche 
Co-authored-by: Wes McKinney 
Signed-off-by: Wes McKinney 
---
 cpp/src/arrow/python/arrow_to_pandas.cc | 47 -
 python/pyarrow/tests/test_pandas.py | 13 +
 2 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/cpp/src/arrow/python/arrow_to_pandas.cc 
b/cpp/src/arrow/python/arrow_to_pandas.cc
index f97782d..39857d7 100644
--- a/cpp/src/arrow/python/arrow_to_pandas.cc
+++ b/cpp/src/arrow/python/arrow_to_pandas.cc
@@ -487,7 +487,7 @@ inline Status ConvertNulls(const PandasOptions& options, 
const ChunkedArray& dat
 inline Status ConvertStruct(const PandasOptions& options, const ChunkedArray& 
data,
 PyObject** out_values) {
   PyAcquireGIL lock;
-  if (data.num_chunks() <= 0) {
+  if (data.num_chunks() == 0) {
 return Status::OK();
   }
   // ChunkedArray has at least one chunk
@@ -1042,6 +1042,14 @@ class DatetimeTZBlock : public DatetimeBlock {
   std::string timezone_;
 };
 
+Status MakeZeroLengthArray(const std::shared_ptr<DataType>& type,
+                           std::shared_ptr<Array>* out) {
+  std::unique_ptr<ArrayBuilder> builder;
+  RETURN_NOT_OK(MakeBuilder(default_memory_pool(), type, &builder));
+  RETURN_NOT_OK(builder->Resize(0));
+  return builder->Finish(out);
+}
+
 class CategoricalBlock : public PandasBlock {
  public:
   explicit CategoricalBlock(const PandasOptions& options, MemoryPool* pool,
@@ -1063,6 +1071,10 @@ class CategoricalBlock : public PandasBlock {
 using T = typename TRAITS::T;
 constexpr int npy_type = TRAITS::npy_type;
 
+if (data->num_chunks() == 0) {
+  RETURN_NOT_OK(AllocateNDArray(npy_type, 1));
+  return Status::OK();
+}
 // Sniff the first chunk
     const std::shared_ptr<Array> arr_first = data->chunk(0);
     const auto& dict_arr_first = checked_cast<const DictionaryArray&>(*arr_first);
@@ -1132,15 +1144,17 @@ class CategoricalBlock : public PandasBlock {
   converted_data = out.chunked_array();
 } else {
   // check if all dictionaries are equal
-      const std::shared_ptr<Array> arr_first = data->chunk(0);
-      const auto& dict_arr_first = checked_cast<const DictionaryArray&>(*arr_first);
+      if (data->num_chunks() > 1) {
+        const std::shared_ptr<Array> arr_first = data->chunk(0);
+        const auto& dict_arr_first = checked_cast<const DictionaryArray&>(*arr_first);

-      for (int c = 1; c < data->num_chunks(); c++) {
-        const std::shared_ptr<Array> arr = data->chunk(c);
-        const auto& dict_arr = checked_cast<const DictionaryArray&>(*arr);
+        for (int c = 1; c < data->num_chunks(); c++) {
+          const std::shared_ptr<Array> arr = data->chunk(c);
+          const auto& dict_arr = checked_cast<const DictionaryArray&>(*arr);

-        if (!(dict_arr_first.dictionary()->Equals(dict_arr.dictionary()))) {
-          return Status::NotImplemented("Variable dictionary type not supported");
+          if (!(dict_arr_first.dictionary()->Equals(dict_arr.dictionary()))) {
+            return Status::NotImplemented("Variable dictionary type not supported");
+          }
 }
   }
   converted_data = data;
@@ -1168,13 +1182,20 @@ class CategoricalBlock : public PandasBlock {
 }
 
 // TODO(wesm): variable dictionaries
-    auto arr = converted_data->chunk(0);
-    const auto& dict_arr = checked_cast<const DictionaryArray&>(*arr);
+    std::shared_ptr<Array> dict;
+    if (data->num_chunks() == 0) {
+      // no dictionary values => create empty array
+      RETURN_NOT_OK(MakeZeroLengthArray(dict_type.value_type(), &dict));
+    } else {
+      auto arr = converted_data->chunk(0);
+      const auto& dict_arr = checked_cast<const DictionaryArray&>(*arr);
+      dict = dict_arr.dictionary();
+    }

     placement_data_[rel_placement] = abs_placement;
-    PyObject* dict;
-    RETURN_NOT_OK(ConvertArrayToPandas(options_, dict_arr.dictionary(), nullptr, &dict));
-    dictionary_.reset(dict);
+    PyObject* pydict;
+    RETURN_NOT_OK(ConvertArrayToPandas(options_, dict, nullptr, &pydict));
+    dictionary_.reset(pydict);
 ordered_ = dict_type.ordered();
 
 return Status::OK();
diff --git a/python/pyarrow/tests/test_pandas.py 
b/python/pyarrow/tests/test_pandas.py
index 12a6bc3..437fdad 

[arrow] branch master updated: ARROW-6262: [Developer] Show JIRA issue before merging

2019-08-15 Thread wesm
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 884ed65  ARROW-6262: [Developer] Show JIRA issue before merging
884ed65 is described below

commit 884ed654e26114798fca486e3742caa97a544b7b
Author: Sutou Kouhei 
AuthorDate: Thu Aug 15 21:46:31 2019 -0500

ARROW-6262: [Developer] Show JIRA issue before merging

It's useful to confirm whether the associated JIRA issue is right or
not.

We could not spot a wrongly associated JIRA issue until after we merged
pull request https://github.com/apache/arrow/pull/5050 .

Closes #5097 from kou/dev-merge-show-jira-issue-before-merge and squashes 
the following commits:

6c9ad5be9   Show JIRA issue before merging

Authored-by: Sutou Kouhei 
Signed-off-by: Wes McKinney 
---
 dev/merge_arrow_pr.py | 47 +++
 1 file changed, 23 insertions(+), 24 deletions(-)

diff --git a/dev/merge_arrow_pr.py b/dev/merge_arrow_pr.py
index dfe9e33..7588fef 100755
--- a/dev/merge_arrow_pr.py
+++ b/dev/merge_arrow_pr.py
@@ -187,12 +187,6 @@ class JiraIssue(object):
 self.cmd.fail("JIRA issue %s already has status '%s'"
   % (self.jira_id, cur_status))
 
-console_output = format_resolved_issue_status(self.jira_id, cur_status,
-  fields.summary,
-  fields.assignee,
-  fields.components)
-print(console_output)
-
 resolve = [x for x in self.jira_con.transitions(self.jira_id)
if x['name'] == "Resolve Issue"][0]
 self.jira_con.transition_issue(self.jira_id, resolve["id"],
@@ -201,27 +195,31 @@ class JiraIssue(object):
 
 print("Successfully resolved %s!" % (self.jira_id))
 
+self.issue = self.jira_con.issue(self.jira_id)
+self.show()
 
-def format_resolved_issue_status(jira_id, status, summary, assignee,
- components):
-if assignee is None:
-assignee = "NOT ASSIGNED!!!"
-else:
-assignee = assignee.displayName
+def show(self):
+fields = self.issue.fields
 
-if len(components) == 0:
-components = 'NO COMPONENTS!!!'
-else:
-components = ', '.join((x.name for x in components))
+assignee = fields.assignee
+if assignee is None:
+assignee = "NOT ASSIGNED!!!"
+else:
+assignee = assignee.displayName
+
+components = fields.components
+if len(components) == 0:
+components = 'NO COMPONENTS!!!'
+else:
+components = ', '.join((x.name for x in components))
 
-return """=== JIRA {} ===
-Summary\t\t{}
-Assignee\t{}
-Components\t{}
-Status\t\t{}
-URL\t\t{}/{}""".format(jira_id, summary, assignee, components, status,
-   '/'.join((JIRA_API_BASE, 'browse')),
-   jira_id)
+print("=== JIRA {} ===".format(self.jira_id))
+print("Summary\t\t{}".format(fields.summary))
+print("Assignee\t{}".format(assignee))
+print("Components\t{}".format(components))
+print("Status\t\t{}".format(fields.status.name))
+print("URL\t\t{}/{}".format('/'.join((JIRA_API_BASE, 'browse')),
+self.jira_id))
 
 
 class GitHubAPI(object):
@@ -293,6 +291,7 @@ class PullRequest(object):
 print("\n=== Pull Request #%s ===" % self.number)
 print("title\t%s\nsource\t%s\ntarget\t%s\nurl\t%s"
   % (self.title, self.description, self.target_ref, self.url))
+self.jira_issue.show()
 
 @property
 def is_merged(self):



[arrow] branch master updated: ARROW-6185: [Java] Provide hash table based dictionary builder

2019-08-15 Thread emkornfield
This is an automated email from the ASF dual-hosted git repository.

emkornfield pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 4b971ee  ARROW-6185: [Java] Provide hash table based dictionary builder
4b971ee is described below

commit 4b971ee0948bc12ef9955f743882bd1ce3452231
Author: liyafan82 
AuthorDate: Thu Aug 15 20:19:45 2019 -0700

ARROW-6185: [Java] Provide hash table based dictionary builder

This is related to ARROW-5862. We provide another type of dictionary builder 
based on a hash table. Compared with a search-based dictionary builder, a 
hash-table-based builder processes each new element in O(1) time but requires 
extra memory space.

Closes #5054 from liyafan82/fly_0809_hashbuild and squashes the following 
commits:

77e24531e   Provide hash table based dictionary builder

Authored-by: liyafan82 
Signed-off-by: Micah Kornfield 
---
 .../HashTableBasedDictionaryBuilder.java   | 174 ++
 .../TestHashTableBasedDictionaryEncoder.java   | 203 +
 2 files changed, 377 insertions(+)

diff --git 
a/java/algorithm/src/main/java/org/apache/arrow/algorithm/dictionary/HashTableBasedDictionaryBuilder.java
 
b/java/algorithm/src/main/java/org/apache/arrow/algorithm/dictionary/HashTableBasedDictionaryBuilder.java
new file mode 100644
index 000..eff0f05
--- /dev/null
+++ 
b/java/algorithm/src/main/java/org/apache/arrow/algorithm/dictionary/HashTableBasedDictionaryBuilder.java
@@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.algorithm.dictionary;
+
+import java.util.HashMap;
+
+import org.apache.arrow.memory.util.ArrowBufPointer;
+import org.apache.arrow.memory.util.hash.ArrowBufHasher;
+import org.apache.arrow.memory.util.hash.SimpleHasher;
+import org.apache.arrow.vector.ElementAddressableVector;
+
+/**
+ * A dictionary builder is intended for the scenario frequently encountered in 
practice:
+ * the dictionary is not known a priori, so it is generated dynamically.
+ * In particular, when a new value arrives, it is tested to check if it is 
already
+ * in the dictionary. If so, it is simply neglected, otherwise, it is added to 
the dictionary.
+ *
+ * 
+ * This class builds the dictionary based on a hash table.
+ * Each add operation can be finished in O(1) time,
+ * where n is the current dictionary size.
+ * 
+ * 
+ * The dictionary builder is intended to build a single dictionary.
+ * So it cannot be used for different dictionaries.
+ * 
+ * Below gives the sample code for using the dictionary builder
+ * {@code
+ * HashTableBasedDictionaryBuilder dictionaryBuilder = ...
+ * ...
+ * dictionaryBuild.addValue(newValue);
+ * ...
+ * }
+ * 
+ * 
+ *   With the above code, the dictionary vector will be populated,
+ *   and it can be retrieved by the {@link 
HashTableBasedDictionaryBuilder#getDictionary()} method.
+ *   After that, dictionary encoding can proceed with the populated dictionary 
encoder.
+ * 
+ *
+ * @param <V> the dictionary vector type.
+ */
+public class HashTableBasedDictionaryBuilder<V extends ElementAddressableVector> {
+
+  /**
+   * The dictionary to be built.
+   */
+  private final V dictionary;
+
+  /**
+   * If null should be encoded.
+   */
+  private final boolean encodeNull;
+
+  /**
+   * The hash map for distinct dictionary entries.
+   * The key is the pointer to the dictionary element, whereas the value is 
the index in the dictionary.
+   */
+  private HashMap<ArrowBufPointer, Integer> hashMap = new HashMap<>();
+
+  /**
+   * The hasher used for calculating the hash code.
+   */
+  private final ArrowBufHasher hasher;
+
+  /**
+   * Next pointer to try to add to the hash table.
+   */
+  private ArrowBufPointer nextPointer;
+
+  /**
+   * Constructs a hash table based dictionary builder.
+   *
+   * @param dictionary the dictionary to populate.
+   */
+  public HashTableBasedDictionaryBuilder(V dictionary) {
+this(dictionary, false);
+  }
+
+  /**
+   * Constructs a hash table based dictionary builder.
+   *
+   * @param dictionary the dictionary to populate.
+   * @param encodeNull if null values should be added to the 

[arrow] branch master updated: ARROW-6038: [C++] Faster type equality

2019-08-15 Thread wesm
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 91e33dc  ARROW-6038: [C++] Faster type equality
91e33dc is described below

commit 91e33dcb6aa3c05eaf9d9d9f09579bb29e3fe175
Author: Antoine Pitrou 
AuthorDate: Thu Aug 15 21:29:00 2019 -0500

ARROW-6038: [C++] Faster type equality

When checking for type equality, compute and cache a fingerprint of the 
type so as to avoid costly nested type walking and multiple comparisons.
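
An illustrative sketch (not from the patch) of the kind of comparison this speeds up; with cached fingerprints, repeated Equals calls on nested types reduce to a string comparison instead of a field-by-field walk:

    #include <memory>
    #include "arrow/type.h"

    bool SameNestedType() {
      auto a = arrow::struct_({arrow::field("x", arrow::int64()),
                               arrow::field("y", arrow::list(arrow::utf8()))});
      auto b = arrow::struct_({arrow::field("x", arrow::int64()),
                               arrow::field("y", arrow::list(arrow::utf8()))});
      // The first call computes and caches each type's fingerprint; later
      // comparisons of these objects compare the cached strings.
      return a->Equals(*b);
    }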

Before:
```

Benchmark Time   CPU Iterations

TypeEqualsSimple               13 ns      13 ns   55242976   150.558M items/s
TypeEqualsComplex             430 ns     430 ns    1637275   4.43634M items/s
TypeEqualsWithMetadata        595 ns     595 ns    1199216   3.20778M items/s
SchemaEquals                 1465 ns    1465 ns     479512   1.30226M items/s
SchemaEqualsWithMetadata      922 ns     922 ns     763752    2.0683M items/s
```

After:
```

Benchmark Time   CPU Iterations

TypeEqualsSimple               11 ns      11 ns   65531752   178.723M items/s
TypeEqualsComplex              20 ns      20 ns   33939830   95.1497M items/s
TypeEqualsWithMetadata         31 ns      31 ns   22979555   62.4052M items/s
SchemaEquals                   40 ns      40 ns   17786532   48.1683M items/s
SchemaEqualsWithMetadata       46 ns      46 ns   15173158   41.3242M items/s
```

Closes #4983 from pitrou/ARROW-6038-faster-type-equality and squashes the 
following commits:

2fdaf4adb  ARROW-6038:  Faster type equality

Authored-by: Antoine Pitrou 
Signed-off-by: Wes McKinney 
---
 cpp/src/arrow/CMakeLists.txt  |   1 +
 cpp/src/arrow/compare.cc  |  24 +-
 cpp/src/arrow/extension_type-test.cc  |  11 +
 cpp/src/arrow/type-benchmark.cc   | 170 +
 cpp/src/arrow/type-test.cc| 268 +++
 cpp/src/arrow/type.cc | 354 +-
 cpp/src/arrow/type.h  | 155 ++-
 cpp/src/arrow/util/key-value-metadata-test.cc |  18 ++
 cpp/src/arrow/util/key_value_metadata.cc  |  11 +
 cpp/src/arrow/util/key_value_metadata.h   |   2 +
 integration/integration_test.py   |  61 ++---
 11 files changed, 961 insertions(+), 114 deletions(-)

diff --git a/cpp/src/arrow/CMakeLists.txt b/cpp/src/arrow/CMakeLists.txt
index 0085238..4839fb8 100644
--- a/cpp/src/arrow/CMakeLists.txt
+++ b/cpp/src/arrow/CMakeLists.txt
@@ -381,6 +381,7 @@ add_arrow_test(tensor-test)
 add_arrow_test(sparse_tensor-test)
 
 add_arrow_benchmark(builder-benchmark)
+add_arrow_benchmark(type-benchmark)
 
 add_subdirectory(array)
 add_subdirectory(csv)
diff --git a/cpp/src/arrow/compare.cc b/cpp/src/arrow/compare.cc
index 05a1d1f..222d4f9 100644
--- a/cpp/src/arrow/compare.cc
+++ b/cpp/src/arrow/compare.cc
@@ -1163,21 +1163,35 @@ bool SparseTensorEquals(const SparseTensor& left, const 
SparseTensor& right) {
 }
 
 bool TypeEquals(const DataType& left, const DataType& right, bool 
check_metadata) {
-  bool are_equal;
   // The arrays are the same object
  if (&left == &right) {
-are_equal = true;
+return true;
   } else if (left.id() != right.id()) {
-are_equal = false;
+return false;
   } else {
+// First try to compute fingerprints
+if (check_metadata) {
+  const auto& left_metadata_fp = left.metadata_fingerprint();
+  const auto& right_metadata_fp = right.metadata_fingerprint();
+  if (left_metadata_fp != right_metadata_fp) {
+return false;
+  }
+}
+
+const auto& left_fp = left.fingerprint();
+const auto& right_fp = right.fingerprint();
+if (!left_fp.empty() && !right_fp.empty()) {
+  return left_fp == right_fp;
+}
+
+// TODO remove check_metadata here?
 internal::TypeEqualsVisitor visitor(right, check_metadata);
    auto error = VisitTypeInline(left, &visitor);
 if (!error.ok()) {
   DCHECK(false) << "Types are not comparable: " << error.ToString();
 }
-are_equal = visitor.result();
+return visitor.result();
   }
-  return are_equal;
 }
 
 bool ScalarEquals(const Scalar& left, const Scalar& right) {
diff --git a/cpp/src/arrow/extension_type-test.cc 
b/cpp/src/arrow/extension_type-test.cc
index 2f680af..06fd6a9 100644
--- a/cpp/src/arrow/extension_type-test.cc
+++ 

[arrow] branch master updated: ARROW-5862: [Java] Provide dictionary builder

2019-08-15 Thread emkornfield
This is an automated email from the ASF dual-hosted git repository.

emkornfield pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 1f5ebd0  ARROW-5862: [Java] Provide dictionary builder
1f5ebd0 is described below

commit 1f5ebd0fae2c49831d9c52c64bd5e1b81e1b860a
Author: liyafan82 
AuthorDate: Thu Aug 15 19:54:18 2019 -0700

ARROW-5862: [Java] Provide dictionary builder

The dictionary builder serves the following scenario, which is frequently 
encountered in practice when dictionary encoding is involved: the dictionary 
values are not known a priori, so they are determined dynamically as new data 
arrive continually.

In particular, when a new value arrives, it is tested to check if it is 
already in the dictionary. If so, it is simply neglected, otherwise, it is 
added to the dictionary.

When all values have been evaluated, the dictionary can be considered 
complete. So encoding can start afterward.

The code snippet using a dictionary builder should be like this:

DictionaryBuilder dictionaryBuilder = ...
dictionaryBuilder.startBuild();
...
dictionaryBuild.addValue(newValue);
...
dictionaryBuilder.endBuild();

Closes #4813 from liyafan82/fly_0705_build and squashes the following 
commits:

2007b87c7   Provide dictionary builder

Authored-by: liyafan82 
Signed-off-by: Micah Kornfield 
---
 .../SearchTreeBasedDictionaryBuilder.java  | 162 +++
 .../TestSearchTreeBasedDictionaryBuilder.java  | 222 +
 2 files changed, 384 insertions(+)

diff --git 
a/java/algorithm/src/main/java/org/apache/arrow/algorithm/dictionary/SearchTreeBasedDictionaryBuilder.java
 
b/java/algorithm/src/main/java/org/apache/arrow/algorithm/dictionary/SearchTreeBasedDictionaryBuilder.java
new file mode 100644
index 000..a6f5642
--- /dev/null
+++ 
b/java/algorithm/src/main/java/org/apache/arrow/algorithm/dictionary/SearchTreeBasedDictionaryBuilder.java
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.algorithm.dictionary;
+
+import java.util.TreeSet;
+
+import org.apache.arrow.algorithm.sort.VectorValueComparator;
+import org.apache.arrow.vector.ValueVector;
+
+/**
+ * A dictionary builder is intended for the scenario frequently encountered in 
practice:
+ * the dictionary is not known a priori, so it is generated dynamically.
+ * In particular, when a new value arrives, it is tested to check if it is 
already
+ * in the dictionary. If so, it is simply neglected, otherwise, it is added to 
the dictionary.
+ *
+ * 
+ *   This class builds the dictionary based on a binary search tree.
+ *   Each add operation can be finished in O(log(n)) time,
+ *   where n is the current dictionary size.
+ * 
+ * 
+ *   The dictionary builder is intended to build a single dictionary.
+ *   So it cannot be used for different dictionaries.
+ * 
+ * Below gives the sample code for using the dictionary builder
+ * {@code
+ * SearchTreeBasedDictionaryBuilder dictionaryBuilder = ...
+ * ...
+ * dictionaryBuild.addValue(newValue);
+ * ...
+ * }
+ * 
+ * 
+ *  With the above code, the dictionary vector will be populated,
+ *  and it can be retrieved by the {@link 
SearchTreeBasedDictionaryBuilder#getDictionary()} method.
+ *  After that, dictionary encoding can proceed with the populated dictionary.
+ * 
+ * @param <V> the dictionary vector type.
+ */
+public class SearchTreeBasedDictionaryBuilder<V extends ValueVector> {
+
+  /**
+   * The dictionary to be built.
+   */
+  private final V dictionary;
+
+  /**
+   * The criteria for sorting in the search tree.
+   */
+  protected final VectorValueComparator<V> comparator;
+
+  /**
+   * If null should be encoded.
+   */
+  private final boolean encodeNull;
+
+  /**
+   * The search tree for storing the value index.
+   */
+  private TreeSet<Integer> searchTree;
+
+  /**
+   * Construct a search tree-based dictionary builder.
+   * @param dictionary the dictionary vector.
+   * @param comparator the criteria for value equality.
+   */
+  public SearchTreeBasedDictionaryBuilder(V dictionary, 

[arrow] branch master updated (4b971ee -> 3420d30)

2019-08-15 Thread ravindra
This is an automated email from the ASF dual-hosted git repository.

ravindra pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 4b971ee  ARROW-6185: [Java] Provide hash table based dictionary builder
 add 3420d30  ARROW-6208: [Java] Correct byte order before comparing in 
ByteFunctionHelpers

No new revisions were added by this update.

Summary of changes:
 .../arrow/memory/util/ByteFunctionHelpers.java |  4 ++--
 .../arrow/memory/util/TestByteFunctionHelpers.java | 22 ++
 2 files changed, 24 insertions(+), 2 deletions(-)