[arrow] branch master updated: ARROW-3769: [C++] Add support for reading non-dictionary encoded binary Parquet columns directly as DictionaryArray

2019-03-17 Thread wesm
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new fd0b90a  ARROW-3769: [C++] Add support for reading non-dictionary encoded binary Parquet columns directly as DictionaryArray
fd0b90a is described below

commit fd0b90a7f7e65fde32af04c4746004a1240914cf
Author: Hatem Helal 
AuthorDate: Sun Mar 17 19:13:41 2019 -0500

ARROW-3769: [C++] Add support for reading non-dictionary encoded binary Parquet columns directly as DictionaryArray

This patch addresses the following JIRAs:

* [ARROW-3769](https://issues.apache.org/jira/browse/ARROW-3769): refactored the record reader logic to toggle between the different builders depending on the column type (String or Binary) and the requested array type (Chunked "dense" or Dictionary). These changes are covered by unit tests and benchmarks.
* [PARQUET-1537](https://issues.apache.org/jira/browse/PARQUET-1537): fixed an incorrect loop increment and covered the fix with unit tests.
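Reading a plain (non-dictionary-encoded) binary column "directly as a DictionaryArray" amounts to dictionary-encoding the values while decoding: each row becomes a small integer index into a list of distinct values. The C++ sketch below only illustrates that idea with standard containers; `DictionaryColumn` and `DictionaryEncode` are hypothetical names, not the parquet-cpp decoder or Arrow builder API used by this patch.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a dictionary-encoded column: each row stores an
// index into `dictionary`, which holds every distinct value once, in order of
// first appearance. This is NOT the arrow::DictionaryArray API.
struct DictionaryColumn {
  std::vector<int32_t> indices;
  std::vector<std::string> dictionary;
};

// Dictionary-encode a dense ("plain") column of string/binary values.
// Usage (illustrative): DictionaryEncode({"foo", "bar", "foo"})
//   -> indices {0, 1, 0}, dictionary {"foo", "bar"}
DictionaryColumn DictionaryEncode(const std::vector<std::string>& values) {
  DictionaryColumn out;
  std::unordered_map<std::string, int32_t> memo;  // value -> dictionary index
  out.indices.reserve(values.size());
  for (const std::string& v : values) {
    auto it = memo.find(v);
    if (it == memo.end()) {
      // First time this value is seen: append it to the dictionary.
      const auto index = static_cast<int32_t>(out.dictionary.size());
      it = memo.emplace(v, index).first;
      out.dictionary.push_back(v);
    }
    out.indices.push_back(it->second);
  }
  return out;
}
```

Repeated strings are stored once in the dictionary, which is where the memory savings of a `DictionaryArray` come from for low-cardinality columns.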

Also included is an experimental class `ArrowReaderProperties` that can be used to select which columns are read directly as an `arrow::DictionaryArray`. I think some more work is needed to fully address the requests in [ARROW-3772](https://issues.apache.org/jira/browse/ARROW-3772), namely the ability to automatically infer which columns in a Parquet file should be read as `DictionaryArray`. My current thinking is that this would be solved by introducing optional arrow type metadata [...]

Note that with this patch, incremental reading of a Parquet file does not resolve a global dictionary across all of the row groups. There are a few possible solutions for this:

* Introduce a concept of an "unknown" dictionary. This would enable concatenating multiple row groups together, so long as we define unknown dictionaries as equal (assuming the indices have the same data type).
* Add an API for merging the schemas from multiple tables together. This could be used after reading multiple row groups to concatenate the resulting tables into one.
* Add an API for inferring the global dictionary for the entire file. This could be an expensive operation, so ideally it would be made optional.
* Allow a user-specified dictionary. This could be useful in the limited case where a caller already knows the global dictionary (computed through some other mechanism).
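To make the row-group problem concrete: each row group read this way carries its own dictionary, so the same index can refer to different strings in different chunks. The sketch below shows the remapping that a merge step or a global-dictionary step would have to perform: build one unified dictionary and rewrite each chunk's indices through a per-chunk lookup table. `DictionaryChunk` and `UnifyDictionaries` are hypothetical helpers written for illustration, not Arrow APIs.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-row-group result: indices into a chunk-local dictionary.
struct DictionaryChunk {
  std::vector<int32_t> indices;
  std::vector<std::string> dictionary;
};

// Hypothetical helper: concatenates chunks whose dictionaries may differ into
// a single chunk with one unified dictionary, remapping every chunk's indices.
DictionaryChunk UnifyDictionaries(const std::vector<DictionaryChunk>& chunks) {
  DictionaryChunk out;
  std::unordered_map<std::string, int32_t> global_index;  // value -> unified index
  for (const DictionaryChunk& chunk : chunks) {
    // transpose[i] is the unified index of chunk-local dictionary entry i.
    std::vector<int32_t> transpose;
    transpose.reserve(chunk.dictionary.size());
    for (const std::string& value : chunk.dictionary) {
      const auto next = static_cast<int32_t>(out.dictionary.size());
      auto [it, inserted] = global_index.emplace(value, next);
      if (inserted) out.dictionary.push_back(value);  // first time we see it
      transpose.push_back(it->second);
    }
    // Rewrite this chunk's indices in terms of the unified dictionary.
    for (int32_t local : chunk.indices) out.indices.push_back(transpose[local]);
  }
  return out;
}
```

Each chunk's indices are rewritten in a single pass, but the unified dictionary is only complete once every row group has been seen, which is consistent with the note above that file-wide inference could be expensive and should ideally be optional.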

Author: Hatem Helal 

Closes #3721 from hatemhelal/arrow-3769 and squashes the following commits:

f644fff9c  Move schema fix logic to post-processing step
023c022c3  Add virtual destructor to WrappedBuilderInterface
99e9dee12  Removed dependencies on arrow builder in parquet/encoding
2026b513c  Rework ByteArrayDecoder interface to reduce code duplication
5bc933b97  use PutSpaced in test setup to correctly initialize encoded data
2c8fa7efd  revert incorrect changes to PlainByteArrayDecoder::DecodeArrow method
7719b944f  Use random string generator instead of poor JSON
e6ca0db43  Fix DictEncoding test: need to use PutSpaced instead of Put in setup
9da133142  Temporarily disable tests for arrow builder decoding from dictionary encoded col
7347cfa26  Fix DecodeArrow from plain encoded columns
5fb9e860a  Rework parquet encoding tests
4d7bb30de  Refactor dictionary data generation into RandomArrayGenerator
6e65fdbdf  simplify ArrowReaderProperties and mark as experimental
babe52e38  replace deprecated ReadableFileInterface with RandomAccessFile
a267a27d4  remove unnecessary inlines
7aac84c45  Reworked encoding benchmark to reduce code duplication
077a8f1ae  Move function definition to (hopefully) resolve appveyor build failure due to C2491
a35754456  Basic unittests for reading DictionaryArray directly from parquet
a6740f31e  Make sure to update the schema when reading a column as a DictionaryArray
a8c15354e  Add support for requesting a parquet column be read as a DictionaryArray
28d76b7b2  Add benchmark for dictionary decoding using arrow builder
8f59198e8  Add overloads for decoding using a StringDictionaryBuilder
b16eaa978  prefer default_random_engine to avoid potential slowdown with Mersenne Twister prng
ff380211c  prefer mersenne twister prng over default one which is implemenation defined
78eddb8af  Use value parameterization in decoding tests
84df23bfa  prefer range-based for loop to KeepRunning while loop pattern
f234ca2a2  respond to code review feedback - many readability fixes in benchmark and tests
4fbcf1fab  fix loop increment in templated PlainByteArrayDecoder::DecodeArrow method
39a5f1994  fix appveyor windows failure
89de5d5be  rework data generation so that decoding 

[arrow] branch master updated: ARROW-4937: [R] Clean pkg-config related logic

2019-03-17 Thread kou

kou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new a530848  ARROW-4937: [R] Clean pkg-config related logic
a530848 is described below

commit a530848605c1d0249d659b81a7b794e2c6755c64
Author: Kouhei Sutou 
AuthorDate: Mon Mar 18 09:04:57 2019 +0900

ARROW-4937: [R] Clean pkg-config related logic

* Remove unused code
* Hide error messages from pkg-config (we report our own error messages)

Author: Kouhei Sutou 

Closes #3951 from kou/r-configure-pkg-config-clean and squashes the following commits:

e5383fd8   Clean pkg-config related logic
---
 r/configure | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/r/configure b/r/configure
index f0c6d49..a3bc690 100755
--- a/r/configure
+++ b/r/configure
@@ -37,15 +37,13 @@ PKG_LIBS=""
 # Use pkg-config if available
 pkg-config --version >/dev/null 2>&1
 if [ $? -eq 0 ]; then
-  PKGCONFIG_CFLAGS=`pkg-config --cflags --silence-errors ${PKG_CONFIG_NAME}`
-  PKGCONFIG_LIBS=`pkg-config --libs ${PKG_CONFIG_NAME}`
-  PKGCONFIG_CFLAGS=$(pkg-config --cflags arrow)
+  PKGCONFIG_CFLAGS=$(pkg-config --cflags --silence-errors arrow)
   if [ $? -ne 0 ]; then
     echo "Apache Arrow C++ was not found using pkg-config"
     exit 1
   fi
   PKGCONFIG_LIBS=$(pkg-config --libs arrow)
-  PKGCONFIG_CFLAGS_PARQUET=$(pkg-config --cflags parquet)
+  PKGCONFIG_CFLAGS_PARQUET=$(pkg-config --cflags --silence-errors parquet)
   if [ $? -eq 0 ]; then
     PKGCONFIG_CFLAGS="${PKGCONFIG_CFLAGS} ${PKGCONFIG_CFLAGS_PARQUET} -DARROW_R_WITH_PARQUET"
     PKGCONFIG_LIBS="${PKGCONFIG_LIBS} $(pkg-config --libs parquet)"



[arrow] branch master updated: ARROW-4932: [GLib] Use G_DECLARE_DERIVABLE_TYPE macro

2019-03-17 Thread kou

kou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 2f740ac  ARROW-4932: [GLib] Use G_DECLARE_DERIVABLE_TYPE macro
2f740ac is described below

commit 2f740ac8840cd527caeca83ed19953decfc32e12
Author: Kenta Murata 
AuthorDate: Mon Mar 18 09:03:37 2019 +0900

ARROW-4932: [GLib] Use G_DECLARE_DERIVABLE_TYPE macro

Author: Kenta Murata 

Closes #3945 from mrkn/glib_use_g_declare_derivable_type and squashes the following commits:

4067d10a  Fix the parent class of GArrowStringArrayBuilder
eb122593  Use G_DECLARE_DERIVABLE_TYPE
---
 c_glib/arrow-glib/array-builder.h   | 864 +---
 c_glib/arrow-glib/basic-array.h | 172 +--
 c_glib/arrow-glib/basic-data-type.h | 375 ++
 c_glib/arrow-glib/chunked-array.h   |  43 +-
 c_glib/arrow-glib/composite-array.h |  43 +-
 c_glib/arrow-glib/composite-data-type.h |  42 +-
 c_glib/arrow-glib/field.h   |  43 +-
 c_glib/arrow-glib/record-batch.h|  43 +-
 c_glib/arrow-glib/tensor.h  |  35 +-
 9 files changed, 226 insertions(+), 1434 deletions(-)

diff --git a/c_glib/arrow-glib/array-builder.h b/c_glib/arrow-glib/array-builder.h
index 9fcadbd..075f080 100644
--- a/c_glib/arrow-glib/array-builder.h
+++ b/c_glib/arrow-glib/array-builder.h
@@ -70,46 +70,16 @@ gboolean garrow_null_array_builder_append_nulls(GArrowNullArrayBuilder *builder,
 
 #define GARROW_TYPE_BOOLEAN_ARRAY_BUILDER   \
   (garrow_boolean_array_builder_get_type())
-#define GARROW_BOOLEAN_ARRAY_BUILDER(obj)   \
-  (G_TYPE_CHECK_INSTANCE_CAST((obj),\
-  GARROW_TYPE_BOOLEAN_ARRAY_BUILDER,\
-  GArrowBooleanArrayBuilder))
-#define GARROW_BOOLEAN_ARRAY_BUILDER_CLASS(klass)   \
-  (G_TYPE_CHECK_CLASS_CAST((klass), \
-   GARROW_TYPE_BOOLEAN_ARRAY_BUILDER,   \
-   GArrowBooleanArrayBuilderClass))
-#define GARROW_IS_BOOLEAN_ARRAY_BUILDER(obj)\
-  (G_TYPE_CHECK_INSTANCE_TYPE((obj),\
-  GARROW_TYPE_BOOLEAN_ARRAY_BUILDER))
-#define GARROW_IS_BOOLEAN_ARRAY_BUILDER_CLASS(klass)\
-  (G_TYPE_CHECK_CLASS_TYPE((klass), \
-   GARROW_TYPE_BOOLEAN_ARRAY_BUILDER))
-#define GARROW_BOOLEAN_ARRAY_BUILDER_GET_CLASS(obj) \
-  (G_TYPE_INSTANCE_GET_CLASS((obj), \
- GARROW_TYPE_BOOLEAN_ARRAY_BUILDER, \
- GArrowBooleanArrayBuilderClass))
-
-typedef struct _GArrowBooleanArrayBuilder GArrowBooleanArrayBuilder;
-typedef struct _GArrowBooleanArrayBuilderClass GArrowBooleanArrayBuilderClass;
-
-/**
- * GArrowBooleanArrayBuilder:
- *
- * It wraps `arrow::BooleanBuilder`.
- */
-struct _GArrowBooleanArrayBuilder
-{
-  /*< private >*/
-  GArrowArrayBuilder parent_instance;
-};
-
+G_DECLARE_DERIVABLE_TYPE(GArrowBooleanArrayBuilder,
+ garrow_boolean_array_builder,
+ GARROW,
+ BOOLEAN_ARRAY_BUILDER,
+ GArrowArrayBuilder)
 struct _GArrowBooleanArrayBuilderClass
 {
   GArrowArrayBuilderClass parent_class;
 };
 
-GType garrow_boolean_array_builder_get_type(void) G_GNUC_CONST;
-
 GArrowBooleanArrayBuilder *garrow_boolean_array_builder_new(void);
 
 #ifndef GARROW_DISABLE_DEPRECATED
@@ -135,48 +105,17 @@ gboolean garrow_boolean_array_builder_append_nulls(GArrowBooleanArrayBuilder *bu
GError **error);
 
 
-#define GARROW_TYPE_INT_ARRAY_BUILDER   \
-  (garrow_int_array_builder_get_type())
-#define GARROW_INT_ARRAY_BUILDER(obj)   \
-  (G_TYPE_CHECK_INSTANCE_CAST((obj),\
-  GARROW_TYPE_INT_ARRAY_BUILDER,\
-  GArrowIntArrayBuilder))
-#define GARROW_INT_ARRAY_BUILDER_CLASS(klass)   \
-  (G_TYPE_CHECK_CLASS_CAST((klass), \
-   GARROW_TYPE_INT_ARRAY_BUILDER,   \
-   GArrowIntArrayBuilderClass))
-#define GARROW_IS_INT_ARRAY_BUILDER(obj)\
-  (G_TYPE_CHECK_INSTANCE_TYPE((obj),\
-  GARROW_TYPE_INT_ARRAY_BUILDER))
-#define GARROW_IS_INT_ARRAY_BUILDER_CLASS(klass)\
-  (G_TYPE_CHECK_CLASS_TYPE((klass), \
-   GARROW_TYPE_INT_ARRAY_BUILDER))
-#define 

[arrow] branch master updated: ARROW-4929: [GLib] Add garrow_array_count_values()

2019-03-17 Thread shiro

shiro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 201a3bc  ARROW-4929: [GLib] Add garrow_array_count_values()
201a3bc is described below

commit 201a3bc9186aecd3b22529d97af30ac1fab25a3a
Author: Kouhei Sutou 
AuthorDate: Mon Mar 18 08:39:42 2019 +0900

ARROW-4929: [GLib] Add garrow_array_count_values()

Author: Kouhei Sutou 

Closes #3941 from kou/glib-count-values and squashes the following commits:

f9e3bc51  Don't use special characters in HTML
321fe28a  Move compute related code to compute.{cpp,h}
c8bd73bc  Add missing (nullable) attribute
7ff43645  Fix a typo
18b8d89e  Fix markup
95c08075   Add garrow_array_count_values()
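Assuming `garrow_array_count_values()` follows the usual value-counts semantics (each distinct value paired with the number of times it occurs), the following C++ sketch shows what such an operation computes over a plain `std::vector`. `CountValues` is a hypothetical helper, and returning values in order of first appearance is a choice made for this sketch, not a documented guarantee of the Arrow compute kernel or the GLib binding.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: returns every distinct value of `input`, in order of
// first appearance, together with how many times it occurs.
template <typename T>
std::pair<std::vector<T>, std::vector<int64_t>> CountValues(
    const std::vector<T>& input) {
  std::vector<T> values;
  std::vector<int64_t> counts;
  std::unordered_map<T, std::size_t> slot;  // value -> position in values/counts
  for (const T& v : input) {
    auto [it, inserted] = slot.emplace(v, values.size());
    if (inserted) {  // first occurrence: open a new slot with count 0
      values.push_back(v);
      counts.push_back(0);
    }
    ++counts[it->second];
  }
  return {std::move(values), std::move(counts)};
}
```

Keeping the values and counts in two parallel vectors mirrors a two-column (values, counts) result, which is a natural shape for an Arrow struct array.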
---
 c_glib/arrow-glib/basic-array.cpp | 570 
 c_glib/arrow-glib/basic-array.h   |  66 
 c_glib/arrow-glib/composite-array.h   |  43 +--
 c_glib/arrow-glib/compute.cpp | 612 +-
 c_glib/arrow-glib/compute.h   |  71 +++-
 c_glib/doc/arrow-glib/arrow-glib-docs.xml |   5 +-
 c_glib/gandiva-glib/node.cpp  |   2 +-
 c_glib/test/test-count-values.rb  |  51 +++
 8 files changed, 738 insertions(+), 682 deletions(-)

diff --git a/c_glib/arrow-glib/basic-array.cpp b/c_glib/arrow-glib/basic-array.cpp
index 8f27e26..b051c97 100644
--- a/c_glib/arrow-glib/basic-array.cpp
+++ b/c_glib/arrow-glib/basic-array.cpp
@@ -24,7 +24,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -83,34 +82,6 @@ garrow_primitive_array_new(GArrowDataType *data_type,
   return garrow_array_new_raw(_array);
 };
 
-template <typename ArrowType, typename GArrowArrayType>
-typename ArrowType::c_type
-garrow_numeric_array_sum(GArrowArrayType array,
- GError **error,
- const gchar *tag,
- typename ArrowType::c_type default_value)
-{
-  auto arrow_array = garrow_array_get_raw(GARROW_ARRAY(array));
-  auto memory_pool = arrow::default_memory_pool();
-  arrow::compute::FunctionContext context(memory_pool);
-  arrow::compute::Datum sum_datum;
-  auto status = arrow::compute::Sum(&context,
-                                    arrow_array,
-                                    &sum_datum);
-  if (garrow_error_check(error, status, tag)) {
-    using ScalarType = typename arrow::TypeTraits<ArrowType>::ScalarType;
-    auto arrow_numeric_scalar =
-      std::dynamic_pointer_cast<ScalarType>(sum_datum.scalar());
-    if (arrow_numeric_scalar->is_valid) {
-      return arrow_numeric_scalar->value;
-    } else {
-      return default_value;
-    }
-  } else {
-    return default_value;
-  }
-}
-
 G_BEGIN_DECLS
 
 /**
@@ -545,177 +516,6 @@ garrow_array_to_string(GArrowArray *array, GError **error)
   }
 }
 
-/**
- * garrow_array_cast:
- * @array: A #GArrowArray.
- * @target_data_type: A #GArrowDataType of cast target data.
- * @options: (nullable): A #GArrowCastOptions.
- * @error: (nullable): Return location for a #GError or %NULL.
- *
- * Returns: (nullable) (transfer full):
- *   A newly created casted array on success, %NULL on error.
- *
- * Since: 0.7.0
- */
-GArrowArray *
-garrow_array_cast(GArrowArray *array,
-  GArrowDataType *target_data_type,
-  GArrowCastOptions *options,
-  GError **error)
-{
-  auto arrow_array = garrow_array_get_raw(array);
-  auto arrow_array_raw = arrow_array.get();
-  auto memory_pool = arrow::default_memory_pool();
-  arrow::compute::FunctionContext context(memory_pool);
-  auto arrow_target_data_type = garrow_data_type_get_raw(target_data_type);
-  std::shared_ptr<arrow::Array> arrow_casted_array;
-  arrow::Status status;
-  if (options) {
-    auto arrow_options = garrow_cast_options_get_raw(options);
-    status = arrow::compute::Cast(&context,
-                                  *arrow_array_raw,
-                                  arrow_target_data_type,
-                                  *arrow_options,
-                                  &arrow_casted_array);
-  } else {
-    arrow::compute::CastOptions arrow_options;
-    status = arrow::compute::Cast(&context,
-                                  *arrow_array_raw,
-                                  arrow_target_data_type,
-                                  arrow_options,
-                                  &arrow_casted_array);
-  }
-
-  if (!status.ok()) {
-    std::stringstream message;
-    message << "[array][cast] <";
-    message << arrow_array->type()->ToString();
-    message << "> -> <";
-    message << arrow_target_data_type->ToString();
-    message << ">";
-    garrow_error_check(error, status, message.str().c_str());
-    return NULL;
-  }
-
-  return garrow_array_new_raw(&arrow_casted_array);
-}
-
-/**
- * garrow_array_unique:
- * @array: A #GArrowArray.
- * @error: (nullable): Return location for a #GError or %NULL.
- *
- * Returns: (nullable) 

[arrow] branch master updated: ARROW-4339: [C++][Python] Developer documentation overhaul for 0.13 release

2019-03-17 Thread wesm

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new d94a9fc  ARROW-4339: [C++][Python] Developer documentation overhaul for 0.13 release
d94a9fc is described below

commit d94a9fcee801d9e185f36f767bb5b70566df70ff
Author: Wes McKinney 
AuthorDate: Sun Mar 17 16:26:34 2019 -0500

ARROW-4339: [C++][Python] Developer documentation overhaul for 0.13 release

This was pretty much a huge pain, but it addresses accumulated documentation debt after the conda compiler migration and the CMake refactor. I suggest we not stress too much over small details on this and do more work to improve these docs in follow-up PRs. I did the best I could under the circumstances and need to move on to other things now.

I think the overall organization of the Sphinx project for developers is much improved; take a look (I will post a link to a published version for review).

JIRAs addressed by this PR, and other things I did:

* Update cpp/thirdparty/README.md given the CMake refactor (it was totally out of date). It now directs users to the Sphinx C++ developer guide

* ARROW-4339: Move cpp/README.md to Sphinx documentation (and clean it up a lot!!)
* ARROW-4425: Move Contributing Guidelines from Confluence to Sphinx, update top level README
* ARROW-4232: Remove references to pre-gcc5 ABI issues
* ARROW-4165: Move Windows C++ developer guide to Sphinx (from cpp/apidoc/Windows.md)
* ARROW-4547: Update Python development instructions re: producing CUDA-enabled pyarrow
* ARROW-4326 / ARROW-3096: Update Python build instructions re: January 2019 compiler migration

Author: Wes McKinney 

Closes #3942 from wesm/developer-docs-0.13 and squashes the following commits:

a3c3dd5de  Add some Boost info, misc cleaning
2ccc3de18  Remove index.md altogether
66da97e7f  Remove unused text from cpp/apidoc/index.md
504bc134e  restore 'what's in the arrow libraries' section
8d1f33e19  Finish initial documentation revamp for 0.13, stopping here
84dd680a2  Some docs reorg, begin rewriting cpp/README.md into docs/source/developers/cpp.rst
---
 README.md  |  38 +-
 ci/conda_env_cpp.yml   |   2 +-
 cpp/README.md  | 550 +
 cpp/apidoc/Windows.md  | 291 ---
 cpp/apidoc/index.md|  42 -
 cpp/thirdparty/README.md   |  90 +-
 docs/README.md |   2 +-
 docs/source/developers/contributing.rst|  88 ++
 docs/source/developers/cpp.rst | 913 +
 docs/source/developers/documentation.rst   |   2 +-
 docs/source/developers/index.rst   |   6 +-
 docs/source/developers/integration.rst |   2 +
 .../development.rst => developers/python.rst}  | 227 +++--
 docs/source/index.rst  |  30 +-
 docs/source/python/benchmarks.rst  |   2 +
 docs/source/python/index.rst   |   1 -
 docs/source/python/install.rst |   2 +-
 docs/source/python/parquet.rst |   6 +-
 python/README.md   |  49 +-
 19 files changed, 1194 insertions(+), 1149 deletions(-)

diff --git a/README.md b/README.md
index 621e119..24157b3 100644
--- a/README.md
+++ b/README.md
@@ -59,7 +59,7 @@ The reference Arrow libraries contain a number of distinct software components:
   library)
 - Reference-counted off-heap buffer memory management, for zero-copy memory
   sharing and handling memory-mapped files
-- Low-overhead IO interfaces to files on disk, HDFS (C++ only)
+- IO interfaces to local and remote filesystems
 - Self-describing binary wire formats (streaming and batch/file-like) for
   remote procedure calls (RPC) and
   interprocess communication (IPC)
@@ -67,6 +67,10 @@ The reference Arrow libraries contain a number of distinct software components:
   implementations (e.g. sending data from Java to C++)
 - Conversions to and from other in-memory data structures
 
+## How to Contribute
+
+Please read our latest [project contribution guide][5].
+
 ## Getting involved
 
 Even if you do not plan to contribute to Apache Arrow itself or Arrow
@@ -79,38 +83,8 @@ integrations in other projects, we'd be happy to have you involved:
 - [Learn the format][2]
 - Contribute code to one of the reference implementations
 
-## How to Contribute
-
-We prefer to receive contributions in the form of GitHub pull requests. Please
-send pull requests against the [github.com/apache/arrow][4] repository.
-
-If you are looking for some ideas on what to contribute, check out 

[arrow] branch master updated: ARROW-4931: [C++] CMake fails on gRPC ExternalProject

2019-03-17 Thread wesm

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 9d73e0a  ARROW-4931: [C++] CMake fails on gRPC ExternalProject
9d73e0a is described below

commit 9d73e0a544d76382617f6f723a3ac5f8cff8e033
Author: Uwe L. Korn 
AuthorDate: Sun Mar 17 14:48:43 2019 -0500

ARROW-4931: [C++] CMake fails on gRPC ExternalProject

Author: Uwe L. Korn 

Closes #3943 from xhochy/ARROW-4931 and squashes the following commits:

aa24d57c9  ARROW-4931:  CMake fails on gRPC ExternalProject
---
 cpp/cmake_modules/ThirdpartyToolchain.cmake | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index dd66d00..5c23e50 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -1979,6 +1979,7 @@ macro(build_grpc)
   add_dependencies(gRPC::grpc grpc_ep)
   add_dependencies(gRPC::grpc++ grpc_ep)
   add_dependencies(gRPC::address_sorting grpc_ep)
+  set(GRPC_VENDORED TRUE)
 endmacro()
 
 if(ARROW_WITH_GRPC)
@@ -2017,14 +2018,18 @@ if(ARROW_WITH_GRPC)
   get_target_property(GRPC_INCLUDE_DIR gRPC::grpc INTERFACE_INCLUDE_DIRECTORIES)
   include_directories(SYSTEM ${GRPC_INCLUDE_DIR})
 
-  # grpc++ headers may reside in ${GRPC_INCLUDE_DIR}/grpc++ or ${GRPC_INCLUDE_DIR}/grpcpp
-  # depending on the gRPC version.
-  if(EXISTS "${GRPC_INCLUDE_DIR}/grpcpp/impl/codegen/config_protobuf.h")
+  if(GRPC_VENDORED)
     set(GRPCPP_PP_INCLUDE TRUE)
-  elseif(EXISTS "${GRPC_INCLUDE_DIR}/grpc++/impl/codegen/config_protobuf.h")
-    set(GRPCPP_PP_INCLUDE FALSE)
   else()
-    message(FATAL_ERROR "Cannot find grpc++ headers in ${GRPC_INCLUDE_DIR}")
+    # grpc++ headers may reside in ${GRPC_INCLUDE_DIR}/grpc++ or ${GRPC_INCLUDE_DIR}/grpcpp
+    # depending on the gRPC version.
+    if(EXISTS "${GRPC_INCLUDE_DIR}/grpcpp/impl/codegen/config_protobuf.h")
+      set(GRPCPP_PP_INCLUDE TRUE)
+    elseif(EXISTS "${GRPC_INCLUDE_DIR}/grpc++/impl/codegen/config_protobuf.h")
+      set(GRPCPP_PP_INCLUDE FALSE)
+    else()
+      message(FATAL_ERROR "Cannot find grpc++ headers in ${GRPC_INCLUDE_DIR}")
+    endif()
   endif()
 endif()
 



[arrow] branch master updated: ARROW-4906: [Format] Write about SparseMatrixIndexCSR format is sorted

2019-03-17 Thread wesm

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 066ee43  ARROW-4906: [Format] Write about SparseMatrixIndexCSR format is sorted
066ee43 is described below

commit 066ee43960b66a4ee0fe778fdc4a71d2c23d211b
Author: Kenta Murata 
AuthorDate: Sun Mar 17 11:03:37 2019 -0500

ARROW-4906: [Format] Write about SparseMatrixIndexCSR format is sorted

Currently, my implementation of SparseCSRIndex assumes that the indices are sorted within each row, so I want to note this in the format documentation just in case.
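The invariant being documented can be checked mechanically. The sketch below assumes the usual CSR layout, where row `i` owns the column indices from `indices[indptr[i]]` up to (but not including) `indices[indptr[i + 1]]`, and verifies that each row's indices are strictly increasing. `CsrIndicesSortedPerRow` is a hypothetical helper, not part of Arrow.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if `indptr` is a well-formed CSR row
// pointer for `indices` and, within every row, the column indices are
// strictly increasing (i.e. sorted with no duplicate columns).
// Row i owns indices[indptr[i]] .. indices[indptr[i + 1] - 1].
bool CsrIndicesSortedPerRow(const std::vector<int64_t>& indptr,
                            const std::vector<int64_t>& indices) {
  // Shape checks: offsets start at 0 and end at the number of indices.
  if (indptr.empty() || indptr.front() != 0 ||
      indptr.back() != static_cast<int64_t>(indices.size())) {
    return false;
  }
  // Offsets must be non-decreasing; together with the checks above this keeps
  // every offset inside [0, indices.size()].
  for (std::size_t i = 0; i + 1 < indptr.size(); ++i) {
    if (indptr[i] > indptr[i + 1]) return false;
  }
  // Within each row, every column index must exceed the previous one.
  for (std::size_t row = 0; row + 1 < indptr.size(); ++row) {
    for (int64_t k = indptr[row] + 1; k < indptr[row + 1]; ++k) {
      if (indices[k - 1] >= indices[k]) return false;
    }
  }
  return true;
}
```

For instance, the `indices(X) = [1, 2, 2, 1, 3, 0, 2, 3, 1]` example in the `SparseTensor.fbs` diff of this commit passes the check with the row pointer `[0, 2, 3, 5, 5, 8, 9]`; that row pointer is an assumption chosen for illustration, since the matrix X is not reproduced in this excerpt.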

Author: Kenta Murata 

Closes #3929 from mrkn/fix_sparse_tensor_doc and squashes the following commits:

b851bb723  Write about SparseMatrixIndexCSR format is sorted
---
 format/SparseTensor.fbs | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/format/SparseTensor.fbs b/format/SparseTensor.fbs
index 0a0c6c2..853dd19 100644
--- a/format/SparseTensor.fbs
+++ b/format/SparseTensor.fbs
@@ -49,7 +49,7 @@ table SparseTensorIndexCOO {
   ///[2, 2, 3, 1, 2, 0],
   ///[0, 1, 0, 0, 3, 4]]
   ///
-  /// Note that the indices are sorted in lexcographical order.
+  /// Note that the indices are sorted in lexicographical order.
   indicesBuffer: Buffer;
 }
 
@@ -86,6 +86,8 @@ table SparseMatrixIndexCSR {
   /// For example, the indices of the above X is:
   ///
   ///   indices(X) = [1, 2, 2, 1, 3, 0, 2, 3, 1].
+  ///
+  /// Note that the indices are sorted in lexicographical order for each row.
   indicesBuffer: Buffer;
 }
 



[arrow] branch master updated: [Docker][C++] Remove duplicated ARROW_GANDIVA line from docker_build_cpp.sh

2019-03-17 Thread wesm

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new d95208f  [Docker][C++] Remove duplicated ARROW_GANDIVA line from docker_build_cpp.sh
d95208f is described below

commit d95208f336a7b29ed4616b3b6c339ef816f4cbd3
Author: Suvayu Ali 
AuthorDate: Sun Mar 17 16:02:56 2019 +

[Docker][C++] Remove duplicated ARROW_GANDIVA line from docker_build_cpp.sh
---
 ci/docker_build_cpp.sh | 1 -
 1 file changed, 1 deletion(-)

diff --git a/ci/docker_build_cpp.sh b/ci/docker_build_cpp.sh
index 78e14b5..6e780b6 100755
--- a/ci/docker_build_cpp.sh
+++ b/ci/docker_build_cpp.sh
@@ -36,7 +36,6 @@ cmake -GNinja \
   -DCMAKE_INSTALL_LIBDIR=lib \
   -DARROW_WITH_BZ2=${ARROW_WITH_BZ2:-ON} \
   -DARROW_WITH_ZSTD=${ARROW_WITH_ZSTD:-ON} \
-  -DARROW_GANDIVA=${ARROW_GANDIVA:-ON} \
   -DARROW_BUILD_BENCHMARKS=${ARROW_BUILD_BENCHMARKS:-ON} \
   -DARROW_FLIGHT=${ARROW_FLIGHT:-ON} \
   -DARROW_ORC=${ARROW_ORC:-ON} \



[arrow] branch master updated: ARROW-4933: [R] Autodetect Parquet support using pkg-config

2019-03-17 Thread wesm

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 974b723  ARROW-4933: [R] Autodetect Parquet support using pkg-config
974b723 is described below

commit 974b7232bf2920c4a43af685964a005f40dce456
Author: Uwe L. Korn 
AuthorDate: Sun Mar 17 10:11:16 2019 -0500

ARROW-4933: [R] Autodetect Parquet support using pkg-config

Kudos go to @kou for this.

Author: Uwe L. Korn 

Closes #3946 from xhochy/ARROW-4933 and squashes the following commits:

abc7d4083  ARROW-4933:  Autodetect Parquet support using pkg-config
---
 r/configure | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/r/configure b/r/configure
index 19f4d2c..f0c6d49 100755
--- a/r/configure
+++ b/r/configure
@@ -32,13 +32,26 @@ PKG_RPM_NAME="arrow"
 PKG_CSW_NAME="arrow"
 PKG_BREW_NAME="apache-arrow"
 PKG_TEST_HEADER=""
-PKG_LIBS="-larrow -lparquet"
+PKG_LIBS=""
 
 # Use pkg-config if available
 pkg-config --version >/dev/null 2>&1
 if [ $? -eq 0 ]; then
   PKGCONFIG_CFLAGS=`pkg-config --cflags --silence-errors ${PKG_CONFIG_NAME}`
   PKGCONFIG_LIBS=`pkg-config --libs ${PKG_CONFIG_NAME}`
+  PKGCONFIG_CFLAGS=$(pkg-config --cflags arrow)
+  if [ $? -ne 0 ]; then
+    echo "Apache Arrow C++ was not found using pkg-config"
+    exit 1
+  fi
+  PKGCONFIG_LIBS=$(pkg-config --libs arrow)
+  PKGCONFIG_CFLAGS_PARQUET=$(pkg-config --cflags parquet)
+  if [ $? -eq 0 ]; then
+    PKGCONFIG_CFLAGS="${PKGCONFIG_CFLAGS} ${PKGCONFIG_CFLAGS_PARQUET} -DARROW_R_WITH_PARQUET"
+    PKGCONFIG_LIBS="${PKGCONFIG_LIBS} $(pkg-config --libs parquet)"
+  fi
+else
+  PKG_LIBS="-larrow -lparquet"
 fi
 
 # Note that cflags may be empty in case of success



[arrow] branch master updated: ARROW-4915: [GLib][C++] Add arrow::NullBuilder support for GLib

2019-03-17 Thread kou
This is an automated email from the ASF dual-hosted git repository.

kou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 9d2280f  ARROW-4915: [GLib][C++] Add arrow::NullBuilder support for GLib
9d2280f is described below

commit 9d2280fb9093580fc8073e972bbae3095b75203c
Author: Kenta Murata 
AuthorDate: Sun Mar 17 17:55:02 2019 +0900

ARROW-4915: [GLib][C++] Add arrow::NullBuilder support for GLib

This pull request adds two things:

1. `arrow::NullBuilder::AppendNulls()` function
2. `GArrowNullArrayBuilder` class
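The point of a dedicated `AppendNulls()` is that a builder for an all-null array only needs to track a length, so appending `n` nulls is a single addition rather than `n` calls; the squashed commits also mention an overflow check. The sketch below illustrates both ideas with a hypothetical `NullOnlyBuilder` class. It throws C++ exceptions where Arrow's builders report errors through `arrow::Status`, and it is not the actual `arrow::NullBuilder` implementation.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical builder for an all-null array. Every slot is null, so the only
// state it needs is the logical length; it keeps no value or validity buffers.
class NullOnlyBuilder {
 public:
  // Appends n nulls with a single addition instead of n separate calls.
  // Throws rather than letting the int64_t length overflow.
  void AppendNulls(int64_t n) {
    if (n < 0) {
      throw std::invalid_argument("AppendNulls: n must be non-negative");
    }
    if (n > std::numeric_limits<int64_t>::max() - length_) {
      throw std::overflow_error("AppendNulls: length would overflow int64_t");
    }
    length_ += n;
  }

  // Appending one null is just the n == 1 case.
  void AppendNull() { AppendNulls(1); }

  int64_t length() const { return length_; }

 private:
  int64_t length_ = 0;
};
```

Checking `n > max - length_` before adding avoids computing the overflowing sum in the first place, since signed overflow is undefined behavior in C++.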

Author: Kenta Murata 
Author: Kouhei Sutou 

Closes #3938 from mrkn/glib_null_builder and squashes the following commits:

e53b004e   Accept NullArray.new(n)
17c7c86c   Add overflow check in NullBuilder::AppendNull()
b31315a2   Add and fix version tags
47af12da   Rewrite with G_DECLARE_DERIVABLE_TYPE
8ab32559   Put NullArrayBuilder tests in test-array-builder.rb
1cdb500d   Remove needless TODO comment
6713a13c   Check overflow in NullBuilder::AppendNulls()
544fc6eb   Add GArrowNullArrayBuilder
2038ce23   Add NullBuilder::AppendNulls() function
---
 c_glib/arrow-glib/array-builder.cpp| 81 ++
 c_glib/arrow-glib/array-builder.h  | 23 
 c_glib/test/helper/buildable.rb|  4 ++
 c_glib/test/test-array-builder.rb  | 33 +++
 cpp/src/arrow/array-test.cc|  5 +-
 cpp/src/arrow/array/builder_primitive.h| 11 
 ruby/red-arrow/lib/arrow/array-builder.rb  |  4 ++
 ruby/red-arrow/lib/arrow/array.rb  |  2 +-
 ruby/red-arrow/lib/arrow/loader.rb |  1 +
 ruby/red-arrow/lib/arrow/null-array-builder.rb | 26 +
 10 files changed, 187 insertions(+), 3 deletions(-)

diff --git a/c_glib/arrow-glib/array-builder.cpp b/c_glib/arrow-glib/array-builder.cpp
index afdae8c..b9a9e71 100644
--- a/c_glib/arrow-glib/array-builder.cpp
+++ b/c_glib/arrow-glib/array-builder.cpp
@@ -153,6 +153,9 @@ G_BEGIN_DECLS
  *
  * You need to use array builder class to create a new array.
  *
+ * #GArrowNullArrayBuilder is the class to create a new
+ * #GArrowNullArray.
+ *
  * #GArrowBooleanArrayBuilder is the class to create a new
  * #GArrowBooleanArray.
  *
@@ -409,6 +412,81 @@ garrow_array_builder_finish(GArrowArrayBuilder *builder, GError **error)
 }
 
 
+G_DEFINE_TYPE(GArrowNullArrayBuilder,
+  garrow_null_array_builder,
+  GARROW_TYPE_ARRAY_BUILDER)
+
+static void
+garrow_null_array_builder_init(GArrowNullArrayBuilder *builder)
+{
+}
+
+static void
+garrow_null_array_builder_class_init(GArrowNullArrayBuilderClass *klass)
+{
+}
+
+/**
+ * garrow_null_array_builder_new:
+ *
+ * Returns: A newly created #GArrowNullArrayBuilder.
+ *
+ * Since: 0.13.0
+ */
+GArrowNullArrayBuilder *
+garrow_null_array_builder_new(void)
+{
+  auto builder = garrow_array_builder_new(arrow::null(),
+  NULL,
+  "[null-array-builder][new]");
+  return GARROW_NULL_ARRAY_BUILDER(builder);
+}
+
+/**
+ * garrow_null_array_builder_append_null:
+ * @builder: A #GArrowNullArrayBuilder.
+ * @error: (nullable): Return location for a #GError or %NULL.
+ *
+ * Returns: %TRUE on success, %FALSE if there was an error.
+ *
+ * Since: 0.13.0
+ */
+gboolean
+garrow_null_array_builder_append_null(GArrowNullArrayBuilder *builder,
+  GError **error)
+{
+  return garrow_array_builder_append_null
+(GARROW_ARRAY_BUILDER(builder),
+ error,
+ "[null-array-builder][append-null]");
+}
+
+/**
+ * garrow_null_array_builder_append_nulls:
+ * @builder: A #GArrowNullArrayBuilder.
+ * @n: The number of null values to be appended.
+ * @error: (nullable): Return location for a #GError or %NULL.
+ *
+ * Append multiple nulls at once. It's more efficient than multiple
+ * `append_null()` calls.
+ *
+ * Returns: %TRUE on success, %FALSE if there was an error.
+ *
+ * Since: 0.13.0
+ */
+gboolean
+garrow_null_array_builder_append_nulls(GArrowNullArrayBuilder *builder,
+   gint64 n,
+   GError **error)
+{
+  return garrow_array_builder_append_nulls
+(GARROW_ARRAY_BUILDER(builder),
+ n,
+ error,
+ "[null-array-builder][append-nulls]");
+}
+
+
 G_DEFINE_TYPE(GArrowBooleanArrayBuilder,
   garrow_boolean_array_builder,
   GARROW_TYPE_ARRAY_BUILDER)
@@ -3890,6 +3968,9 @@ garrow_array_builder_new_raw(arrow::ArrayBuilder *arrow_builder,
 {
   if (type == G_TYPE_INVALID) {
 switch (arrow_builder->type()->id()) {
+case arrow::Type::type::NA:
+  type = GARROW_TYPE_NULL_ARRAY_BUILDER;
+  break;
 case arrow::Type::type::BOOL:
   type = GARROW_TYPE_BOOLEAN_ARRAY_BUILDER;