[arrow] branch master updated: ARROW-3769: [C++] Add support for reading non-dictionary encoded binary Parquet columns directly as DictionaryArray
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new fd0b90a ARROW-3769: [C++] Add support for reading non-dictionary encoded binary Parquet columns directly as DictionaryArray
fd0b90a is described below

commit fd0b90a7f7e65fde32af04c4746004a1240914cf
Author: Hatem Helal
AuthorDate: Sun Mar 17 19:13:41 2019 -0500

    ARROW-3769: [C++] Add support for reading non-dictionary encoded binary Parquet columns directly as DictionaryArray

    This patch addresses the following JIRAs:

    * [ARROW-3769](https://issues.apache.org/jira/browse/ARROW-3769): refactored record reader logic to toggle between the different builders depending on the column type (String or Binary) and the requested array type (Chunked "dense" or Dictionary). These changes are covered by unit tests and benchmarks.
    * [PARQUET-1537](https://issues.apache.org/jira/browse/PARQUET-1537): fixed increment and covered by unit tests.

    Also included is an experimental class `ArrowReaderProperties` that can be used to select which columns are read directly as an `arrow::DictionaryArray`. I think some more work is needed to fully address the requests in [ARROW-3772](https://issues.apache.org/jira/browse/ARROW-3772). Namely, the ability to automatically infer which columns in a parquet file should be read as `DictionaryArray`. My current thinking is that this would be solved by introducing optional arrow type metadata [...]

    Note that the behavior with this patch is that incremental reading of a parquet file will not resolve the global dictionary for all of the row groups. There are a few possible solutions for this:

    * Introduce a concept of an "unknown" dictionary. This will enable concatenating multiple row groups together so long as we define unknown dictionaries as equal (assuming indices have the same data type)
    * Add an API for merging the schemas from multiple tables together. This could be used after reading multiple row groups to enable concatenating the tables together into one.
    * Add an API for inferring the global dictionary for the entire file. This could be an expensive operation so ideally would be made optional.
    * Allow a user-specified dictionary. This could be useful in the limited case where a caller already knows the global dictionary list (computed through some other mechanism).

    Author: Hatem Helal

    Closes #3721 from hatemhelal/arrow-3769 and squashes the following commits:

    f644fff9c Move schema fix logic to post-processing step
    023c022c3 Add virtual destructor to WrappedBuilderInterface
    99e9dee12 Removed dependencies on arrow builder in parquet/encoding
    2026b513c Rework ByteArrayDecoder interface to reduce code duplication
    5bc933b97 use PutSpaced in test setup to correctly initialize encoded data
    2c8fa7efd revert incorrect changes to PlainByteArrayDecoder::DecodeArrow method
    7719b944f Use random string generator instead of poor JSON
    e6ca0db43 Fix DictEncoding test: need to use PutSpaced instead of Put in setup
    9da133142 Temporarily disable tests for arrow builder decoding from dictionary encoded col
    7347cfa26 Fix DecodeArrow from plain encoded columns
    5fb9e860a Rework parquet encoding tests
    4d7bb30de Refactor dictionary data generation into RandomArrayGenerator
    6e65fdbdf simplify ArrowReaderProperties and mark as experimental
    babe52e38 replace deprecated ReadableFileInterface with RandomAccessFile
    a267a27d4 remove unnecessary inlines
    7aac84c45 Reworked encoding benchmark to reduce code duplication
    077a8f1ae Move function definition to (hopefully) resolve appveyor build failure due to C2491
    a35754456 Basic unittests for reading DictionaryArray directly from parquet
    a6740f31e Make sure to update the schema when reading a column as a DictionaryArray
    a8c15354e Add support for requesting a parquet column be read as a DictionaryArray
    28d76b7b2 Add benchmark for dictionary decoding using arrow builder
    8f59198e8 Add overloads for decoding using a StringDictionaryBuilder
    b16eaa978 prefer default_random_engine to avoid potential slowdown with Mersenne Twister prng
    ff380211c prefer mersenne twister prng over default one which is implemenation defined
    78eddb8af Use value parameterization in decoding tests
    84df23bfa prefer range-based for loop to KeepRunning while loop pattern
    f234ca2a2 respond to code review feedback - many readability fixes in benchmark and tests
    4fbcf1fab fix loop increment in templated PlainByteArrayDecoder::DecodeArrow method
    39a5f1994 fix appveyor windows failure
    89de5d5be rework data generation so that decoding
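
One of the options listed in the ARROW-3769 message, inferring a global dictionary across row groups, can be illustrated with a minimal standard-library sketch. It is not the Arrow or Parquet API: `DictChunk`, `UnifiedColumn` and `UnifyDictionaries` are hypothetical names, and each chunk is assumed to stand in for one row group's dictionary-encoded string column. The sketch shows why the operation may be expensive, since every dictionary value in every chunk has to be visited and every index rewritten.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one row group's dictionary-encoded column:
// `dictionary` holds the distinct values, `indices` refer into it.
struct DictChunk {
  std::vector<std::string> dictionary;
  std::vector<int32_t> indices;
};

// Hypothetical result type: one shared dictionary plus per-chunk indices
// that have been rewritten to refer to the shared dictionary.
struct UnifiedColumn {
  std::vector<std::string> dictionary;
  std::vector<std::vector<int32_t>> indices;
};

// Builds a global dictionary in first-seen order and remaps each chunk's
// indices from its local dictionary to the global one.
inline UnifiedColumn UnifyDictionaries(const std::vector<DictChunk>& chunks) {
  UnifiedColumn out;
  std::unordered_map<std::string, int32_t> global_index;
  for (const DictChunk& chunk : chunks) {
    // remap[i] is the global position of the chunk's i-th dictionary value.
    std::vector<int32_t> remap;
    remap.reserve(chunk.dictionary.size());
    for (const std::string& value : chunk.dictionary) {
      auto it = global_index.find(value);
      if (it == global_index.end()) {
        const auto next = static_cast<int32_t>(out.dictionary.size());
        it = global_index.emplace(value, next).first;
        out.dictionary.push_back(value);
      }
      remap.push_back(it->second);
    }
    std::vector<int32_t> remapped;
    remapped.reserve(chunk.indices.size());
    for (int32_t idx : chunk.indices) {
      assert(idx >= 0);  // this sketch does not model null slots
      remapped.push_back(remap.at(static_cast<std::size_t>(idx)));
    }
    out.indices.push_back(std::move(remapped));
  }
  return out;
}
```

After unification all chunks share one dictionary, so their index arrays could be concatenated without further translation. A user-specified dictionary, the last option in the list, would amount to seeding `global_index` before the loop.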
[arrow] branch master updated: ARROW-4937: [R] Clean pkg-config related logic
kou pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new a530848 ARROW-4937: [R] Clean pkg-config related logic
a530848 is described below

commit a530848605c1d0249d659b81a7b794e2c6755c64
Author: Kouhei Sutou
AuthorDate: Mon Mar 18 09:04:57 2019 +0900

    ARROW-4937: [R] Clean pkg-config related logic

    * Remove unused code
    * Hide error messages from pkg-config (we report our own error messages)

    Author: Kouhei Sutou

    Closes #3951 from kou/r-configure-pkg-config-clean and squashes the following commits:

    e5383fd8 Clean pkg-config related logic
---
 r/configure | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/r/configure b/r/configure
index f0c6d49..a3bc690 100755
--- a/r/configure
+++ b/r/configure
@@ -37,15 +37,13 @@ PKG_LIBS=""
 # Use pkg-config if available
 pkg-config --version >/dev/null 2>&1
 if [ $? -eq 0 ]; then
-  PKGCONFIG_CFLAGS=`pkg-config --cflags --silence-errors ${PKG_CONFIG_NAME}`
-  PKGCONFIG_LIBS=`pkg-config --libs ${PKG_CONFIG_NAME}`
-  PKGCONFIG_CFLAGS=$(pkg-config --cflags arrow)
+  PKGCONFIG_CFLAGS=$(pkg-config --cflags --silence-errors arrow)
   if [ $? -ne 0 ]; then
     echo "Apache Arrow C++ was not found using pkg-config"
     exit 1
   fi
   PKGCONFIG_LIBS=$(pkg-config --libs arrow)
-  PKGCONFIG_CFLAGS_PARQUET=$(pkg-config --cflags parquet)
+  PKGCONFIG_CFLAGS_PARQUET=$(pkg-config --cflags --silence-errors parquet)
   if [ $? -eq 0 ]; then
     PKGCONFIG_CFLAGS="${PKGCONFIG_CFLAGS} ${PKGCONFIG_CFLAGS_PARQUET} -DARROW_R_WITH_PARQUET"
     PKGCONFIG_LIBS="${PKGCONFIG_LIBS} $(pkg-config --libs parquet)"
[arrow] branch master updated: ARROW-4932: [GLib] Use G_DECLARE_DERIVABLE_TYPE macro
This is an automated email from the ASF dual-hosted git repository. kou pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 2f740ac ARROW-4932: [GLib] Use G_DECLARE_DERIVABLE_TYPE macro 2f740ac is described below commit 2f740ac8840cd527caeca83ed19953decfc32e12 Author: Kenta Murata AuthorDate: Mon Mar 18 09:03:37 2019 +0900 ARROW-4932: [GLib] Use G_DECLARE_DERIVABLE_TYPE macro Author: Kenta Murata Closes #3945 from mrkn/glib_use_g_declare_derivable_type and squashes the following commits: 4067d10a Fix the parent class of GArrowStringArrayBuilder eb122593 Use G_DECLARE_DERIVABLE_TYPE --- c_glib/arrow-glib/array-builder.h | 864 +--- c_glib/arrow-glib/basic-array.h | 172 +-- c_glib/arrow-glib/basic-data-type.h | 375 ++ c_glib/arrow-glib/chunked-array.h | 43 +- c_glib/arrow-glib/composite-array.h | 43 +- c_glib/arrow-glib/composite-data-type.h | 42 +- c_glib/arrow-glib/field.h | 43 +- c_glib/arrow-glib/record-batch.h| 43 +- c_glib/arrow-glib/tensor.h | 35 +- 9 files changed, 226 insertions(+), 1434 deletions(-) diff --git a/c_glib/arrow-glib/array-builder.h b/c_glib/arrow-glib/array-builder.h index 9fcadbd..075f080 100644 --- a/c_glib/arrow-glib/array-builder.h +++ b/c_glib/arrow-glib/array-builder.h @@ -70,46 +70,16 @@ gboolean garrow_null_array_builder_append_nulls(GArrowNullArrayBuilder *builder, #define GARROW_TYPE_BOOLEAN_ARRAY_BUILDER \ (garrow_boolean_array_builder_get_type()) -#define GARROW_BOOLEAN_ARRAY_BUILDER(obj) \ - (G_TYPE_CHECK_INSTANCE_CAST((obj),\ - GARROW_TYPE_BOOLEAN_ARRAY_BUILDER,\ - GArrowBooleanArrayBuilder)) -#define GARROW_BOOLEAN_ARRAY_BUILDER_CLASS(klass) \ - (G_TYPE_CHECK_CLASS_CAST((klass), \ - GARROW_TYPE_BOOLEAN_ARRAY_BUILDER, \ - GArrowBooleanArrayBuilderClass)) -#define GARROW_IS_BOOLEAN_ARRAY_BUILDER(obj)\ - (G_TYPE_CHECK_INSTANCE_TYPE((obj),\ - GARROW_TYPE_BOOLEAN_ARRAY_BUILDER)) -#define 
GARROW_IS_BOOLEAN_ARRAY_BUILDER_CLASS(klass)\ - (G_TYPE_CHECK_CLASS_TYPE((klass), \ - GARROW_TYPE_BOOLEAN_ARRAY_BUILDER)) -#define GARROW_BOOLEAN_ARRAY_BUILDER_GET_CLASS(obj) \ - (G_TYPE_INSTANCE_GET_CLASS((obj), \ - GARROW_TYPE_BOOLEAN_ARRAY_BUILDER, \ - GArrowBooleanArrayBuilderClass)) - -typedef struct _GArrowBooleanArrayBuilder GArrowBooleanArrayBuilder; -typedef struct _GArrowBooleanArrayBuilderClass GArrowBooleanArrayBuilderClass; - -/** - * GArrowBooleanArrayBuilder: - * - * It wraps `arrow::BooleanBuilder`. - */ -struct _GArrowBooleanArrayBuilder -{ - /*< private >*/ - GArrowArrayBuilder parent_instance; -}; - +G_DECLARE_DERIVABLE_TYPE(GArrowBooleanArrayBuilder, + garrow_boolean_array_builder, + GARROW, + BOOLEAN_ARRAY_BUILDER, + GArrowArrayBuilder) struct _GArrowBooleanArrayBuilderClass { GArrowArrayBuilderClass parent_class; }; -GType garrow_boolean_array_builder_get_type(void) G_GNUC_CONST; - GArrowBooleanArrayBuilder *garrow_boolean_array_builder_new(void); #ifndef GARROW_DISABLE_DEPRECATED @@ -135,48 +105,17 @@ gboolean garrow_boolean_array_builder_append_nulls(GArrowBooleanArrayBuilder *bu GError **error); -#define GARROW_TYPE_INT_ARRAY_BUILDER \ - (garrow_int_array_builder_get_type()) -#define GARROW_INT_ARRAY_BUILDER(obj) \ - (G_TYPE_CHECK_INSTANCE_CAST((obj),\ - GARROW_TYPE_INT_ARRAY_BUILDER,\ - GArrowIntArrayBuilder)) -#define GARROW_INT_ARRAY_BUILDER_CLASS(klass) \ - (G_TYPE_CHECK_CLASS_CAST((klass), \ - GARROW_TYPE_INT_ARRAY_BUILDER, \ - GArrowIntArrayBuilderClass)) -#define GARROW_IS_INT_ARRAY_BUILDER(obj)\ - (G_TYPE_CHECK_INSTANCE_TYPE((obj),\ - GARROW_TYPE_INT_ARRAY_BUILDER)) -#define GARROW_IS_INT_ARRAY_BUILDER_CLASS(klass)\ - (G_TYPE_CHECK_CLASS_TYPE((klass), \ - GARROW_TYPE_INT_ARRAY_BUILDER)) -#define
[arrow] branch master updated: ARROW-4929: [GLib] Add garrow_array_count_values()
This is an automated email from the ASF dual-hosted git repository. shiro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 201a3bc ARROW-4929: [GLib] Add garrow_array_count_values() 201a3bc is described below commit 201a3bc9186aecd3b22529d97af30ac1fab25a3a Author: Kouhei Sutou AuthorDate: Mon Mar 18 08:39:42 2019 +0900 ARROW-4929: [GLib] Add garrow_array_count_values() Author: Kouhei Sutou Closes #3941 from kou/glib-count-values and squashes the following commits: f9e3bc51 Don't use special characters in HTML 321fe28a Move compute related code to compute.{cpp,h} c8bd73bc Add missing (nullable) attribute 7ff43645 Fix a typo 18b8d89e Fix markup 95c08075 Add garrow_array_count_values() --- c_glib/arrow-glib/basic-array.cpp | 570 c_glib/arrow-glib/basic-array.h | 66 c_glib/arrow-glib/composite-array.h | 43 +-- c_glib/arrow-glib/compute.cpp | 612 +- c_glib/arrow-glib/compute.h | 71 +++- c_glib/doc/arrow-glib/arrow-glib-docs.xml | 5 +- c_glib/gandiva-glib/node.cpp | 2 +- c_glib/test/test-count-values.rb | 51 +++ 8 files changed, 738 insertions(+), 682 deletions(-) diff --git a/c_glib/arrow-glib/basic-array.cpp b/c_glib/arrow-glib/basic-array.cpp index 8f27e26..b051c97 100644 --- a/c_glib/arrow-glib/basic-array.cpp +++ b/c_glib/arrow-glib/basic-array.cpp @@ -24,7 +24,6 @@ #include #include #include -#include #include #include #include @@ -83,34 +82,6 @@ garrow_primitive_array_new(GArrowDataType *data_type, return garrow_array_new_raw(_array); }; -template -typename ArrowType::c_type -garrow_numeric_array_sum(GArrowArrayType array, - GError **error, - const gchar *tag, - typename ArrowType::c_type default_value) -{ - auto arrow_array = garrow_array_get_raw(GARROW_ARRAY(array)); - auto memory_pool = arrow::default_memory_pool(); - arrow::compute::FunctionContext context(memory_pool); - arrow::compute::Datum sum_datum; - auto status = arrow::compute::Sum(, 
-arrow_array, -_datum); - if (garrow_error_check(error, status, tag)) { -using ScalarType = typename arrow::TypeTraits::ScalarType; -auto arrow_numeric_scalar = - std::dynamic_pointer_cast(sum_datum.scalar()); -if (arrow_numeric_scalar->is_valid) { - return arrow_numeric_scalar->value; -} else { - return default_value; -} - } else { -return default_value; - } -} - G_BEGIN_DECLS /** @@ -545,177 +516,6 @@ garrow_array_to_string(GArrowArray *array, GError **error) } } -/** - * garrow_array_cast: - * @array: A #GArrowArray. - * @target_data_type: A #GArrowDataType of cast target data. - * @options: (nullable): A #GArrowCastOptions. - * @error: (nullable): Return location for a #GError or %NULL. - * - * Returns: (nullable) (transfer full): - * A newly created casted array on success, %NULL on error. - * - * Since: 0.7.0 - */ -GArrowArray * -garrow_array_cast(GArrowArray *array, - GArrowDataType *target_data_type, - GArrowCastOptions *options, - GError **error) -{ - auto arrow_array = garrow_array_get_raw(array); - auto arrow_array_raw = arrow_array.get(); - auto memory_pool = arrow::default_memory_pool(); - arrow::compute::FunctionContext context(memory_pool); - auto arrow_target_data_type = garrow_data_type_get_raw(target_data_type); - std::shared_ptr arrow_casted_array; - arrow::Status status; - if (options) { -auto arrow_options = garrow_cast_options_get_raw(options); -status = arrow::compute::Cast(, - *arrow_array_raw, - arrow_target_data_type, - *arrow_options, - _casted_array); - } else { -arrow::compute::CastOptions arrow_options; -status = arrow::compute::Cast(, - *arrow_array_raw, - arrow_target_data_type, - arrow_options, - _casted_array); - } - - if (!status.ok()) { -std::stringstream message; -message << "[array][cast] <"; -message << arrow_array->type()->ToString(); -message << "> -> <"; -message << arrow_target_data_type->ToString(); -message << ">"; -garrow_error_check(error, status, message.str().c_str()); -return NULL; - } - - return 
garrow_array_new_raw(_casted_array); -} - -/** - * garrow_array_unique: - * @array: A #GArrowArray. - * @error: (nullable): Return location for a #GError or %NULL. - * - * Returns: (nullable)
[arrow] branch master updated: ARROW-4339: [C++][Python] Developer documentation overhaul for 0.13 release
wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new d94a9fc ARROW-4339: [C++][Python] Developer documentation overhaul for 0.13 release
d94a9fc is described below

commit d94a9fcee801d9e185f36f767bb5b70566df70ff
Author: Wes McKinney
AuthorDate: Sun Mar 17 16:26:34 2019 -0500

    ARROW-4339: [C++][Python] Developer documentation overhaul for 0.13 release

    This was pretty much a huge pain but addresses accumulated documentation debt after the conda compiler migration and the CMake refactor. I suggest we not stress too much over small details on this and do more work to improve these docs in follow up PRs. I did the best I could under the circumstances and need to move on to other things now.

    I think the overall organization of the Sphinx project for developers is much improved; take a look (I will post a link to a published version for review).

    JIRAs addressed by this PR and other things I did:

    * Update cpp/thirdparty/README.md given CMake refactor (this was totally out of date). This now directs users to the Sphinx C++ developer guide
    * ARROW-4339: Move cpp/README.md to Sphinx documentation (and clean it up a lot!!)
    * ARROW-4425: Move Contributing Guidelines from Confluence to Sphinx, update top level README
    * ARROW-4232: Remove references to pre-gcc5 ABI issues
    * ARROW-4165: Move Windows C++ developer guide to Sphinx (from cpp/apidoc/Windows.md)
    * ARROW-4547: Update Python development instructions re: producing CUDA-enabled pyarrow
    * ARROW-4326 / ARROW-3096: Update Python build instructions re: January 2019 compiler migration

    Author: Wes McKinney

    Closes #3942 from wesm/developer-docs-0.13 and squashes the following commits:

    a3c3dd5de Add some Boost info, misc cleaning
    2ccc3de18 Remove index.md altogether
    66da97e7f Remove unused text from cpp/apidoc/index.md
    504bc134e restore 'what's in the arrow libraries' section
    8d1f33e19 Finish initial documentation revamp for 0.13, stopping here
    84dd680a2 Some docs reorg, begin rewriting cpp/README.md into docs/source/developers/cpp.rst
---
 README.md                                     |  38 +-
 ci/conda_env_cpp.yml                          |   2 +-
 cpp/README.md                                 | 550 +
 cpp/apidoc/Windows.md                         | 291 ---
 cpp/apidoc/index.md                           |  42 -
 cpp/thirdparty/README.md                      |  90 +-
 docs/README.md                                |   2 +-
 docs/source/developers/contributing.rst       |  88 ++
 docs/source/developers/cpp.rst                | 913 +
 docs/source/developers/documentation.rst      |   2 +-
 docs/source/developers/index.rst              |   6 +-
 docs/source/developers/integration.rst        |   2 +
 .../development.rst => developers/python.rst} | 227 +++--
 docs/source/index.rst                         |  30 +-
 docs/source/python/benchmarks.rst             |   2 +
 docs/source/python/index.rst                  |   1 -
 docs/source/python/install.rst                |   2 +-
 docs/source/python/parquet.rst                |   6 +-
 python/README.md                              |  49 +-
 19 files changed, 1194 insertions(+), 1149 deletions(-)

diff --git a/README.md b/README.md
index 621e119..24157b3 100644
--- a/README.md
+++ b/README.md
@@ -59,7 +59,7 @@ The reference Arrow libraries contain a number of distinct software components:
 library)
 - Reference-counted off-heap buffer memory management, for zero-copy memory sharing and handling memory-mapped files
-- Low-overhead IO interfaces to files on disk, HDFS (C++ only)
+- IO interfaces to local and remote filesystems
 - Self-describing binary wire formats (streaming and batch/file-like) for remote procedure calls (RPC) and interprocess communication (IPC)
@@ -67,6 +67,10 @@ The reference Arrow libraries contain a number of distinct software components:
 implementations (e.g. sending data from Java to C++)
 - Conversions to and from other in-memory data structures
+## How to Contribute
+
+Please read our latest [project contribution guide][5].
+
 ## Getting involved
 Even if you do not plan to contribute to Apache Arrow itself or Arrow
@@ -79,38 +83,8 @@ integrations in other projects, we'd be happy to have you involved:
 - [Learn the format][2]
 - Contribute code to one of the reference implementations
-## How to Contribute
-
-We prefer to receive contributions in the form of GitHub pull requests. Please
-send pull requests against the [github.com/apache/arrow][4] repository.
-
-If you are looking for some ideas on what to contribute, check out
[arrow] branch master updated: ARROW-4931: [C++] CMake fails on gRPC ExternalProject
wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new 9d73e0a ARROW-4931: [C++] CMake fails on gRPC ExternalProject
9d73e0a is described below

commit 9d73e0a544d76382617f6f723a3ac5f8cff8e033
Author: Uwe L. Korn
AuthorDate: Sun Mar 17 14:48:43 2019 -0500

    ARROW-4931: [C++] CMake fails on gRPC ExternalProject

    Author: Uwe L. Korn

    Closes #3943 from xhochy/ARROW-4931 and squashes the following commits:

    aa24d57c9 ARROW-4931: CMake fails on gRPC ExternalProject
---
 cpp/cmake_modules/ThirdpartyToolchain.cmake | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index dd66d00..5c23e50 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -1979,6 +1979,7 @@ macro(build_grpc)
   add_dependencies(gRPC::grpc grpc_ep)
   add_dependencies(gRPC::grpc++ grpc_ep)
   add_dependencies(gRPC::address_sorting grpc_ep)
+  set(GRPC_VENDORED TRUE)
 endmacro()
 if(ARROW_WITH_GRPC)
@@ -2017,14 +2018,18 @@ if(ARROW_WITH_GRPC)
   get_target_property(GRPC_INCLUDE_DIR gRPC::grpc INTERFACE_INCLUDE_DIRECTORIES)
   include_directories(SYSTEM ${GRPC_INCLUDE_DIR})
-  # grpc++ headers may reside in ${GRPC_INCLUDE_DIR}/grpc++ or ${GRPC_INCLUDE_DIR}/grpcpp
-  # depending on the gRPC version.
-  if(EXISTS "${GRPC_INCLUDE_DIR}/grpcpp/impl/codegen/config_protobuf.h")
+  if(GRPC_VENDORED)
     set(GRPCPP_PP_INCLUDE TRUE)
-  elseif(EXISTS "${GRPC_INCLUDE_DIR}/grpc++/impl/codegen/config_protobuf.h")
-    set(GRPCPP_PP_INCLUDE FALSE)
   else()
-    message(FATAL_ERROR "Cannot find grpc++ headers in ${GRPC_INCLUDE_DIR}")
+    # grpc++ headers may reside in ${GRPC_INCLUDE_DIR}/grpc++ or ${GRPC_INCLUDE_DIR}/grpcpp
+    # depending on the gRPC version.
+    if(EXISTS "${GRPC_INCLUDE_DIR}/grpcpp/impl/codegen/config_protobuf.h")
+      set(GRPCPP_PP_INCLUDE TRUE)
+    elseif(EXISTS "${GRPC_INCLUDE_DIR}/grpc++/impl/codegen/config_protobuf.h")
+      set(GRPCPP_PP_INCLUDE FALSE)
+    else()
+      message(FATAL_ERROR "Cannot find grpc++ headers in ${GRPC_INCLUDE_DIR}")
+    endif()
   endif()
 endif()
[arrow] branch master updated: ARROW-4906: [Format] Write about SparseMatrixIndexCSR format is sorted
wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new 066ee43 ARROW-4906: [Format] Write about SparseMatrixIndexCSR format is sorted
066ee43 is described below

commit 066ee43960b66a4ee0fe778fdc4a71d2c23d211b
Author: Kenta Murata
AuthorDate: Sun Mar 17 11:03:37 2019 -0500

    ARROW-4906: [Format] Write about SparseMatrixIndexCSR format is sorted

    Currently, my implementation of SparseCSRIndex assumes indptr is sorted for each row. So I want to note it in the format documentation just in case.

    Author: Kenta Murata

    Closes #3929 from mrkn/fix_sparse_tensor_doc and squashes the following commits:

    b851bb723 Write about SparseMatrixIndexCSR format is sorted
---
 format/SparseTensor.fbs | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/format/SparseTensor.fbs b/format/SparseTensor.fbs
index 0a0c6c2..853dd19 100644
--- a/format/SparseTensor.fbs
+++ b/format/SparseTensor.fbs
@@ -49,7 +49,7 @@ table SparseTensorIndexCOO {
   ///  [2, 2, 3, 1, 2, 0],
   ///  [0, 1, 0, 0, 3, 4]]
   ///
-  /// Note that the indices are sorted in lexcographical order.
+  /// Note that the indices are sorted in lexicographical order.
   indicesBuffer: Buffer;
 }
@@ -86,6 +86,8 @@ table SparseMatrixIndexCSR {
   /// For example, the indices of the above X is:
   ///
   ///   indices(X) = [1, 2, 2, 1, 3, 0, 2, 3, 1].
+  ///
+  /// Note that the indices are sorted in lexicographical order for each row.
   indicesBuffer: Buffer;
 }
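
The per-row ordering that ARROW-4906 documents for SparseMatrixIndexCSR can be checked with a short standard-library sketch. This is an illustration under assumptions, not Arrow's validation code: `IndicesSortedPerRow` and `DemoCsrOrdering` are hypothetical names, and the `indptr` array used in the demo is assumed here because this excerpt only shows `indices(X)`. Row `r` is taken to own the entries `indices[indptr[r]]` up to, but not including, `indices[indptr[r + 1]]`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true when indptr is well formed for `indices` and the column
// indices inside every row are in non-decreasing order.
inline bool IndicesSortedPerRow(const std::vector<int64_t>& indptr,
                                const std::vector<int64_t>& indices) {
  const auto n = static_cast<int64_t>(indices.size());
  if (indptr.empty()) return indices.empty();
  if (indptr.front() != 0 || indptr.back() != n) return false;
  for (std::size_t row = 0; row + 1 < indptr.size(); ++row) {
    const int64_t begin = indptr[row];
    const int64_t end = indptr[row + 1];
    // Reject malformed row boundaries before touching `indices`.
    if (begin < 0 || begin > end || end > n) return false;
    for (int64_t k = begin + 1; k < end; ++k) {
      const auto prev = static_cast<std::size_t>(k - 1);
      const auto curr = static_cast<std::size_t>(k);
      if (indices[prev] > indices[curr]) return false;
    }
  }
  return true;
}

// Usage with the indices(X) values quoted in the diff above. The indptr
// values are an assumption chosen to split those nine entries into six rows.
inline void DemoCsrOrdering() {
  const std::vector<int64_t> indptr = {0, 2, 3, 5, 5, 8, 9};
  const std::vector<int64_t> indices = {1, 2, 2, 1, 3, 0, 2, 3, 1};
  assert(IndicesSortedPerRow(indptr, indices));
}
```

The check is per row only: the full `indices` array is not globally sorted (for example `2, 1` straddles a row boundary), which is why the documentation sentence says "for each row".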
[arrow] branch master updated: [Docker][C++] Remove duplicated ARROW_GANDIVA line from docker_build_cpp.sh
wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new d95208f [Docker][C++] Remove duplicated ARROW_GANDIVA line from docker_build_cpp.sh
d95208f is described below

commit d95208f336a7b29ed4616b3b6c339ef816f4cbd3
Author: Suvayu Ali
AuthorDate: Sun Mar 17 16:02:56 2019 +

    [Docker][C++] Remove duplicated ARROW_GANDIVA line from docker_build_cpp.sh
---
 ci/docker_build_cpp.sh | 1 -
 1 file changed, 1 deletion(-)

diff --git a/ci/docker_build_cpp.sh b/ci/docker_build_cpp.sh
index 78e14b5..6e780b6 100755
--- a/ci/docker_build_cpp.sh
+++ b/ci/docker_build_cpp.sh
@@ -36,7 +36,6 @@ cmake -GNinja \
       -DCMAKE_INSTALL_LIBDIR=lib \
       -DARROW_WITH_BZ2=${ARROW_WITH_BZ2:-ON} \
       -DARROW_WITH_ZSTD=${ARROW_WITH_ZSTD:-ON} \
-      -DARROW_GANDIVA=${ARROW_GANDIVA:-ON} \
       -DARROW_BUILD_BENCHMARKS=${ARROW_BUILD_BENCHMARKS:-ON} \
       -DARROW_FLIGHT=${ARROW_FLIGHT:-ON} \
       -DARROW_ORC=${ARROW_ORC:-ON} \
[arrow] branch master updated: ARROW-4933: [R] Autodetect Parquet support using pkg-config
wesm pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new 974b723 ARROW-4933: [R] Autodetect Parquet support using pkg-config
974b723 is described below

commit 974b7232bf2920c4a43af685964a005f40dce456
Author: Uwe L. Korn
AuthorDate: Sun Mar 17 10:11:16 2019 -0500

    ARROW-4933: [R] Autodetect Parquet support using pkg-config

    Kudos go to @kou for this.

    Author: Uwe L. Korn

    Closes #3946 from xhochy/ARROW-4933 and squashes the following commits:

    abc7d4083 ARROW-4933: Autodetect Parquet support using pkg-config
---
 r/configure | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/r/configure b/r/configure
index 19f4d2c..f0c6d49 100755
--- a/r/configure
+++ b/r/configure
@@ -32,13 +32,26 @@ PKG_RPM_NAME="arrow"
 PKG_CSW_NAME="arrow"
 PKG_BREW_NAME="apache-arrow"
 PKG_TEST_HEADER=""
-PKG_LIBS="-larrow -lparquet"
+PKG_LIBS=""
 # Use pkg-config if available
 pkg-config --version >/dev/null 2>&1
 if [ $? -eq 0 ]; then
   PKGCONFIG_CFLAGS=`pkg-config --cflags --silence-errors ${PKG_CONFIG_NAME}`
   PKGCONFIG_LIBS=`pkg-config --libs ${PKG_CONFIG_NAME}`
+  PKGCONFIG_CFLAGS=$(pkg-config --cflags arrow)
+  if [ $? -ne 0 ]; then
+    echo "Apache Arrow C++ was not found using pkg-config"
+    exit 1
+  fi
+  PKGCONFIG_LIBS=$(pkg-config --libs arrow)
+  PKGCONFIG_CFLAGS_PARQUET=$(pkg-config --cflags parquet)
+  if [ $? -eq 0 ]; then
+    PKGCONFIG_CFLAGS="${PKGCONFIG_CFLAGS} ${PKGCONFIG_CFLAGS_PARQUET} -DARROW_R_WITH_PARQUET"
+    PKGCONFIG_LIBS="${PKGCONFIG_LIBS} $(pkg-config --libs parquet)"
+  fi
+else
+  PKG_LIBS="-larrow -lparquet"
 fi
 # Note that cflags may be empty in case of success
[arrow] branch master updated: ARROW-4915: [GLib][C++] Add arrow::NullBuilder support for GLib
This is an automated email from the ASF dual-hosted git repository. kou pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 9d2280f ARROW-4915: [GLib][C++] Add arrow::NullBuilder support for GLib 9d2280f is described below commit 9d2280fb9093580fc8073e972bbae3095b75203c Author: Kenta Murata AuthorDate: Sun Mar 17 17:55:02 2019 +0900 ARROW-4915: [GLib][C++] Add arrow::NullBuilder support for GLib This pull request add two things: 1. `arrow::NullBuilder::AppendNulls()` function 2. `GArrowNullArrayBuilder` class Author: Kenta Murata Author: Kouhei Sutou Closes #3938 from mrkn/glib_null_builder and squashes the following commits: e53b004e Accept NullArray.new(n) 17c7c86c Add overflow check in NullBuilder::AppendNull() b31315a2 Add and fix version tags 47af12da Rewrite with G_DECLARE_DERIVABLE_TYPE 8ab32559 Put NullArrayBuilder tests in test-array-builder.rb 1cdb500d Remove needless TODO comment 6713a13c Check overflow in NullBuilder::AppendNulls() 544fc6eb Add GArrowNullArrayBuilder 2038ce23 Add NullBuilder::AppendNulls() function --- c_glib/arrow-glib/array-builder.cpp| 81 ++ c_glib/arrow-glib/array-builder.h | 23 c_glib/test/helper/buildable.rb| 4 ++ c_glib/test/test-array-builder.rb | 33 +++ cpp/src/arrow/array-test.cc| 5 +- cpp/src/arrow/array/builder_primitive.h| 11 ruby/red-arrow/lib/arrow/array-builder.rb | 4 ++ ruby/red-arrow/lib/arrow/array.rb | 2 +- ruby/red-arrow/lib/arrow/loader.rb | 1 + ruby/red-arrow/lib/arrow/null-array-builder.rb | 26 + 10 files changed, 187 insertions(+), 3 deletions(-) diff --git a/c_glib/arrow-glib/array-builder.cpp b/c_glib/arrow-glib/array-builder.cpp index afdae8c..b9a9e71 100644 --- a/c_glib/arrow-glib/array-builder.cpp +++ b/c_glib/arrow-glib/array-builder.cpp @@ -153,6 +153,9 @@ G_BEGIN_DECLS * * You need to use array builder class to create a new array. 
* + * #GArrowNullArrayBuilder is the class to create a new + * #GArrowNullArray. + * * #GArrowBooleanArrayBuilder is the class to create a new * #GArrowBooleanArray. * @@ -409,6 +412,81 @@ garrow_array_builder_finish(GArrowArrayBuilder *builder, GError **error) } +G_DEFINE_TYPE(GArrowNullArrayBuilder, + garrow_null_array_builder, + GARROW_TYPE_ARRAY_BUILDER) + +static void +garrow_null_array_builder_init(GArrowNullArrayBuilder *builder) +{ +} + +static void +garrow_null_array_builder_class_init(GArrowNullArrayBuilderClass *klass) +{ +} + +/** + * garrow_null_array_builder_new: + * + * Returns: A newly created #GArrowNullArrayBuilder. + * + * Since: 0.13.0 + */ +GArrowNullArrayBuilder * +garrow_null_array_builder_new(void) +{ + auto builder = garrow_array_builder_new(arrow::null(), + NULL, + "[null-array-builder][new]"); + return GARROW_NULL_ARRAY_BUILDER(builder); +} + +/** + * garrow_null_array_builder_append_null: + * @builder: A #GArrowNullArrayBuilder. + * @error: (nullable): Return location for a #GError or %NULL. + * + * Returns: %TRUE on success, %FALSE if there was an error. + * + * Since: 0.13.0 + */ +gboolean +garrow_null_array_builder_append_null(GArrowNullArrayBuilder *builder, + GError **error) +{ + return garrow_array_builder_append_null +(GARROW_ARRAY_BUILDER(builder), + error, + "[null-array-builder][append-null]"); +} + +/** + * garrow_null_array_builder_append_nulls: + * @builder: A #GArrowNullArrayBuilder. + * @n: The number of null values to be appended. + * @error: (nullable): Return location for a #GError or %NULL. + * + * Append multiple nulls at once. It's more efficient than multiple + * `append_null()` calls. + * + * Returns: %TRUE on success, %FALSE if there was an error. 
+ * + * Since: 0.13.0 + */ +gboolean +garrow_null_array_builder_append_nulls(GArrowNullArrayBuilder *builder, + gint64 n, + GError **error) +{ + return garrow_array_builder_append_nulls +(GARROW_ARRAY_BUILDER(builder), + n, + error, + "[null-array-builder][append-nulls]"); +} + + G_DEFINE_TYPE(GArrowBooleanArrayBuilder, garrow_boolean_array_builder, GARROW_TYPE_ARRAY_BUILDER) @@ -3890,6 +3968,9 @@ garrow_array_builder_new_raw(arrow::ArrayBuilder *arrow_builder, { if (type == G_TYPE_INVALID) { switch (arrow_builder->type()->id()) { +case arrow::Type::type::NA: + type = GARROW_TYPE_NULL_ARRAY_BUILDER; + break; case arrow::Type::type::BOOL: type = GARROW_TYPE_BOOLEAN_ARRAY_BUILDER;
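
The squashed commits for ARROW-4915 mention an overflow check in `NullBuilder::AppendNulls()`, and the doc comment above notes that appending many nulls at once is more efficient than repeated single appends. The following is a minimal sketch of that idea only; it is not Arrow's implementation, which this excerpt does not show. `NullCounter` and `DemoNullCounter` are hypothetical names, and the sketch assumes a null array needs nothing more than a length, so a bulk append is a single guarded addition.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical, simplified stand-in for a null-array builder. It stores
// only the number of appended nulls and refuses appends that would
// overflow the 64-bit length.
class NullCounter {
 public:
  // Appends a single null; returns false if the length cannot grow.
  bool AppendNull() { return AppendNulls(1); }

  // Appends n nulls in one step; returns false for a negative n or when
  // length + n would exceed the int64_t range. The length is unchanged
  // on failure.
  bool AppendNulls(int64_t n) {
    if (n < 0) return false;
    if (n > std::numeric_limits<int64_t>::max() - length_) return false;
    length_ += n;
    return true;
  }

  int64_t length() const { return length_; }

 private:
  int64_t length_ = 0;
};

// Usage: a bulk append followed by a rejected, overflowing append.
inline void DemoNullCounter() {
  NullCounter builder;
  assert(builder.AppendNulls(3));
  assert(builder.AppendNull());
  assert(builder.length() == 4);
  assert(!builder.AppendNulls(std::numeric_limits<int64_t>::max()));
  assert(builder.length() == 4);
}
```

The overflow test is written as `n > max - length_` rather than `length_ + n > max` so that the comparison itself never overflows.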