[arrow] branch master updated: ARROW-5219: [C++] Build protobuf_ep in parallel when using Ninja build
This is an automated email from the ASF dual-hosted git repository.

kou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
 new 67efb73 ARROW-5219: [C++] Build protobuf_ep in parallel when using Ninja build
67efb73 is described below

commit 67efb73eff689180a1d100606f53ce3ad85db4ac
Author: Wes McKinney
AuthorDate: Fri Apr 26 10:54:57 2019 +0900

 ARROW-5219: [C++] Build protobuf_ep in parallel when using Ninja build

 I noticed this when looking at ARROW-5192

 Author: Wes McKinney

 Closes #4208 from wesm/ARROW-5219 and squashes the following commits:

 8c5560f1 Build protobuf_ep in parallel when using Ninja build
---
 cpp/cmake_modules/ThirdpartyToolchain.cmake | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index 1c42d87..80c9746 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -1047,6 +1047,9 @@ macro(build_protobuf)
 CONFIGURE_COMMAND
 "./configure"
 ${PROTOBUF_CONFIGURE_ARGS}
+ BUILD_COMMAND
+ ${MAKE}
+ ${MAKE_BUILD_ARGS}
 BUILD_IN_SOURCE
 1
 URL
[arrow] branch master updated: Fix Travis-CI doc build failure [skip appveyor] (#4205)
This is an automated email from the ASF dual-hosted git repository.

kou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
 new 621d649 Fix Travis-CI doc build failure [skip appveyor] (#4205)
621d649 is described below

commit 621d649059589f6a819fdc3dfe6798f0a5ae2cf9
Author: Antoine Pitrou
AuthorDate: Fri Apr 26 03:34:50 2019 +0200

 Fix Travis-CI doc build failure [skip appveyor] (#4205)
---
 ci/conda_env_sphinx.yml | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/ci/conda_env_sphinx.yml b/ci/conda_env_sphinx.yml
index dbe9003..af6b407 100644
--- a/ci/conda_env_sphinx.yml
+++ b/ci/conda_env_sphinx.yml
@@ -19,6 +19,5 @@
 breathe
 doxygen
 ipython
-# TODO(kszucs): remove this pin after breathe supports sphinx 2
-sphinx<2
+sphinx
 sphinx_rtd_theme
[arrow] branch master updated: ARROW-5049: [Python] org/apache/hadoop/fs/FileSystem class not found when pyarrow FileSystem used in spark
This is an automated email from the ASF dual-hosted git repository.

emkornfield pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
 new 3f58a14 ARROW-5049: [Python] org/apache/hadoop/fs/FileSystem class not found when pyarrow FileSystem used in spark
3f58a14 is described below

commit 3f58a14714ccae93ae055f9ba7e6d59b8e3746a1
Author: tiger
AuthorDate: Thu Apr 25 13:06:14 2019 -0700

 ARROW-5049: [Python] org/apache/hadoop/fs/FileSystem class not found when pyarrow FileSystem used in spark

 Ensure hadoop-common-{version} jar is in the classpath

 Author: tiger

 Closes #4081 from chenfj068/master and squashes the following commits:

 428827bf ARROW-5049: org/apache/hadoop/fs/FileSystem class not found when pyarrow FileSystem used in spark
---
 python/pyarrow/hdfs.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/python/pyarrow/hdfs.py b/python/pyarrow/hdfs.py
index 9e12675..3ddd3cd 100644
--- a/python/pyarrow/hdfs.py
+++ b/python/pyarrow/hdfs.py
@@ -123,7 +123,9 @@ class HadoopFileSystem(lib.HadoopFileSystem, FileSystem):
 def _maybe_set_hadoop_classpath():
-    if 'hadoop' in os.environ.get('CLASSPATH', ''):
+    import re
+
+    if re.search(r'hadoop-common[^/]+.jar', os.environ.get('CLASSPATH', '')):
         return
     if 'HADOOP_HOME' in os.environ:
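
The effect of the changed check in python/pyarrow/hdfs.py can be tried in isolation. The following is a minimal sketch, not pyarrow code: the helper name classpath_has_hadoop_common and both classpath strings are made up for illustration, and only the regular expression is copied from the diff. It shows a classpath that mentions "hadoop" without containing a hadoop-common jar, which the old substring test would accept and the new test rejects.

```python
import re

# Pattern copied verbatim from the commit; the dot before "jar" is
# unescaped there, so it matches any single character.
HADOOP_COMMON_RE = re.compile(r'hadoop-common[^/]+.jar')


def classpath_has_hadoop_common(classpath):
    """Return True if the classpath string contains a hadoop-common jar.

    Hypothetical helper for illustration; pyarrow performs this test
    inline inside _maybe_set_hadoop_classpath().
    """
    return HADOOP_COMMON_RE.search(classpath) is not None


# Hypothetical classpaths, assumed only for this example.
spark_only = ("/opt/spark/jars/hadoop-aws-2.7.3.jar:"
              "/opt/spark/jars/spark-core_2.11-2.4.0.jar")
with_common = (spark_only +
               ":/opt/hadoop/share/hadoop/common/hadoop-common-2.7.3.jar")

print('hadoop' in spark_only)                    # old check
print(classpath_has_hadoop_common(spark_only))   # new check
print(classpath_has_hadoop_common(with_common))  # new check, jar present
```

With the old check, the first classpath would have caused an early return even though no hadoop-common jar was listed.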
[arrow] branch master updated: ARROW-4827: [C++] Implement benchmark comparison
This is an automated email from the ASF dual-hosted git repository.

apitrou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
 new c3511db ARROW-4827: [C++] Implement benchmark comparison
c3511db is described below

commit c3511db97e981fd24367916e45fd1d1edd83bb73
Author: François Saint-Jacques
AuthorDate: Thu Apr 25 17:54:09 2019 +0200

 ARROW-4827: [C++] Implement benchmark comparison

 This script/library allows comparing revisions/builds.

 Author: François Saint-Jacques

 Closes #4141 from fsaintjacques/ARROW-4827-benchmark-comparison and squashes the following commits:

 a047ae4ed Satisfy flake8
 e95baf317 Add comments and move stuff
 ee39a1feb Move cpp_runner_from_rev_or_path in CppRunner
 2a953f180 Missing files
 d8e3c1c85 Review
 514e8e428 Introduce RegressionSetArgs
 280c93be4 Update gitignore
 dc031bde7 Support conda toolchain
 28254676c Add --cmake-extras to benchmark-diff command
 e6762899c Typo
 048ba0ede Add verbose_third_party
 71b10e98a Disable python in benchmarks
 c3719214c Fix flake8 warnings
 8845e3e78 Remove empty __init__.py
 1949f749c Supports HEAD revisions
 96f999748 Add gitignore entry
 d9692bc8f Fix splitlines
 90578af61 Add --cmake-extras to build command
 7696202ba Add doc for bin attribute.
 a281ae8e6 Various language fixes
 1b028390c Rename --cxx_flags to --cxx-flags
 bc111b2d3 Removes copied stuff
 d6733b6f4 Formatting
 21b2e14fc Add doc and fix bugs
 2a81744cf Ooops.
 c85661cf3 Add documentation
 703cf987a commit
 2c0d512f8 Checkpoint
 a38f49cd9 checkpoint
 a5ad76d11 Fix syntax
 712d2ed3c initial commit
---
 .gitignore | 2 +
 cpp/src/arrow/compute/benchmark-util.h | 13 +
 .../arrow/compute/kernels/aggregate-benchmark.cc | 4 +-
 dev/archery/archery/benchmark/compare.py | 122 +
 .../archery/archery/benchmark/core.py | 72 +++---
 dev/archery/archery/benchmark/google.py | 162
 dev/archery/archery/benchmark/runner.py | 114 +
 dev/archery/archery/cli.py | 274 +
 dev/archery/archery/lang/cpp.py | 130 ++
 dev/archery/archery/utils/cmake.py | 213
 .gitignore => dev/archery/archery/utils/codec.py | 69 ++
 dev/archery/archery/utils/command.py | 71 ++
 dev/archery/archery/utils/git.py | 73 ++
 .gitignore => dev/archery/archery/utils/logger.py | 45 +---
 dev/archery/archery/utils/source.py | 141 +++
 .gitignore => dev/archery/setup.py | 58 ++---
 .gitignore => dev/archery/tests/test_benchmarks.py | 55 ++---
 docs/source/developers/benchmarks.rst | 127 ++
 docs/source/developers/index.rst | 1 +
 python/.gitignore | 2 -
 20 files changed, 1543 insertions(+), 205 deletions(-)

diff --git a/.gitignore b/.gitignore
index 6bb237a..4a03020 100644
--- a/.gitignore
+++ b/.gitignore
@@ -50,6 +50,8 @@ docs/example1.dat
 docs/example3.dat
 python/.eggs/
 python/doc/
+# Egg metadata
+*.egg-info
 .vscode
 .idea/
diff --git a/cpp/src/arrow/compute/benchmark-util.h b/cpp/src/arrow/compute/benchmark-util.h
index 1678f8d..865da66 100644
--- a/cpp/src/arrow/compute/benchmark-util.h
+++ b/cpp/src/arrow/compute/benchmark-util.h
@@ -55,5 +55,18 @@ void BenchmarkSetArgs(benchmark::internal::Benchmark* bench) {
 bench->Args({static_cast(size), nulls});
 }
+void RegressionSetArgs(benchmark::internal::Benchmark* bench) {
+ // Benchmark changed its parameter type between releases from
+ // int to int64_t. As it doesn't have version macros, we need
+ // to apply C++ template magic.
+ using ArgsType =
+ typename BenchmarkArgsType::type;
+ bench->Unit(benchmark::kMicrosecond);
+
+ // Regressions should only bench L1 data for better stability
+ for (auto nulls : std::vector<ArgsType>({0, 1, 10, 50}))
+ bench->Args({static_cast<ArgsType>(kL1Size), nulls});
+}
+
 } // namespace compute
 } // namespace arrow
diff --git a/cpp/src/arrow/compute/kernels/aggregate-benchmark.cc b/cpp/src/arrow/compute/kernels/aggregate-benchmark.cc
index e81f879..bbc923f 100644
--- a/cpp/src/arrow/compute/kernels/aggregate-benchmark.cc
+++ b/cpp/src/arrow/compute/kernels/aggregate-benchmark.cc
@@ -309,7 +309,7 @@ BENCHMARK_TEMPLATE(BenchSum, SumBitmapNaive)->Apply(BenchmarkSetArgs);
 BENCHMARK_TEMPLATE(BenchSum, SumBitmapReader)->Apply(BenchmarkSetArgs);
 BENCHMARK_TEMPLATE(BenchSum, SumBitmapVectorizeUnroll)->Apply(BenchmarkSetArgs);
-static void BenchSumKernel(benchmark::State& state) {
+static void RegressionSumKernel(benchmark::State& state) {
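
The commit message says the new script/library compares revisions/builds. As a rough illustration of that idea only, and not the code in dev/archery/archery/benchmark/compare.py (which is not shown in this message), the sketch below compares two sets of timings and flags slowdowns above a threshold. The function name compare_runs, the 5% threshold, the benchmark names and the timings are all assumptions made for this example.

```python
def compare_runs(baseline, contender, threshold=0.05):
    """Compare two {benchmark name: time} mappings (lower time is better).

    Returns a list of (name, relative_change, regressed) tuples for the
    benchmarks present in both runs, sorted by name. A benchmark counts
    as regressed when it got slower by more than `threshold`.
    """
    results = []
    for name in sorted(baseline.keys() & contender.keys()):
        base = baseline[name]
        change = (contender[name] - base) / base
        results.append((name, change, change > threshold))
    return results


# Made-up timings in microseconds, for illustration only.
baseline = {"SumKernel/nulls:0": 100.0, "SumKernel/nulls:50": 200.0}
contender = {"SumKernel/nulls:0": 102.0, "SumKernel/nulls:50": 260.0}

for name, change, regressed in compare_runs(baseline, contender):
    flag = "REGRESSION" if regressed else "ok"
    print(f"{name}: {change:+.1%} {flag}")
```

Keeping the regression benchmarks small (the diff restricts them to L1-sized data "for better stability") matters for a comparison like this, since noisy timings would cross any fixed threshold by chance.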
[arrow] branch master updated: ARROW-4702: [C++] Update dependency versions
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
 new f913d8f ARROW-4702: [C++] Update dependency versions
f913d8f is described below

commit f913d8f0adff71c288a10f6c1b0ad2d1ab3e9e32
Author: Antoine Pitrou
AuthorDate: Thu Apr 25 08:28:58 2019 -0500

 ARROW-4702: [C++] Update dependency versions

 Author: Antoine Pitrou

 Closes #4189 from pitrou/ARROW-4702-update-deps and squashes the following commits:

 f13660f4e ARROW-4702: Update dependency versions
---
 cpp/cmake_modules/ThirdpartyToolchain.cmake | 200 +++--
 ...17c897976c60b0e6e4f4a365c751027244dada7a.tar.gz | Bin 454719 -> 0 bytes
 cpp/thirdparty/versions.txt | 23 ++-
 3 files changed, 80 insertions(+), 143 deletions(-)

diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index 30f99ff..1c42d87 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -280,6 +280,13 @@ else()
 "https://github.com/google/googletest/archive/release-${GTEST_VERSION}.tar.gz")
 endif()
+if(DEFINED ENV{ARROW_JEMALLOC_URL})
+ set(JEMALLOC_SOURCE_URL "$ENV{ARROW_JEMALLOC_URL}")
+else()
+ set(JEMALLOC_SOURCE_URL
+ "https://github.com/jemalloc/jemalloc/archive/${JEMALLOC_VERSION}.tar.gz")
+endif()
+
 if(DEFINED ENV{ARROW_LZ4_URL})
 set(LZ4_SOURCE_URL "$ENV{ARROW_LZ4_URL}")
 else()
@@ -320,10 +327,8 @@ endif()
 if(DEFINED ENV{ARROW_SNAPPY_URL})
 set(SNAPPY_SOURCE_URL "$ENV{ARROW_SNAPPY_URL}")
 else()
- set(
-SNAPPY_SOURCE_URL
- "https://github.com/google/snappy/releases/download/${SNAPPY_VERSION}/snappy-${SNAPPY_VERSION}.tar.gz"
-)
+ set(SNAPPY_SOURCE_URL
+ "https://github.com/google/snappy/archive/${SNAPPY_VERSION}.tar.gz")
 endif()
 if(DEFINED ENV{ARROW_THRIFT_URL})
@@ -338,8 +343,10 @@ endif()
 if(DEFINED ENV{ARROW_URIPARSER_URL})
 set(URIPARSER_SOURCE_URL "$ENV{ARROW_URIPARSER_URL}")
 else()
- set(URIPARSER_SOURCE_URL
- "https://github.com/uriparser/uriparser/archive/${URIPARSER_VERSION}.tar.gz")
+ set(
+URIPARSER_SOURCE_URL
+ "https://github.com/uriparser/uriparser/archive/uriparser-${URIPARSER_VERSION}.tar.gz"
+)
 endif()
 if(DEFINED ENV{ARROW_ZLIB_URL})
@@ -560,10 +567,12 @@ macro(build_uriparser)
 add_library(uriparser::uriparser STATIC IMPORTED)
 # Work around https://gitlab.kitware.com/cmake/cmake/issues/15052
 file(MAKE_DIRECTORY ${URIPARSER_INCLUDE_DIRS})
- set_target_properties(
-uriparser::uriparser
-PROPERTIES IMPORTED_LOCATION ${URIPARSER_STATIC_LIB} INTERFACE_INCLUDE_DIRECTORIES
- ${URIPARSER_INCLUDE_DIRS})
+ set_target_properties(uriparser::uriparser
+PROPERTIES IMPORTED_LOCATION ${URIPARSER_STATIC_LIB}
+ INTERFACE_INCLUDE_DIRECTORIES ${URIPARSER_INCLUDE_DIRS}
+ # URI_STATIC_BUILD required on Windows
+ INTERFACE_COMPILE_DEFINITIONS
+ "URI_STATIC_BUILD;URI_NO_UNICODE")
 add_dependencies(toolchain uriparser_ep)
 add_dependencies(uriparser::uriparser uriparser_ep)
@@ -586,79 +595,27 @@ include_directories(SYSTEM ${URIPARSER_INCLUDE_DIRS})
 macro(build_snappy)
 message(STATUS "Building snappy from source")
 set(SNAPPY_PREFIX "${CMAKE_CURRENT_BINARY_DIR}/snappy_ep/src/snappy_ep-install")
- if(MSVC)
-set(SNAPPY_STATIC_LIB_NAME snappy_static)
- else()
-set(SNAPPY_STATIC_LIB_NAME snappy)
- endif()
+ set(SNAPPY_STATIC_LIB_NAME snappy)
 set(
 SNAPPY_STATIC_LIB
 "${SNAPPY_PREFIX}/lib/${CMAKE_STATIC_LIBRARY_PREFIX}${SNAPPY_STATIC_LIB_NAME}${CMAKE_STATIC_LIBRARY_SUFFIX}"
 )
- if(${UPPERCASE_BUILD_TYPE} EQUAL "RELEASE")
-if(APPLE)
- set(SNAPPY_CXXFLAGS "CXXFLAGS='-DNDEBUG -O1'")
-else()
- set(SNAPPY_CXXFLAGS "CXXFLAGS='-DNDEBUG -O2'")
-endif()
- endif()
+ set(SNAPPY_CMAKE_ARGS ${EP_COMMON_CMAKE_ARGS} -DSNAPPY_BUILD_TESTS=OFF
+ "-DCMAKE_INSTALL_PREFIX=${SNAPPY_PREFIX}")
- if(WIN32)
-set(SNAPPY_CMAKE_ARGS
-${EP_COMMON_CMAKE_ARGS}
--DCMAKE_AR=${CMAKE_AR}
--DCMAKE_RANLIB=${CMAKE_RANLIB}
-"-DCMAKE_INSTALL_PREFIX=${SNAPPY_PREFIX}")
-set(SNAPPY_UPDATE_COMMAND
-${CMAKE_COMMAND}
--E
-copy
-${CMAKE_SOURCE_DIR}/cmake_modules/SnappyCMakeLists.txt
-./CMakeLists.txt
-&&
-${CMAKE_COMMAND}
--E
-copy
-${CMAKE_SOURCE_DIR}/cmake_modules/SnappyConfig.h
-./config.h)
-externalproject_add(snappy_ep
-UPDATE_COMMAND
-${SNAPPY_UPDATE_COMMAND}
-${EP_LOG_OPTIONS}
-BUILD_IN_SOURCE
-
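
The source URL blocks in ThirdpartyToolchain.cmake all follow one rule: if the matching ARROW_*_URL environment variable is defined, use it, otherwise build the default URL from the pinned version. The sketch below restates that rule in Python purely for illustration. The function resolve_source_url and the version string "1.2.3" are hypothetical; the environment variable name and the URL layout are taken from the jemalloc block in the diff.

```python
import os


def resolve_source_url(env_var, default_template, version, environ=None):
    """Return the override from the environment, else the default URL.

    Mirrors `if(DEFINED ENV{...}) ... else() ... endif()`: a variable that
    is set, even to an empty string, wins over the default.
    """
    environ = os.environ if environ is None else environ
    override = environ.get(env_var)
    if override is not None:
        return override
    return default_template.format(version=version)


JEMALLOC_TEMPLATE = "https://github.com/jemalloc/jemalloc/archive/{version}.tar.gz"

# "1.2.3" is a placeholder, not the version pinned in versions.txt.
default_url = resolve_source_url("ARROW_JEMALLOC_URL", JEMALLOC_TEMPLATE,
                                 "1.2.3", environ={})
mirrored_url = resolve_source_url(
    "ARROW_JEMALLOC_URL", JEMALLOC_TEMPLATE, "1.2.3",
    environ={"ARROW_JEMALLOC_URL": "file:///mirror/jemalloc.tar.gz"})

print(default_url)
print(mirrored_url)
```

Passing the environment as a parameter keeps the example independent of whatever variables happen to be set on the machine running it.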
[arrow] branch master updated: ARROW-5155: [GLib][Ruby] Add support for building union arrays from data type
This is an automated email from the ASF dual-hosted git repository.

kou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
 new ecfb807 ARROW-5155: [GLib][Ruby] Add support for building union arrays from data type
ecfb807 is described below

commit ecfb807458bfe909ecc8940bd840fc9c6169dd51
Author: Kenta Murata
AuthorDate: Thu Apr 25 15:36:07 2019 +0900

 ARROW-5155: [GLib][Ruby] Add support for building union arrays from data type

 This is separated from #3723.
 This should be merged after #3723.

 Author: Kenta Murata
 Author: Kouhei Sutou

 Closes #4127 from mrkn/glib_ruby_make_union_array_with_field_names and squashes the following commits:

 e6255567 Fix test data
 f82ac3d1 Fix test cases
 d550dc97 Fix comment
 f1bfa07b Stop copying a type_code vector
 606a04c1 Use new constructors of union arrays
 5ad55722 Add garrow_dense_union_array_new_data_type
 c8793d5c Add garrow_sparse_union_array_new_data_type
---
 c_glib/arrow-glib/composite-array.cpp | 97 ++
 c_glib/arrow-glib/composite-array.h | 11 +++
 c_glib/test/test-dense-union-array.rb | 90 ++--
 c_glib/test/test-sparse-union-array.rb | 87 +--
 .../record-batch/test-dense-union-array.rb | 8 +-
 .../record-batch/test-sparse-union-array.rb | 7 +-
 6 files changed, 238 insertions(+), 62 deletions(-)

diff --git a/c_glib/arrow-glib/composite-array.cpp b/c_glib/arrow-glib/composite-array.cpp
index b202fb4..4fba813 100644
--- a/c_glib/arrow-glib/composite-array.cpp
+++ b/c_glib/arrow-glib/composite-array.cpp
@@ -366,6 +366,53 @@ garrow_sparse_union_array_new(GArrowInt8Array *type_ids,
 }
 }
+/**
+ * garrow_sparse_union_array_new_data_type:
+ * @data_type: The data type for the sparse array.
+ * @type_ids: The field type IDs for each value as #GArrowInt8Array.
+ * @fields: (element-type GArrowArray): The arrays for each field
+ * as #GList of #GArrowArray.
+ * @error: (nullable): Return location for a #GError or %NULL.
+ *
+ * Returns: (nullable): A newly created #GArrowSparseUnionArray
+ * or %NULL on error.
+ *
+ * Since: 0.14.0
+ */
+GArrowSparseUnionArray *
+garrow_sparse_union_array_new_data_type(GArrowSparseUnionDataType *data_type,
+GArrowInt8Array *type_ids,
+GList *fields,
+GError **error)
+{
+ auto arrow_data_type = garrow_data_type_get_raw(GARROW_DATA_TYPE(data_type));
+ auto arrow_union_data_type =
+std::static_pointer_cast<arrow::UnionType>(arrow_data_type);
+ std::vector<std::string> arrow_field_names;
+ for (const auto &arrow_field : arrow_union_data_type->children()) {
+arrow_field_names.push_back(arrow_field->name());
+ }
+ auto arrow_type_ids = garrow_array_get_raw(GARROW_ARRAY(type_ids));
+ std::vector<std::shared_ptr<arrow::Array>> arrow_fields;
+ for (auto node = fields; node; node = node->next) {
+auto *field = GARROW_ARRAY(node->data);
+arrow_fields.push_back(garrow_array_get_raw(field));
+ }
+ std::shared_ptr<arrow::Array> arrow_union_array;
+ auto status = arrow::UnionArray::MakeSparse(*arrow_type_ids,
+ arrow_fields,
+ arrow_field_names,
+ arrow_union_data_type->type_codes(),
+ &arrow_union_array);
+ if (garrow_error_check(error,
+ status,
+ "[sparse-union-array][new][data-type]")) {
+return GARROW_SPARSE_UNION_ARRAY(garrow_array_new_raw(&arrow_union_array));
+ } else {
+return NULL;
+ }
+}
+
 G_DEFINE_TYPE(GArrowDenseUnionArray,
 garrow_dense_union_array,
@@ -420,6 +467,56 @@ garrow_dense_union_array_new(GArrowInt8Array *type_ids,
 }
 }
+/**
+ * garrow_dense_union_array_new_data_type:
+ * @data_type: The data type for the dense array.
+ * @type_ids: The field type IDs for each value as #GArrowInt8Array.
+ * @value_offsets: The value offsets for each value as #GArrowInt32Array.
+ * Each offset is counted for each type.
+ * @fields: (element-type GArrowArray): The arrays for each field
+ * as #GList of #GArrowArray.
+ * @error: (nullable): Return location for a #GError or %NULL.
+ *
+ * Returns: (nullable): A newly created #GArrowSparseUnionArray
+ * or %NULL on error.
+ *
+ * Since: 0.14.0
+ */
+GArrowDenseUnionArray *
+garrow_dense_union_array_new_data_type(GArrowDenseUnionDataType *data_type,
+ GArrowInt8Array *type_ids,
+ GArrowInt32Array *value_offsets,
+ GList *fields,
+ GError **error)
+{
+ auto arrow_data_type =