[arrow] branch master updated: ARROW-5219: [C++] Build protobuf_ep in parallel when using Ninja build

2019-04-25 Thread kou
This is an automated email from the ASF dual-hosted git repository.

kou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 67efb73  ARROW-5219: [C++] Build protobuf_ep in parallel when using Ninja build
67efb73 is described below

commit 67efb73eff689180a1d100606f53ce3ad85db4ac
Author: Wes McKinney 
AuthorDate: Fri Apr 26 10:54:57 2019 +0900

ARROW-5219: [C++] Build protobuf_ep in parallel when using Ninja build

I noticed this when looking at ARROW-5192

Author: Wes McKinney 

Closes #4208 from wesm/ARROW-5219 and squashes the following commits:

8c5560f1  Build protobuf_ep in parallel when using Ninja build
---
 cpp/cmake_modules/ThirdpartyToolchain.cmake | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index 1c42d87..80c9746 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -1047,6 +1047,9 @@ macro(build_protobuf)
   CONFIGURE_COMMAND
   "./configure"
   ${PROTOBUF_CONFIGURE_ARGS}
+  BUILD_COMMAND
+  ${MAKE}
+  ${MAKE_BUILD_ARGS}
   BUILD_IN_SOURCE
   1
   URL



[arrow] branch master updated: Fix Travis-CI doc build failure [skip appveyor] (#4205)

2019-04-25 Thread kou

kou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 621d649  Fix Travis-CI doc build failure [skip appveyor] (#4205)
621d649 is described below

commit 621d649059589f6a819fdc3dfe6798f0a5ae2cf9
Author: Antoine Pitrou 
AuthorDate: Fri Apr 26 03:34:50 2019 +0200

Fix Travis-CI doc build failure [skip appveyor] (#4205)
---
 ci/conda_env_sphinx.yml | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/ci/conda_env_sphinx.yml b/ci/conda_env_sphinx.yml
index dbe9003..af6b407 100644
--- a/ci/conda_env_sphinx.yml
+++ b/ci/conda_env_sphinx.yml
@@ -19,6 +19,5 @@
 breathe
 doxygen
 ipython
-# TODO(kszucs): remove this pin after breathe supports sphinx 2
-sphinx<2
+sphinx
 sphinx_rtd_theme



[arrow] branch master updated: ARROW-5049: [Python] org/apache/hadoop/fs/FileSystem class not found when pyarrow FileSystem used in spark

2019-04-25 Thread emkornfield

emkornfield pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 3f58a14  ARROW-5049: [Python] org/apache/hadoop/fs/FileSystem class not found when pyarrow FileSystem used in spark
3f58a14 is described below

commit 3f58a14714ccae93ae055f9ba7e6d59b8e3746a1
Author: tiger 
AuthorDate: Thu Apr 25 13:06:14 2019 -0700

ARROW-5049: [Python] org/apache/hadoop/fs/FileSystem class not found when pyarrow FileSystem used in spark

Ensure hadoop-common-{version} jar is in the classpath

Author: tiger 

Closes #4081 from chenfj068/master and squashes the following commits:

428827bf  ARROW-5049:  org/apache/hadoop/fs/FileSystem class not found when pyarrow FileSystem used in spark
---
 python/pyarrow/hdfs.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/python/pyarrow/hdfs.py b/python/pyarrow/hdfs.py
index 9e12675..3ddd3cd 100644
--- a/python/pyarrow/hdfs.py
+++ b/python/pyarrow/hdfs.py
@@ -123,7 +123,9 @@ class HadoopFileSystem(lib.HadoopFileSystem, FileSystem):
 
 
 def _maybe_set_hadoop_classpath():
-    if 'hadoop' in os.environ.get('CLASSPATH', ''):
+    import re
+
+    if re.search(r'hadoop-common[^/]+.jar', os.environ.get('CLASSPATH', '')):
         return
 
     if 'HADOOP_HOME' in os.environ:
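
The effect of the change above can be sketched in isolation. This is a minimal illustration, not pyarrow code: the two helper functions and the sample CLASSPATH strings are hypothetical, and only the regular expression is copied from the commit.

```python
import re

# Pattern copied verbatim from the commit. Note that the '.' before 'jar'
# is an unescaped regex wildcard, so it matches any single character there.
HADOOP_COMMON_JAR = re.compile(r'hadoop-common[^/]+.jar')


def old_check(classpath):
    """Pre-commit test: any occurrence of 'hadoop' counts as configured."""
    return 'hadoop' in classpath


def new_check(classpath):
    """Post-commit test: a hadoop-common-{version} jar must be listed."""
    return HADOOP_COMMON_JAR.search(classpath) is not None


# Hypothetical CLASSPATH values, for illustration only.
spark_only = '/opt/spark/jars/spark-core_2.11-2.4.0.jar:/opt/hadoop-conf'
with_common = '/opt/hadoop/share/hadoop/common/hadoop-common-2.7.3.jar'

for label, cp in (('spark_only', spark_only), ('with_common', with_common)):
    print(label, old_check(cp), new_check(cp))
```

With the old substring test, a CLASSPATH that merely mentions "hadoop" somewhere would skip the classpath setup; the regex test only skips it when a hadoop-common jar is actually present.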



[arrow] branch master updated: ARROW-4827: [C++] Implement benchmark comparison

2019-04-25 Thread apitrou

apitrou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new c3511db  ARROW-4827: [C++] Implement benchmark comparison
c3511db is described below

commit c3511db97e981fd24367916e45fd1d1edd83bb73
Author: François Saint-Jacques 
AuthorDate: Thu Apr 25 17:54:09 2019 +0200

ARROW-4827: [C++] Implement benchmark comparison

This script/library allows comparing revisions/builds.

Author: François Saint-Jacques 

Closes #4141 from fsaintjacques/ARROW-4827-benchmark-comparison and squashes the following commits:

a047ae4ed  Satisfy flake8
e95baf317  Add comments and move stuff
ee39a1feb  Move cpp_runner_from_rev_or_path in CppRunner
2a953f180  Missing files
d8e3c1c85  Review
514e8e428  Introduce RegressionSetArgs
280c93be4  Update gitignore
dc031bde7  Support conda toolchain
28254676c  Add --cmake-extras to benchmark-diff command
e6762899c  Typo
048ba0ede  Add verbose_third_party
71b10e98a  Disable python in benchmarks
c3719214c  Fix flake8 warnings
8845e3e78  Remove empty __init__.py
1949f749c  Supports HEAD revisions
96f999748  Add gitignore entry
d9692bc8f  Fix splitlines
90578af61  Add --cmake-extras to build command
7696202ba  Add doc for bin attribute.
a281ae8e6  Various language fixes
1b028390c  Rename --cxx_flags to --cxx-flags
bc111b2d3  Removes copied stuff
d6733b6f4  Formatting
21b2e14fc  Add doc and fix bugs
2a81744cf  Ooops.
c85661cf3  Add documentation
703cf987a  commit
2c0d512f8  Checkpoint
a38f49cd9  checkpoint
a5ad76d11  Fix syntax
712d2ed3c  initial commit
---
 .gitignore |   2 +
 cpp/src/arrow/compute/benchmark-util.h |  13 +
 .../arrow/compute/kernels/aggregate-benchmark.cc   |   4 +-
 dev/archery/archery/benchmark/compare.py   | 122 +
 .../archery/archery/benchmark/core.py  |  72 +++---
 dev/archery/archery/benchmark/google.py| 162 
 dev/archery/archery/benchmark/runner.py| 114 +
 dev/archery/archery/cli.py | 274 +
 dev/archery/archery/lang/cpp.py| 130 ++
 dev/archery/archery/utils/cmake.py | 213 
 .gitignore => dev/archery/archery/utils/codec.py   |  69 ++
 dev/archery/archery/utils/command.py   |  71 ++
 dev/archery/archery/utils/git.py   |  73 ++
 .gitignore => dev/archery/archery/utils/logger.py  |  45 +---
 dev/archery/archery/utils/source.py| 141 +++
 .gitignore => dev/archery/setup.py |  58 ++---
 .gitignore => dev/archery/tests/test_benchmarks.py |  55 ++---
 docs/source/developers/benchmarks.rst  | 127 ++
 docs/source/developers/index.rst   |   1 +
 python/.gitignore  |   2 -
 20 files changed, 1543 insertions(+), 205 deletions(-)
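
The commit message describes the new tooling only as "comparing revisions/builds". As a rough sketch of what such a comparison involves, the snippet below diffs two sets of benchmark timings and flags slowdowns past a threshold. The function name, the dictionary-based input, the benchmark names, and the 5% threshold are all assumptions made for illustration; this is not archery's actual implementation or API.

```python
def compare(baseline, contender, threshold=0.05):
    """Return {name: (relative_change, regressed)} for shared benchmarks.

    Inputs map a benchmark name to a positive time per iteration, so a
    positive relative change means the contender is slower.
    """
    report = {}
    for name in sorted(baseline.keys() & contender.keys()):
        old, new = baseline[name], contender[name]
        change = (new - old) / old
        report[name] = (change, change > threshold)
    return report


# Hypothetical timings in microseconds for two builds.
baseline = {"SumKernel/0": 100.0, "SumKernel/50": 200.0}
contender = {"SumKernel/0": 102.0, "SumKernel/50": 230.0}

report = compare(baseline, contender)
for name, (change, regressed) in report.items():
    print(f"{name}: {change:+.1%} regression={regressed}")
```

Only benchmarks present in both runs are compared, which keeps the report meaningful when a revision adds or removes benchmarks.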

diff --git a/.gitignore b/.gitignore
index 6bb237a..4a03020 100644
--- a/.gitignore
+++ b/.gitignore
@@ -50,6 +50,8 @@ docs/example1.dat
 docs/example3.dat
 python/.eggs/
 python/doc/
+# Egg metadata
+*.egg-info
 
 .vscode
 .idea/
diff --git a/cpp/src/arrow/compute/benchmark-util.h b/cpp/src/arrow/compute/benchmark-util.h
index 1678f8d..865da66 100644
--- a/cpp/src/arrow/compute/benchmark-util.h
+++ b/cpp/src/arrow/compute/benchmark-util.h
@@ -55,5 +55,18 @@ void BenchmarkSetArgs(benchmark::internal::Benchmark* bench) {
   bench->Args({static_cast<ArgsType>(size), nulls});
 }
 
+void RegressionSetArgs(benchmark::internal::Benchmark* bench) {
+  // Benchmark changed its parameter type between releases from
+  // int to int64_t. As it doesn't have version macros, we need
+  // to apply C++ template magic.
+  using ArgsType =
+  typename BenchmarkArgsType::type;
+  bench->Unit(benchmark::kMicrosecond);
+
+  // Regressions should only bench L1 data for better stability
+  for (auto nulls : std::vector<ArgsType>({0, 1, 10, 50}))
+    bench->Args({static_cast<ArgsType>(kL1Size), nulls});
+}
+
 }  // namespace compute
 }  // namespace arrow
diff --git a/cpp/src/arrow/compute/kernels/aggregate-benchmark.cc b/cpp/src/arrow/compute/kernels/aggregate-benchmark.cc
index e81f879..bbc923f 100644
--- a/cpp/src/arrow/compute/kernels/aggregate-benchmark.cc
+++ b/cpp/src/arrow/compute/kernels/aggregate-benchmark.cc
@@ -309,7 +309,7 @@ BENCHMARK_TEMPLATE(BenchSum, SumBitmapNaive)->Apply(BenchmarkSetArgs);
 BENCHMARK_TEMPLATE(BenchSum, SumBitmapReader)->Apply(BenchmarkSetArgs);
 BENCHMARK_TEMPLATE(BenchSum, SumBitmapVectorizeUnroll)->Apply(BenchmarkSetArgs);
 
-static void BenchSumKernel(benchmark::State& state) {
+static void RegressionSumKernel(benchmark::State& state) {
   

[arrow] branch master updated: ARROW-4702: [C++] Update dependency versions

2019-04-25 Thread wesm

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new f913d8f  ARROW-4702: [C++] Update dependency versions
f913d8f is described below

commit f913d8f0adff71c288a10f6c1b0ad2d1ab3e9e32
Author: Antoine Pitrou 
AuthorDate: Thu Apr 25 08:28:58 2019 -0500

ARROW-4702: [C++] Update dependency versions

Author: Antoine Pitrou 

Closes #4189 from pitrou/ARROW-4702-update-deps and squashes the following commits:

f13660f4e  ARROW-4702:  Update dependency versions
---
 cpp/cmake_modules/ThirdpartyToolchain.cmake| 200 +++--
 ...17c897976c60b0e6e4f4a365c751027244dada7a.tar.gz | Bin 454719 -> 0 bytes
 cpp/thirdparty/versions.txt|  23 ++-
 3 files changed, 80 insertions(+), 143 deletions(-)

diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index 30f99ff..1c42d87 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -280,6 +280,13 @@ else()
       "https://github.com/google/googletest/archive/release-${GTEST_VERSION}.tar.gz")
 endif()
 
+if(DEFINED ENV{ARROW_JEMALLOC_URL})
+  set(JEMALLOC_SOURCE_URL "$ENV{ARROW_JEMALLOC_URL}")
+else()
+  set(JEMALLOC_SOURCE_URL
+      "https://github.com/jemalloc/jemalloc/archive/${JEMALLOC_VERSION}.tar.gz")
+endif()
+
 if(DEFINED ENV{ARROW_LZ4_URL})
   set(LZ4_SOURCE_URL "$ENV{ARROW_LZ4_URL}")
 else()
@@ -320,10 +327,8 @@ endif()
 if(DEFINED ENV{ARROW_SNAPPY_URL})
   set(SNAPPY_SOURCE_URL "$ENV{ARROW_SNAPPY_URL}")
 else()
-  set(
-SNAPPY_SOURCE_URL
-    "https://github.com/google/snappy/releases/download/${SNAPPY_VERSION}/snappy-${SNAPPY_VERSION}.tar.gz"
-)
+  set(SNAPPY_SOURCE_URL
+      "https://github.com/google/snappy/archive/${SNAPPY_VERSION}.tar.gz")
 endif()
 
 if(DEFINED ENV{ARROW_THRIFT_URL})
@@ -338,8 +343,10 @@ endif()
 if(DEFINED ENV{ARROW_URIPARSER_URL})
   set(URIPARSER_SOURCE_URL "$ENV{ARROW_URIPARSER_URL}")
 else()
-  set(URIPARSER_SOURCE_URL
-      "https://github.com/uriparser/uriparser/archive/${URIPARSER_VERSION}.tar.gz")
+  set(
+    URIPARSER_SOURCE_URL
+    "https://github.com/uriparser/uriparser/archive/uriparser-${URIPARSER_VERSION}.tar.gz"
+)
 endif()
 
 if(DEFINED ENV{ARROW_ZLIB_URL})
@@ -560,10 +567,12 @@ macro(build_uriparser)
   add_library(uriparser::uriparser STATIC IMPORTED)
   # Work around https://gitlab.kitware.com/cmake/cmake/issues/15052
   file(MAKE_DIRECTORY ${URIPARSER_INCLUDE_DIRS})
-  set_target_properties(
-uriparser::uriparser
-PROPERTIES IMPORTED_LOCATION ${URIPARSER_STATIC_LIB} INTERFACE_INCLUDE_DIRECTORIES
-   ${URIPARSER_INCLUDE_DIRS})
+  set_target_properties(uriparser::uriparser
+PROPERTIES IMPORTED_LOCATION ${URIPARSER_STATIC_LIB}
+   INTERFACE_INCLUDE_DIRECTORIES ${URIPARSER_INCLUDE_DIRS}
+   # URI_STATIC_BUILD required on Windows
+   INTERFACE_COMPILE_DEFINITIONS
+   "URI_STATIC_BUILD;URI_NO_UNICODE")
 
   add_dependencies(toolchain uriparser_ep)
   add_dependencies(uriparser::uriparser uriparser_ep)
@@ -586,79 +595,27 @@ include_directories(SYSTEM ${URIPARSER_INCLUDE_DIRS})
 macro(build_snappy)
   message(STATUS "Building snappy from source")
   set(SNAPPY_PREFIX "${CMAKE_CURRENT_BINARY_DIR}/snappy_ep/src/snappy_ep-install")
-  if(MSVC)
-set(SNAPPY_STATIC_LIB_NAME snappy_static)
-  else()
-set(SNAPPY_STATIC_LIB_NAME snappy)
-  endif()
+  set(SNAPPY_STATIC_LIB_NAME snappy)
   set(
 SNAPPY_STATIC_LIB
 "${SNAPPY_PREFIX}/lib/${CMAKE_STATIC_LIBRARY_PREFIX}${SNAPPY_STATIC_LIB_NAME}${CMAKE_STATIC_LIBRARY_SUFFIX}"
 )
 
-  if(${UPPERCASE_BUILD_TYPE} EQUAL "RELEASE")
-if(APPLE)
-  set(SNAPPY_CXXFLAGS "CXXFLAGS='-DNDEBUG -O1'")
-else()
-  set(SNAPPY_CXXFLAGS "CXXFLAGS='-DNDEBUG -O2'")
-endif()
-  endif()
+  set(SNAPPY_CMAKE_ARGS ${EP_COMMON_CMAKE_ARGS} -DSNAPPY_BUILD_TESTS=OFF
+  "-DCMAKE_INSTALL_PREFIX=${SNAPPY_PREFIX}")
 
-  if(WIN32)
-set(SNAPPY_CMAKE_ARGS
-${EP_COMMON_CMAKE_ARGS}
--DCMAKE_AR=${CMAKE_AR}
--DCMAKE_RANLIB=${CMAKE_RANLIB}
-"-DCMAKE_INSTALL_PREFIX=${SNAPPY_PREFIX}")
-set(SNAPPY_UPDATE_COMMAND
-${CMAKE_COMMAND}
--E
-copy
-${CMAKE_SOURCE_DIR}/cmake_modules/SnappyCMakeLists.txt
-./CMakeLists.txt
-&&
-${CMAKE_COMMAND}
--E
-copy
-${CMAKE_SOURCE_DIR}/cmake_modules/SnappyConfig.h
-./config.h)
-externalproject_add(snappy_ep
-UPDATE_COMMAND
-${SNAPPY_UPDATE_COMMAND}
-${EP_LOG_OPTIONS}
-BUILD_IN_SOURCE
-   

[arrow] branch master updated: ARROW-5155: [GLib][Ruby] Add support for building union arrays from data type

2019-04-25 Thread kou

kou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new ecfb807  ARROW-5155: [GLib][Ruby] Add support for building union arrays from data type
ecfb807 is described below

commit ecfb807458bfe909ecc8940bd840fc9c6169dd51
Author: Kenta Murata 
AuthorDate: Thu Apr 25 15:36:07 2019 +0900

ARROW-5155: [GLib][Ruby] Add support for building union arrays from data type

This is separated from #3723.
This should be merged after #3723.

Author: Kenta Murata 
Author: Kouhei Sutou 

Closes #4127 from mrkn/glib_ruby_make_union_array_with_field_names and squashes the following commits:

e6255567  Fix test data
f82ac3d1   Fix test cases
d550dc97   Fix comment
f1bfa07b   Stop copying a type_code vector
606a04c1   Use new constructors of union arrays
5ad55722   Add garrow_dense_union_array_new_data_type
c8793d5c   Add garrow_sparse_union_array_new_data_type
---
 c_glib/arrow-glib/composite-array.cpp  | 97 ++
 c_glib/arrow-glib/composite-array.h| 11 +++
 c_glib/test/test-dense-union-array.rb  | 90 ++--
 c_glib/test/test-sparse-union-array.rb | 87 +--
 .../record-batch/test-dense-union-array.rb |  8 +-
 .../record-batch/test-sparse-union-array.rb|  7 +-
 6 files changed, 238 insertions(+), 62 deletions(-)
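
The two constructors added in this commit build sparse and dense union arrays. The difference between the two layouts can be sketched in plain Python as follows. This is a conceptual illustration only, not the Arrow GLib or Arrow C++ API: the helper names and sample data are hypothetical, and it assumes each type ID is simply the position of the child array, whereas the new constructors take the type codes from the supplied data type.

```python
def sparse_union_values(type_ids, fields):
    """Sparse layout: every child array has one slot per union slot,
    so slot i is read from fields[type_ids[i]][i]."""
    return [fields[t][i] for i, t in enumerate(type_ids)]


def dense_union_values(type_ids, value_offsets, fields):
    """Dense layout: each child holds only its own values, and
    value_offsets[i] indexes into the child chosen by type_ids[i]."""
    return [fields[t][off] for t, off in zip(type_ids, value_offsets)]


type_ids = [0, 1, 0, 1]

# Sparse: both children are as long as the union; unused slots are padding.
sparse_fields = [[1, 0, 3, 0], [None, "a", None, "b"]]

# Dense: children are compact; offsets are counted separately per type.
dense_fields = [[1, 3], ["a", "b"]]
value_offsets = [0, 0, 1, 1]

print(sparse_union_values(type_ids, sparse_fields))
print(dense_union_values(type_ids, value_offsets, dense_fields))
```

Both calls decode the same logical sequence; the dense form trades the extra offsets array for children without padding slots.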

diff --git a/c_glib/arrow-glib/composite-array.cpp b/c_glib/arrow-glib/composite-array.cpp
index b202fb4..4fba813 100644
--- a/c_glib/arrow-glib/composite-array.cpp
+++ b/c_glib/arrow-glib/composite-array.cpp
@@ -366,6 +366,53 @@ garrow_sparse_union_array_new(GArrowInt8Array *type_ids,
   }
 }
 
+/**
+ * garrow_sparse_union_array_new_data_type:
+ * @data_type: The data type for the sparse array.
+ * @type_ids: The field type IDs for each value as #GArrowInt8Array.
+ * @fields: (element-type GArrowArray): The arrays for each field
+ *   as #GList of #GArrowArray.
+ * @error: (nullable): Return location for a #GError or %NULL.
+ *
+ * Returns: (nullable): A newly created #GArrowSparseUnionArray
+ *   or %NULL on error.
+ *
+ * Since: 0.14.0
+ */
+GArrowSparseUnionArray *
+garrow_sparse_union_array_new_data_type(GArrowSparseUnionDataType *data_type,
+GArrowInt8Array *type_ids,
+GList *fields,
+GError **error)
+{
+  auto arrow_data_type = garrow_data_type_get_raw(GARROW_DATA_TYPE(data_type));
+  auto arrow_union_data_type =
+    std::static_pointer_cast<arrow::UnionType>(arrow_data_type);
+  std::vector<std::string> arrow_field_names;
+  for (const auto &arrow_field : arrow_union_data_type->children()) {
+arrow_field_names.push_back(arrow_field->name());
+  }
+  auto arrow_type_ids = garrow_array_get_raw(GARROW_ARRAY(type_ids));
+  std::vector<std::shared_ptr<arrow::Array>> arrow_fields;
+  for (auto node = fields; node; node = node->next) {
+auto *field = GARROW_ARRAY(node->data);
+arrow_fields.push_back(garrow_array_get_raw(field));
+  }
+  std::shared_ptr<arrow::Array> arrow_union_array;
+  auto status = arrow::UnionArray::MakeSparse(*arrow_type_ids,
+                                              arrow_fields,
+                                              arrow_field_names,
+                                              arrow_union_data_type->type_codes(),
+                                              &arrow_union_array);
+  if (garrow_error_check(error,
+ status,
+ "[sparse-union-array][new][data-type]")) {
+    return GARROW_SPARSE_UNION_ARRAY(garrow_array_new_raw(&arrow_union_array));
+  } else {
+return NULL;
+  }
+}
+
 
 G_DEFINE_TYPE(GArrowDenseUnionArray,
   garrow_dense_union_array,
@@ -420,6 +467,56 @@ garrow_dense_union_array_new(GArrowInt8Array *type_ids,
   }
 }
 
+/**
+ * garrow_dense_union_array_new_data_type:
+ * @data_type: The data type for the dense array.
+ * @type_ids: The field type IDs for each value as #GArrowInt8Array.
+ * @value_offsets: The value offsets for each value as #GArrowInt32Array.
+ *   Each offset is counted for each type.
+ * @fields: (element-type GArrowArray): The arrays for each field
+ *   as #GList of #GArrowArray.
+ * @error: (nullable): Return location for a #GError or %NULL.
+ *
+ * Returns: (nullable): A newly created #GArrowSparseUnionArray
+ *   or %NULL on error.
+ *
+ * Since: 0.14.0
+ */
+GArrowDenseUnionArray *
+garrow_dense_union_array_new_data_type(GArrowDenseUnionDataType *data_type,
+   GArrowInt8Array *type_ids,
+   GArrowInt32Array *value_offsets,
+   GList *fields,
+   GError **error)
+{
+  auto arrow_data_type =