[arrow] branch master updated (7ec917b -> 8d1d57c)

2020-12-06 Thread praveenbingo
This is an automated email from the ASF dual-hosted git repository.

praveenbingo pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 7ec917b  ARROW-10802: [C++] remove special casing for Dictionary[NullType] in parquet column writer
 add 8d1d57c  ARROW-10779: [Java] Fix writeNull method in UnionListWriter

No new revisions were added by this update.

Summary of changes:
 .../main/codegen/templates/UnionListWriter.java|  11 +-
 .../arrow/vector/complex/LargeListVector.java  |  18 +++
 .../apache/arrow/vector/complex/ListVector.java|  18 +++
 .../vector/complex/impl/PromotableWriter.java  |   4 +
 .../vector/complex/writer/TestComplexWriter.java   | 163 +
 5 files changed, 213 insertions(+), 1 deletion(-)



[arrow] branch master updated: ARROW-10802: [C++] remove special casing for Dictionary[NullType] in parquet column writer

2020-12-06 Thread kou

kou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 7ec917b  ARROW-10802: [C++] remove special casing for Dictionary[NullType] in parquet column writer
7ec917b is described below

commit 7ec917b80461f0114dcb2536b80e16f42cb45430
Author: Andrew Wieteska 
AuthorDate: Mon Dec 7 15:54:10 2020 +0900

ARROW-10802: [C++] remove special casing for Dictionary[NullType] in parquet column writer

ARROW-1648 was fixed a while back (in 0.8) so we can now rely on `arrow::compute::Cast`
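
As a rough illustration of why the special case can go away, here is a toy model in Rust (not Arrow's API; `dictionary_to_dense` and its slice-based signature are hypothetical) of what casting a dictionary array to its value type amounts to. A dictionary whose values are all null, standing in for Dictionary[NullType], falls out of the generic path without extra handling:

```rust
/// Toy dictionary decode: each index selects a dictionary entry.
/// A null index, or an index that points at a null entry, yields a null slot.
/// (Hypothetical helper for illustration only; this is not an Arrow function.)
fn dictionary_to_dense<T: Clone>(
    indices: &[Option<usize>],
    dictionary: &[Option<T>],
) -> Vec<Option<T>> {
    indices
        .iter()
        .map(|idx| idx.and_then(|i| dictionary[i].clone()))
        .collect()
}
```

With `T = ()` and a dictionary of `None` entries, the result is an all-null vector of the same length as the indices, which is the output the removed branch produced by hand.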

Closes #8828 from arw2019/column_writer-DictNullType-special-casing

Authored-by: Andrew Wieteska 
Signed-off-by: Sutou Kouhei 
---
 cpp/src/parquet/column_writer.cc | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/cpp/src/parquet/column_writer.cc b/cpp/src/parquet/column_writer.cc
index 4d3197b..e35eeac 100644
--- a/cpp/src/parquet/column_writer.cc
+++ b/cpp/src/parquet/column_writer.cc
@@ -1011,13 +1011,6 @@ Status ConvertDictionaryToDense(const ::arrow::Array& array, MemoryPool* pool,
   const ::arrow::DictionaryType& dict_type =
       static_cast<const ::arrow::DictionaryType&>(*array.type());
 
-  // TODO(ARROW-1648): Remove this special handling once we require an Arrow
-  // version that has this fixed.
-  if (dict_type.value_type()->id() == ::arrow::Type::NA) {
-    *out = std::make_shared<::arrow::NullArray>(array.length());
-    return Status::OK();
-  }
-
   ::arrow::compute::ExecContext ctx(pool);
   ARROW_ASSIGN_OR_RAISE(Datum cast_output,
                         ::arrow::compute::Cast(array.data(), dict_type.value_type(),



[arrow] branch master updated (7708519 -> ef3cff6)

2020-12-06 Thread jorgecarleitao

jorgecarleitao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 7708519  ARROW-10591: [Rust] Add support for StructArray to MutableArrayData
 add ef3cff6  ARROW-10828: [Rust][DataFusion] Address / fix clippy lints

No new revisions were added by this update.

Summary of changes:
 rust/datafusion/src/execution/context.rs   | 4 ++--
 rust/datafusion/src/lib.rs | 4 
 rust/datafusion/src/logical_plan/builder.rs| 5 ++---
 rust/datafusion/src/optimizer/projection_push_down.rs  | 8 
 rust/datafusion/src/physical_plan/aggregates.rs| 2 +-
 rust/datafusion/src/physical_plan/array_expressions.rs | 2 +-
 rust/datafusion/src/physical_plan/common.rs| 2 +-
 rust/datafusion/src/physical_plan/expressions.rs   | 2 +-
 rust/datafusion/src/physical_plan/functions.rs | 2 +-
 rust/datafusion/src/physical_plan/planner.rs   | 2 +-
 rust/datafusion/src/physical_plan/type_coercion.rs | 2 +-
 11 files changed, 15 insertions(+), 20 deletions(-)



[arrow] branch master updated (e1c1e05 -> 7708519)

2020-12-06 Thread jorgecarleitao

jorgecarleitao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from e1c1e05  ARROW-10821 [Rust][Datafusion] support negative expression
 add 7708519  ARROW-10591: [Rust] Add support for StructArray to MutableArrayData

No new revisions were added by this update.

Summary of changes:
 rust/arrow/src/array/transform/mod.rs   | 135 +++-
 rust/arrow/src/array/transform/structure.rs |  64 +
 2 files changed, 196 insertions(+), 3 deletions(-)
 create mode 100644 rust/arrow/src/array/transform/structure.rs



[arrow] branch master updated: ARROW-10821 [Rust][Datafusion] support negative expression

2020-12-06 Thread jorgecarleitao

jorgecarleitao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new e1c1e05  ARROW-10821 [Rust][Datafusion] support negative expression
e1c1e05 is described below

commit e1c1e054ff8a00353a975e5139277573921eac0a
Author: Qingping Hou 
AuthorDate: Mon Dec 7 07:00:46 2020 +0100

ARROW-10821 [Rust][Datafusion] support negative expression

To support queries like `SELECT c3 FROM aggregate_test_100 WHERE -c4 > 0`:

  * add negate compute kernel in arrow
  * add negative expression in datafusion
  * support negative and positive operators in datafusion's sql planner
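
A minimal sketch of the null-preserving negate kernel described above, using plain `Vec<Option<T>>` in place of Arrow arrays. The slice-based signatures here are an assumption made for illustration; the real kernel in the diff below operates on `PrimitiveArray` and reuses the input's null buffer.

```rust
use std::ops::Neg;

/// Applies `op` to every non-null value; null slots stay null,
/// so `-null` is `null`. (Simplified stand-in, not the arrow crate API.)
fn signed_unary_math_op<T, F>(values: &[Option<T>], op: F) -> Vec<Option<T>>
where
    T: Copy + Neg<Output = T>,
    F: Fn(T) -> T,
{
    values.iter().map(|v| v.map(|x| op(x))).collect()
}

/// Negates each value of a signed numeric column.
fn negate<T>(values: &[Option<T>]) -> Vec<Option<T>>
where
    T: Copy + Neg<Output = T>,
{
    signed_unary_math_op(values, |x| -x)
}
```

A filter such as `WHERE -c4 > 0` would then compare the output of `negate` against zero, with null rows remaining null.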

Closes #8846 from houqp/qp_negative

Authored-by: Qingping Hou 
Signed-off-by: Jorge C. Leitao 
---
 rust/arrow/src/array/cast.rs |   1 +
 rust/arrow/src/array/mod.rs  |   4 +-
 rust/arrow/src/compute/kernels/arithmetic.rs |  88 +--
 rust/arrow/src/datatypes.rs  |  75 
 rust/datafusion/src/logical_plan/expr.rs |   9 ++
 rust/datafusion/src/optimizer/utils.rs   |   6 +-
 rust/datafusion/src/physical_plan/expressions.rs | 105 +--
 rust/datafusion/src/physical_plan/planner.rs |   4 +
 rust/datafusion/src/scalar.rs|  19 
 rust/datafusion/src/sql/planner.rs   |  49 ---
 rust/datafusion/tests/sql.rs |  11 +++
 11 files changed, 346 insertions(+), 25 deletions(-)

diff --git a/rust/arrow/src/array/cast.rs b/rust/arrow/src/array/cast.rs
index 56e5d3a..a0ef7e2 100644
--- a/rust/arrow/src/array/cast.rs
+++ b/rust/arrow/src/array/cast.rs
@@ -59,5 +59,6 @@ macro_rules! array_downcast_fn {
 }
 
 array_downcast_fn!(as_string_array, StringArray);
+array_downcast_fn!(as_largestring_array, LargeStringArray);
 array_downcast_fn!(as_boolean_array, BooleanArray);
 array_downcast_fn!(as_null_array, NullArray);
diff --git a/rust/arrow/src/array/mod.rs b/rust/arrow/src/array/mod.rs
index fb0b302..cb1c13e 100644
--- a/rust/arrow/src/array/mod.rs
+++ b/rust/arrow/src/array/mod.rs
@@ -270,8 +270,8 @@ pub use self::ord::{build_compare, DynComparator};
 // - Array downcast helper functions -
 
 pub use self::cast::{
-as_boolean_array, as_dictionary_array, as_null_array, as_primitive_array,
-as_string_array,
+as_boolean_array, as_dictionary_array, as_largestring_array, as_null_array,
+as_primitive_array, as_string_array,
 };
 
 // -- C Data Interface ---
diff --git a/rust/arrow/src/compute/kernels/arithmetic.rs b/rust/arrow/src/compute/kernels/arithmetic.rs
index fe1bda5..e0bd37d 100644
--- a/rust/arrow/src/compute/kernels/arithmetic.rs
+++ b/rust/arrow/src/compute/kernels/arithmetic.rs
@@ -24,7 +24,7 @@
 
 #[cfg(feature = "simd")]
 use std::mem;
-use std::ops::{Add, Div, Mul, Sub};
+use std::ops::{Add, Div, Mul, Neg, Sub};
 #[cfg(feature = "simd")]
 use std::slice::from_raw_parts_mut;
 use std::sync::Arc;
@@ -44,6 +44,72 @@ use crate::datatypes::ToByteSlice;
 use crate::error::{ArrowError, Result};
 use crate::{array::*, util::bit_util};
 
+/// Helper function to perform math lambda function on values from single array of signed numeric
+/// type. If value is null then the output value is also null, so `-null` is `null`.
+pub fn signed_unary_math_op<T, F>(
+    array: &PrimitiveArray<T>,
+    op: F,
+) -> Result<PrimitiveArray<T>>
+where
+    T: datatypes::ArrowSignedNumericType,
+    T::Native: Neg<Output = T::Native>,
+    F: Fn(T::Native) -> T::Native,
+{
+    let values = (0..array.len())
+        .map(|i| op(array.value(i)))
+        .collect::<Vec<T::Native>>();
+
+    let data = ArrayData::new(
+        T::DATA_TYPE,
+        array.len(),
+        None,
+        array.data_ref().null_buffer().cloned(),
+        0,
+        vec![Buffer::from(values.to_byte_slice())],
+        vec![],
+    );
+    Ok(PrimitiveArray::<T>::from(Arc::new(data)))
+}
+
+/// SIMD vectorized version of `signed_unary_math_op` above.
+#[cfg(all(any(target_arch = "x86", target_arch = "x86_64"), feature = "simd"))]
+fn simd_signed_unary_math_op<T, F>(
+    array: &PrimitiveArray<T>,
+    op: F,
+) -> Result<PrimitiveArray<T>>
+where
+    T: datatypes::ArrowSignedNumericType,
+    F: Fn(T::SignedSimd) -> T::SignedSimd,
+{
+    let lanes = T::lanes();
+    let buffer_size = array.len() * mem::size_of::<T::Native>();
+    let mut result = MutableBuffer::new(buffer_size).with_bitset(buffer_size, false);
+
+    for i in (0..array.len()).step_by(lanes) {
+        let simd_result =
+            T::signed_unary_op(T::load_signed(array.value_slice(i, lanes)), &op);
+
+        let result_slice: &mut [T::Native] = unsafe {
+            from_raw_parts_mut(
+                (result.data_mut().as_mut_ptr() as *mut T::Native).add(i),
+                lanes,
+            )
+        };
+

[arrow] branch master updated (3453943 -> 57829f5)

2020-12-06 Thread emkornfield

emkornfield pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 3453943  ARROW-10746: [C++] Bump gtest version + use GTEST_SKIP in tests
 add 57829f5  ARROW-10748: [Java][JDBC] Support consuming timestamp data when time zone is not available

No new revisions were added by this update.

Summary of changes:
 .../arrow/adapter/jdbc/JdbcToArrowUtils.java   |  8 +++-
 .../adapter/jdbc/consumer/TimestampConsumer.java   | 49 ++
 ...stampConsumer.java => TimestampTZConsumer.java} | 41 ++
 .../jdbc/h2/JdbcToArrowVectorIteratorTest.java | 46 +++-
 4 files changed, 78 insertions(+), 66 deletions(-)
 copy java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/consumer/{TimestampConsumer.java => TimestampTZConsumer.java} (65%)



[arrow] branch master updated: ARROW-10746: [C++] Bump gtest version + use GTEST_SKIP in tests

2020-12-06 Thread kou

kou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 3453943  ARROW-10746: [C++] Bump gtest version + use GTEST_SKIP in tests
3453943 is described below

commit 34539432a3f49672cd352e2fa8f626489ea3e954
Author: Andrew Wieteska 
AuthorDate: Mon Dec 7 14:00:53 2020 +0900

ARROW-10746: [C++] Bump gtest version + use GTEST_SKIP in tests

As per a TODO left in ARROW-3769 / #3721, we can now use the `GTEST_SKIP` macro in `parquet/encoding-test.cpp`. `GTEST_SKIP` was added in gtest 1.10.0, so this involves bumping our minimal gtest version from 1.8.1.

Closes #8782 from arw2019/ARROW-10746-GTEST_SKIP

Lead-authored-by: Andrew Wieteska 
Co-authored-by: Sutou Kouhei 
Signed-off-by: Sutou Kouhei 
---
 ci/conda_env_cpp.yml|  4 +-
 cpp/cmake_modules/SetupCxxFlags.cmake   | 21 +++
 cpp/cmake_modules/ThirdpartyToolchain.cmake | 93 ++---
 cpp/src/arrow/io/hdfs_test.cc   |  7 +--
 cpp/src/arrow/util/compression_test.cc  | 33 --
 cpp/src/parquet/column_scanner_test.cc  |  6 +-
 cpp/src/parquet/column_writer_test.cc   |  2 -
 cpp/src/parquet/encoding_test.cc| 75 ---
 cpp/src/parquet/file_serialize_test.cc  |  2 -
 cpp/src/parquet/statistics_test.cc  | 48 +++
 cpp/src/parquet/test_util.h | 20 +++
 cpp/thirdparty/versions.txt |  2 +-
 12 files changed, 171 insertions(+), 142 deletions(-)

diff --git a/ci/conda_env_cpp.yml b/ci/conda_env_cpp.yml
index 870d851..390eb7d 100644
--- a/ci/conda_env_cpp.yml
+++ b/ci/conda_env_cpp.yml
@@ -24,9 +24,9 @@ c-ares
 cmake
 gflags
 glog
-gmock>=1.8.1
+gmock>=1.10.0
 grpc-cpp>=1.27.3
-gtest=1.8.1
+gtest=1.10.0
 libprotobuf
 libutf8proc
 lz4-c
diff --git a/cpp/cmake_modules/SetupCxxFlags.cmake b/cpp/cmake_modules/SetupCxxFlags.cmake
index a5cd95b..402d18f 100644
--- a/cpp/cmake_modules/SetupCxxFlags.cmake
+++ b/cpp/cmake_modules/SetupCxxFlags.cmake
@@ -159,6 +159,23 @@ if(WIN32)
   set(CXX_COMMON_FLAGS "/W3 /EHsc")
 endif()
 
+# Disable C5105 (macro expansion producing 'defined' has undefined
+# behavior) warning because there are codes that produce this
+# warning in Windows Kits. e.g.:
+#
+#   #define _CRT_INTERNAL_NONSTDC_NAMES                                            \
+#        (                                                                          \
+#            ( defined _CRT_DECLARE_NONSTDC_NAMES && _CRT_DECLARE_NONSTDC_NAMES) || \
+#            (!defined _CRT_DECLARE_NONSTDC_NAMES && !__STDC__                 )    \
+#        )
+#
+# See also:
+# * C5105: https://docs.microsoft.com/en-US/cpp/error-messages/compiler-warnings/c5105
+# * Related reports:
+#   * https://developercommunity.visualstudio.com/content/problem/387684/c5105-with-stdioh-and-experimentalpreprocessor.html
+#   * https://developercommunity.visualstudio.com/content/problem/1249671/stdc17-generates-warning-compiling-windowsh.html
+set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} /wd5105")
+
 if(ARROW_USE_STATIC_CRT)
   foreach(c_flag
   CMAKE_CXX_FLAGS
@@ -177,6 +194,10 @@ if(WIN32)
 
 # Support large object code
 set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} /bigobj")
+
+# We may use UTF-8 in source code such as
+# cpp/src/arrow/compute/kernels/scalar_string_test.cc
+set(CXX_COMMON_FLAGS "${CXX_COMMON_FLAGS} /utf-8")
   else()
 # MinGW
 check_cxx_compiler_flag(-Wa,-mbig-obj CXX_SUPPORTS_BIG_OBJ)
diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index df03c31..47e6be2 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -1603,21 +1603,17 @@ macro(build_gtest)
   set(GTEST_PREFIX "${CMAKE_CURRENT_BINARY_DIR}/googletest_ep-prefix")
   set(GTEST_INCLUDE_DIR "${GTEST_PREFIX}/include")
 
-  set(_GTEST_RUNTIME_DIR ${BUILD_OUTPUT_ROOT_DIRECTORY})
+  set(_GTEST_LIBRARY_DIR "${GTEST_PREFIX}/lib")
 
   if(MSVC)
 set(_GTEST_IMPORTED_TYPE IMPORTED_IMPLIB)
 set(_GTEST_LIBRARY_SUFFIX
 "${CMAKE_GTEST_DEBUG_EXTENSION}${CMAKE_IMPORT_LIBRARY_SUFFIX}")
-# Use the import libraries from the EP
-set(_GTEST_LIBRARY_DIR "${GTEST_PREFIX}/lib")
   else()
 set(_GTEST_IMPORTED_TYPE IMPORTED_LOCATION)
 set(_GTEST_LIBRARY_SUFFIX
 "${CMAKE_GTEST_DEBUG_EXTENSION}${CMAKE_SHARED_LIBRARY_SUFFIX}")
 
-# Library and runtime same on non-Windows
-set(_GTEST_LIBRARY_DIR "${_GTEST_RUNTIME_DIR}")
   endif()
 
   set(GTEST_SHARED_LIB
@@ -1630,38 +1626,16 @@ macro(build_gtest)
 )
   set(GTEST_CMAKE_ARGS
   ${EP_COMMON_TOOLCHAIN}
-  -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}

[arrow] branch master updated: ARROW-10820: [Rust] [DataFusion] Complete TPC-H Benchmark Queries

2020-12-06 Thread agrove

agrove pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new bce15dc  ARROW-10820: [Rust] [DataFusion] Complete TPC-H Benchmark Queries
bce15dc is described below

commit bce15dcc743ea40b2c627a7c3ab9454f8f4b9d76
Author: Mike Seddon 
AuthorDate: Sun Dec 6 15:33:31 2020 -0700

ARROW-10820: [Rust] [DataFusion] Complete TPC-H Benchmark Queries

Changes:
- can now execute queries 3, 5 and 6 (in addition to 1, 12).
- add all tables.
- add all queries with query validation parameters provided.
- make basic modifications to queries 5 and 6 to allow supported syntax.
- queries 1 and 12 retain existing modifications.
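
One of the retained modifications is visible in the diff below: query 1's original predicate `date '1998-12-01' - interval '90' day` appears as the literal `'1998-09-02'`. The following sketch checks that arithmetic with the standard library only. The helper names are hypothetical, and the day-count conversion is the well-known civil-calendar algorithm by Howard Hinnant, not anything from DataFusion.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date.
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400; // year of era, 0..=399
    let mp = (m + 9) % 12; // month index with March = 0
    let doy = (153 * mp + 2) / 5 + d - 1; // day of (March-based) year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Inverse of `days_from_civil`: returns (year, month, day).
fn civil_from_days(z: i64) -> (i64, i64, i64) {
    let z = z + 719_468;
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = z - era * 146_097;
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153;
    let d = doy - (153 * mp + 2) / 5 + 1;
    let m = if mp < 10 { mp + 3 } else { mp - 9 };
    let y = yoe + era * 400 + if m <= 2 { 1 } else { 0 };
    (y, m, d)
}

/// Subtracts a whole number of days from a (year, month, day) date.
fn minus_days(date: (i64, i64, i64), days: i64) -> (i64, i64, i64) {
    civil_from_days(days_from_civil(date.0, date.1, date.2) - days)
}
```

Calling `minus_days((1998, 12, 1), 90)` gives `(1998, 9, 2)`, matching the literal used in the rewritten query.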

This was an easy way to begin understanding some of the code base and get involved.

Closes #8845 from seddonm1/complete-tpch-queries

Authored-by: Mike Seddon 
Signed-off-by: Andy Grove 
---
 rust/benchmarks/src/bin/tpch.rs | 918 +---
 1 file changed, 856 insertions(+), 62 deletions(-)

diff --git a/rust/benchmarks/src/bin/tpch.rs b/rust/benchmarks/src/bin/tpch.rs
index 82edd98..fd71022 100644
--- a/rust/benchmarks/src/bin/tpch.rs
+++ b/rust/benchmarks/src/bin/tpch.rs
@@ -89,7 +89,9 @@ enum TpchOpt {
 Convert(ConvertOpt),
 }
 
-const TABLES: &[&str] = &["lineitem", "orders"];
+const TABLES: &[&str] = &[
+    "part", "supplier", "partsupp", "customer", "orders", "lineitem", "nation", "region",
+];
 
 #[tokio::main]
 async fn main() -> Result<()> {
@@ -145,59 +147,797 @@ async fn benchmark(opt: BenchmarkOpt) -> Result<()> {
 
 fn create_logical_plan(ctx: &mut ExecutionContext, query: usize) -> Result<LogicalPlan> {
 match query {
+
+// original
+// 1 => ctx.create_logical_plan(
+// "select
+// l_returnflag,
+// l_linestatus,
+// sum(l_quantity) as sum_qty,
+// sum(l_extendedprice) as sum_base_price,
+// sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
+// sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
+// avg(l_quantity) as avg_qty,
+// avg(l_extendedprice) as avg_price,
+// avg(l_discount) as avg_disc,
+// count(*) as count_order
+// from
+// lineitem
+// where
+// l_shipdate <= date '1998-12-01' - interval '90' day (3)
+// group by
+// l_returnflag,
+// l_linestatus
+// order by
+// l_returnflag,
+// l_linestatus;"
+// ),
 1 => ctx.create_logical_plan(
 "select
-l_returnflag,
-l_linestatus,
-sum(l_quantity),
-sum(l_extendedprice),
-sum(l_extendedprice * (1 - l_discount)),
-sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)),
-avg(l_quantity),
-avg(l_extendedprice),
-avg(l_discount),
-count(*)
-from
-lineitem
-where
-l_shipdate <= '1998-12-01'
-group by
-l_returnflag,
-l_linestatus
-order by
-l_returnflag,
-l_linestatus",
+l_returnflag,
+l_linestatus,
+sum(l_quantity) as sum_qty,
+sum(l_extendedprice) as sum_base_price,
+sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
+sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
+avg(l_quantity) as avg_qty,
+avg(l_extendedprice) as avg_price,
+avg(l_discount) as avg_disc,
+count(*) as count_order
+from
+lineitem
+where
+l_shipdate <= '1998-09-02'
+group by
+l_returnflag,
+l_linestatus
+order by
+l_returnflag,
+l_linestatus;",
+),
+
+2 => ctx.create_logical_plan(
+"select
+s_acctbal,
+s_name,
+n_name,
+p_partkey,
+p_mfgr,
+s_address,
+s_phone,
+s_comment
+from
+part,
+supplier,
+partsupp,
+nation,
+region
+where
+p_partkey = ps_partkey
+and s_suppkey = ps_suppkey
+and p_size = 15
+and 

[arrow] branch master updated (d1340a3 -> e75e0fb)

2020-12-06 Thread agrove

agrove pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from d1340a3  ARROW-10824: [Rust] Added partialEq to null array
 add e75e0fb  ARROW-10813: [Rust] [DataFusion] Implement DFSchema

No new revisions were added by this update.

Summary of changes:
 rust/datafusion/src/logical_plan/dfschema.rs | 418 +++
 rust/datafusion/src/logical_plan/mod.rs  |   2 +
 2 files changed, 420 insertions(+)
 create mode 100644 rust/datafusion/src/logical_plan/dfschema.rs



[arrow] branch master updated (1727b10 -> d1340a3)

2020-12-06 Thread alamb

alamb pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 1727b10  ARROW-10823: [Rust] Fixed error in MutableArrayData
 add d1340a3  ARROW-10824: [Rust] Added partialEq to null array

No new revisions were added by this update.

Summary of changes:
 rust/arrow/src/array/equal/mod.rs | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)



[arrow] branch master updated: ARROW-10823: [Rust] Fixed error in MutableArrayData

2020-12-06 Thread alamb

alamb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 1727b10  ARROW-10823: [Rust] Fixed error in MutableArrayData
1727b10 is described below

commit 1727b102a5b9dcba60feb58a005bb389dfdbe2a9
Author: Jorge C. Leitao 
AuthorDate: Sun Dec 6 06:49:43 2020 -0500

ARROW-10823: [Rust] Fixed error in MutableArrayData

This fixes an error in `MutableArrayData` where null bits were not being set when one array had no nulls but other arrays did, causing a semantic error in the final array.
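
To make the failure mode concrete, here is a simplified model of the fix (the `Source` and `Builder` types are hypothetical stand-ins, not the arrow crate's `ArrayData` or `MutableArrayData`). Validity is only written for a null-free source when `use_nulls` is set, so the constructor forces `use_nulls` to `true` whenever any source contains nulls; otherwise the validity entries would fall out of step with the values.

```rust
/// Hypothetical simplified array: values plus an optional validity mask.
struct Source {
    values: Vec<i32>,
    validity: Option<Vec<bool>>, // None means "no nulls"
}

impl Source {
    fn null_count(&self) -> usize {
        self.validity
            .as_ref()
            .map_or(0, |mask| mask.iter().filter(|ok| !**ok).count())
    }
}

struct Builder<'a> {
    sources: Vec<&'a Source>,
    use_nulls: bool,
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl<'a> Builder<'a> {
    fn new(sources: Vec<&'a Source>, mut use_nulls: bool) -> Self {
        // The fix: if any source has nulls, every insertion must write validity.
        if sources.iter().any(|s| s.null_count() > 0) {
            use_nulls = true;
        }
        Builder { sources, use_nulls, values: Vec::new(), validity: Vec::new() }
    }

    fn extend(&mut self, index: usize, start: usize, len: usize) {
        let src = self.sources[index];
        self.values.extend_from_slice(&src.values[start..start + len]);
        match (&src.validity, self.use_nulls) {
            (Some(mask), _) => self.validity.extend_from_slice(&mask[start..start + len]),
            (None, true) => self.validity.extend(std::iter::repeat(true).take(len)),
            // Nothing is written here; this is only sound if no source has nulls.
            (None, false) => {}
        }
    }

    fn freeze(self) -> Vec<Option<i32>> {
        if self.validity.is_empty() {
            return self.values.into_iter().map(Some).collect();
        }
        self.values
            .into_iter()
            .zip(self.validity)
            .map(|(v, ok)| if ok { Some(v) } else { None })
            .collect()
    }
}
```

Without the check in `new`, extending from a null-free source followed by a source with nulls would leave two validity entries for four values in this model, which is the kind of misalignment the commit describes.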

Closes #8848 from jorgecarleitao/fix_error

Authored-by: Jorge C. Leitao 
Signed-off-by: Andrew Lamb 
---
 rust/arrow/src/array/transform/mod.rs | 28 +++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/rust/arrow/src/array/transform/mod.rs b/rust/arrow/src/array/transform/mod.rs
index 9c4149e..074d6ac 100644
--- a/rust/arrow/src/array/transform/mod.rs
+++ b/rust/arrow/src/array/transform/mod.rs
@@ -285,10 +285,16 @@ impl<'a> MutableArrayData<'a> {
     /// `use_nulls` is a flag used to optimize insertions. It should be `false` if the only source of nulls
     /// are the arrays themselves and `true` if the user plans to call [MutableArrayData::extend_nulls].
     /// In other words, if `use_nulls` is `false`, calling [MutableArrayData::extend_nulls] should not be used.
-    pub fn new(arrays: Vec<&'a ArrayData>, use_nulls: bool, capacity: usize) -> Self {
+    pub fn new(arrays: Vec<&'a ArrayData>, mut use_nulls: bool, capacity: usize) -> Self {
         let data_type = arrays[0].data_type();
         use crate::datatypes::*;
 
+        // if any of the arrays has nulls, insertions from any array requires setting bits
+        // as there is at least one array with nulls.
+        if arrays.iter().any(|array| array.null_count() > 0) {
+            use_nulls = true;
+        };
+
         let buffers = match &data_type {
             DataType::Boolean => {
                 let bytes = bit_util::ceil(capacity, 8);
@@ -615,6 +621,26 @@ mod tests {
     }
 
     #[test]
+    fn test_multiple_with_nulls() {
+        let array1 = StringArray::from(vec!["hello", "world"]).data();
+        let array2 = StringArray::from(vec![Some("1"), None]).data();
+
+        let arrays = vec![array1.as_ref(), array2.as_ref()];
+
+        let mut mutable = MutableArrayData::new(arrays, false, 5);
+
+        mutable.extend(0, 0, 2);
+        mutable.extend(1, 0, 2);
+
+        let result = mutable.freeze();
+        let result = StringArray::from(Arc::new(result));
+
+        let expected =
+            StringArray::from(vec![Some("hello"), Some("world"), Some("1"), None]);
+        assert_eq!(result, expected);
+    }
+
+    #[test]
     fn test_string_null_offset_nulls() {
         let array =
             StringArray::from(vec![Some("a"), Some("bc"), None, Some("defh")]).data();



[arrow] branch master updated: ARROW-10822 [Rust][Datafusion] add simd feature flag to datafusion

2020-12-06 Thread jorgecarleitao

jorgecarleitao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new ee83d47  ARROW-10822 [Rust][Datafusion] add simd feature flag to datafusion
ee83d47 is described below

commit ee83d4789038f3d46b32f3c6df9a5b13d2707d25
Author: Qingping Hou 
AuthorDate: Sun Dec 6 09:05:47 2020 +0100

ARROW-10822 [Rust][Datafusion] add simd feature flag to datafusion

allow building datafusion with simd enabled
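
The `simd = ["arrow/simd"]` entry in the diff below forwards datafusion's feature to the `simd` feature of the arrow dependency, so that something like `cargo build --features simd` enables arrow's feature-gated kernels. As a rough sketch of how a crate typically gates code on such a feature (the `sum` function is a hypothetical example, not code from arrow or datafusion):

```rust
/// Variant compiled only when the crate is built with the `simd` feature.
/// It accumulates in four independent lanes as a stand-in for a vectorized kernel.
#[cfg(feature = "simd")]
fn sum(values: &[f32]) -> f32 {
    let mut lanes = [0.0f32; 4];
    for chunk in values.chunks(4) {
        for (lane, v) in lanes.iter_mut().zip(chunk) {
            *lane += *v;
        }
    }
    lanes.iter().sum()
}

/// Scalar fallback compiled when the `simd` feature is disabled.
#[cfg(not(feature = "simd"))]
fn sum(values: &[f32]) -> f32 {
    values.iter().sum()
}
```

Callers use `sum` the same way in both configurations; only the implementation selected at compile time differs.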

Closes #8847 from houqp/qp_simd

Authored-by: Qingping Hou 
Signed-off-by: Jorge C. Leitao 
---
 rust/datafusion/Cargo.toml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/rust/datafusion/Cargo.toml b/rust/datafusion/Cargo.toml
index bd2ae97..c969512 100644
--- a/rust/datafusion/Cargo.toml
+++ b/rust/datafusion/Cargo.toml
@@ -42,6 +42,7 @@ path = "src/bin/main.rs"
 [features]
 default = ["cli"]
 cli = ["rustyline"]
+simd = ["arrow/simd"]
 
 [dependencies]
 ahash = "0.6"