[jira] [Created] (ARROW-11663) [DataFusion] Master does not compile

2021-02-16 Thread Jira
Jorge Leitão created ARROW-11663:


 Summary: [DataFusion] Master does not compile
 Key: ARROW-11663
 URL: https://issues.apache.org/jira/browse/ARROW-11663
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust - DataFusion
Reporter: Jorge Leitão
Assignee: Jorge Leitão






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-11662) [C++] Support sorting for decimal data type.

2021-02-16 Thread ZMZ91 (Jira)
ZMZ91 created ARROW-11662:
-

 Summary: [C++] Support sorting for decimal data type.
 Key: ARROW-11662
 URL: https://issues.apache.org/jira/browse/ARROW-11662
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++
Reporter: ZMZ91


It seems that sorting on decimal data is not implemented as of 3.0.0.

Is there a particular reason it is not supported?

Is the implementation on the roadmap?

Is there any workaround for sorting decimal data in the meantime?





[jira] [Created] (ARROW-11661) [C++] Compilation failure in arrow/scalar.cc on Xcode 8.3.3

2021-02-16 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-11661:


 Summary: [C++] Compilation failure in arrow/scalar.cc on Xcode 
8.3.3 
 Key: ARROW-11661
 URL: https://issues.apache.org/jira/browse/ARROW-11661
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Wes McKinney


See https://gist.github.com/wesm/e3b52381de1556f2af669c7e2458afd0

It seems that this template construct is not supported so robustly across older 
compilers:

{code}
// timestamp to string
Status CastImpl(const TimestampScalar& from, StringScalar* to) {
  to->value = FormatToBuffer(internal::StringFormatter{}, from);
  return Status::OK();
}

// date to string
template <typename D>
Status CastImpl(const DateScalar<D>& from, StringScalar* to) {
  TimestampScalar ts({}, timestamp(TimeUnit::MILLI));
  RETURN_NOT_OK(CastImpl(from, &ts));
  return CastImpl(ts, to);
}

// string to any
template <typename ScalarType>
Status CastImpl(const StringScalar& from, ScalarType* to) {
  ARROW_ASSIGN_OR_RAISE(auto out,
                        Scalar::Parse(to->type, util::string_view(*from.value)));
  to->value = std::move(checked_cast<ScalarType&>(*out).value);
  return Status::OK();
}

// binary to string
Status CastImpl(const BinaryScalar& from, StringScalar* to) {
  to->value = from.value;
  return Status::OK();
}

// formattable to string
template ,
  // note: Value unused but necessary to trigger SFINAE if Formatter is
  // undefined
  typename Value = typename Formatter::value_type>
Status CastImpl(const ScalarType& from, StringScalar* to) {
  to->value = FormatToBuffer(Formatter{from.type}, from);
  return Status::OK();
}
{code}





[jira] [Created] (ARROW-11660) [C++] Move RecordBatch::SelectColumns method from R to C++ library

2021-02-16 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11660:
---

 Summary: [C++] Move RecordBatch::SelectColumns method from R to 
C++ library
 Key: ARROW-11660
 URL: https://issues.apache.org/jira/browse/ARROW-11660
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, R
Reporter: Neal Richardson
 Fix For: 4.0.0


Table has a proper SelectColumns method in the C++ library, but the RecordBatch 
one is implemented in the R library and should be pushed down to C++.





[jira] [Created] (ARROW-11659) [R] Preserve group_by .drop argument

2021-02-16 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11659:
---

 Summary: [R] Preserve group_by .drop argument
 Key: ARROW-11659
 URL: https://issues.apache.org/jira/browse/ARROW-11659
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0








[jira] [Created] (ARROW-11658) [R] Handle mutate/rename inside group_by

2021-02-16 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11658:
---

 Summary: [R] Handle mutate/rename inside group_by
 Key: ARROW-11658
 URL: https://issues.apache.org/jira/browse/ARROW-11658
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0


Followup to ARROW-11657





[jira] [Created] (ARROW-11657) [R] group_by with .drop specified errors

2021-02-16 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11657:
---

 Summary: [R] group_by with .drop specified errors
 Key: ARROW-11657
 URL: https://issues.apache.org/jira/browse/ARROW-11657
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 4.0.0


cf. https://github.com/tidyverse/dplyr/issues/5763





[jira] [Created] (ARROW-11655) Pad/trim functions

2021-02-16 Thread Mike Seddon (Jira)
Mike Seddon created ARROW-11655:
---

 Summary: Pad/trim functions
 Key: ARROW-11655
 URL: https://issues.apache.org/jira/browse/ARROW-11655
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Mike Seddon
Assignee: Mike Seddon


The Pad and Trimming functions





[jira] [Created] (ARROW-11656) Left over functions/fixes

2021-02-16 Thread Mike Seddon (Jira)
Mike Seddon created ARROW-11656:
---

 Summary: Left over functions/fixes
 Key: ARROW-11656
 URL: https://issues.apache.org/jira/browse/ARROW-11656
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Mike Seddon
Assignee: Mike Seddon








[jira] [Created] (ARROW-11654) Regex functions

2021-02-16 Thread Mike Seddon (Jira)
Mike Seddon created ARROW-11654:
---

 Summary: Regex functions
 Key: ARROW-11654
 URL: https://issues.apache.org/jira/browse/ARROW-11654
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Mike Seddon
Assignee: Mike Seddon


The regexp Postgres functions





[jira] [Created] (ARROW-11653) Ascii/unicode functions

2021-02-16 Thread Mike Seddon (Jira)
Mike Seddon created ARROW-11653:
---

 Summary: Ascii/unicode functions
 Key: ARROW-11653
 URL: https://issues.apache.org/jira/browse/ARROW-11653
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Mike Seddon


Implement the Postgres Ascii/Unicode functions





[jira] [Created] (ARROW-11652) Signature::OneOf

2021-02-16 Thread Mike Seddon (Jira)
Mike Seddon created ARROW-11652:
---

 Summary: Signature::OneOf
 Key: ARROW-11652
 URL: https://issues.apache.org/jira/browse/ARROW-11652
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Mike Seddon
Assignee: Mike Seddon


There needs to be a way of defining a function signature that supports multiple 
strict options:

e.g. `lpad`
[string, int] or [string, int, string]
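
The idea can be sketched in a few lines of Python. This is only an illustration of the matching rule, not DataFusion's Rust API: the names `LPAD_SIGNATURE` and `matches_one_of` are hypothetical, and types are modeled as plain strings.

```python
from typing import List, Sequence

# Hypothetical model: each inner list is one strict (exact) argument-type option.
LPAD_SIGNATURE: List[List[str]] = [
    ["string", "int"],
    ["string", "int", "string"],
]


def matches_one_of(options: Sequence[Sequence[str]], arg_types: Sequence[str]) -> bool:
    """Return True if arg_types is exactly equal to any of the allowed options."""
    return any(list(option) == list(arg_types) for option in options)
```

With this model, a call such as `lpad(string, int)` or `lpad(string, int, string)` is accepted, while any other arity or ordering is rejected without coercion.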





[jira] [Created] (ARROW-11651) Postgres Length Functions

2021-02-16 Thread Mike Seddon (Jira)
Mike Seddon created ARROW-11651:
---

 Summary: Postgres Length Functions
 Key: ARROW-11651
 URL: https://issues.apache.org/jira/browse/ARROW-11651
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Mike Seddon
Assignee: Mike Seddon


To break up the large PR this is just the Postgres length functions





[jira] [Created] (ARROW-11650) [Rust][DataFusion] Add Postgres License

2021-02-16 Thread Mike Seddon (Jira)
Mike Seddon created ARROW-11650:
---

 Summary: [Rust][DataFusion] Add Postgres License
 Key: ARROW-11650
 URL: https://issues.apache.org/jira/browse/ARROW-11650
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust - DataFusion
Reporter: Mike Seddon
Assignee: Mike Seddon


DataFusion aims to be compatible with PostgreSQL. To achieve that compatibility, 
parts of the DataFusion code base may reproduce code and documentation from the 
PostgreSQL project, so the license needs to reflect this.





[jira] [Created] (ARROW-11649) Add support for null_fallback to R

2021-02-16 Thread Weston Pace (Jira)
Weston Pace created ARROW-11649:
---

 Summary: Add support for null_fallback to R
 Key: ARROW-11649
 URL: https://issues.apache.org/jira/browse/ARROW-11649
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Weston Pace
Assignee: Weston Pace


ARROW-10438 made it so that a "null" in a partition column is mapped to a 
special directory with hive partitioning.  By default this is 
__HIVE_DEFAULT_PARTITION__, but it is configurable in other hive manipulation 
tools (e.g. Spark) and so needs to be configurable in Arrow as well.  The 
`null_fallback` option is a string that can be passed to 
HivePartitioning::HivePartitioning and HivePartitioning::MakeFactory.

This option should be exposed in R.
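
As a rough illustration of what the option controls, here is a plain-Python sketch of the null-to-directory mapping. It does not use Arrow: the helper names `hive_segment` and `parse_hive_segment` are hypothetical, and value escaping is ignored for brevity.

```python
from typing import Optional, Tuple

# Default directory name used for null partition values, as stated in the issue.
DEFAULT_NULL_FALLBACK = "__HIVE_DEFAULT_PARTITION__"


def hive_segment(key: str, value: Optional[str],
                 null_fallback: str = DEFAULT_NULL_FALLBACK) -> str:
    """Build one hive-style 'key=value' path segment, mapping None to the fallback."""
    return f"{key}={null_fallback if value is None else value}"


def parse_hive_segment(segment: str,
                       null_fallback: str = DEFAULT_NULL_FALLBACK) -> Tuple[str, Optional[str]]:
    """Inverse of hive_segment: the fallback string is read back as None."""
    key, _, value = segment.partition("=")
    return key, (None if value == null_fallback else value)
```

A writer and a reader must agree on the fallback string, which is why a dataset produced by a tool configured with a different value needs the option to be settable on the Arrow side too.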





[jira] [Created] (ARROW-11648) [Rust]: Json reader for nested dictionary arrays returns empty array instead of array of nulls

2021-02-16 Thread Jira
Jörn Horstmann created ARROW-11648:
--

 Summary: [Rust]: Json reader for nested dictionary arrays returns 
empty array instead of array of nulls
 Key: ARROW-11648
 URL: https://issues.apache.org/jira/browse/ARROW-11648
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Jörn Horstmann


 
{code:java}
// schema: id: Utf8, attr: List>
let input = json!([
{"id": "a"},
{"id": "b"},
{"id": "c"},
{"id": "d"},
{"id": "e"},
]);

// Results in ArrowError("InvalidArgumentError(\"all columns in a record batch must have the same length\")"){code}
Probably related to `list_array_string_array_builder` around line 688:
{code:java}
if let Some(value) = row.get(col_name) { 
...
}
// no else{code}
 

Expected: The resulting array should have a length equal to the number of rows, 
all nested lists should be marked as null.
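
The expected behaviour can be sketched in plain Python, using lists in place of Arrow builders. This is an illustration of the missing `else` branch only, not the Rust reader; `build_list_column` is a hypothetical name.

```python
from typing import Any, Dict, List, Optional


def build_list_column(rows: List[Dict[str, Any]], col_name: str) -> List[Optional[list]]:
    """Collect one list-typed column, appending None when a row lacks the key."""
    column: List[Optional[list]] = []
    for row in rows:
        value = row.get(col_name)
        if value is not None:
            column.append(list(value))
        else:
            # The branch the report says is missing: emit a null entry so the
            # column stays the same length as the number of rows.
            column.append(None)
    return column
```

For the five rows in the report, none of which has an `attr` key, this yields five null entries rather than an empty column.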





[jira] [Created] (ARROW-11647) [C++][Compute] CastFromNull does not use preallocated buffers

2021-02-16 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-11647:


 Summary: [C++][Compute] CastFromNull does not use preallocated 
buffers
 Key: ARROW-11647
 URL: https://issues.apache.org/jira/browse/ARROW-11647
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Ben Kietzman
Assignee: Ben Kietzman
 Fix For: 4.0.0


When casting from null, new buffers are currently allocated for every batch of 
the computation. This is wasteful: for simple types the data buffers are 
preallocated and the null bitmap is handled separately, so CastFromNull need do no 
work at all (unless we decide to explicitly zero the data buffer). For 
varlength out types the offsets buffer is preallocated and should be zeroed. 
For struct types preallocation is not implemented (but should be as simple as 
preallocating each child array).





[jira] [Created] (ARROW-11646) [Python] Allow to create BooleanArray from a list of 0 and 1

2021-02-16 Thread quentin lhoest (Jira)
quentin lhoest created ARROW-11646:
--

 Summary: [Python] Allow to create BooleanArray from a list of 0 
and 1
 Key: ARROW-11646
 URL: https://issues.apache.org/jira/browse/ARROW-11646
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 3.0.0
Reporter: quentin lhoest


Currently one can create a BooleanArray using these ways:

{code:python}
pa.array([False, True, True, False])
pa.array(np.array([0, 1, 1, 0]), type=pa.bool_())
pa.array([0, 1, 1, 0]).cast(pa.bool_())
{code}


But creating it this way fails:

{code:python}
pa.array([0, 1, 1, 0], type=pa.bool_())
{code}

{noformat}
---
ArrowInvalid  Traceback (most recent call last)
 in 
> 1 a = pa.array([0, 1, 1, 0], type=pa.bool_())

~/.virtualenvs/hf-datasets/lib/python3.7/site-packages/pyarrow/array.pxi in 
pyarrow.lib.array()

~/.virtualenvs/hf-datasets/lib/python3.7/site-packages/pyarrow/array.pxi in 
pyarrow.lib._sequence_to_array()

~/.virtualenvs/hf-datasets/lib/python3.7/site-packages/pyarrow/error.pxi in 
pyarrow.lib.pyarrow_internal_check_status()

~/.virtualenvs/hf-datasets/lib/python3.7/site-packages/pyarrow/error.pxi in 
pyarrow.lib.check_status()

ArrowInvalid: Could not convert 0 with type int: tried to convert to boolean
{noformat}
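
Until this is supported, one option is to convert the integers to Python booleans before building the array. The helper below is a hypothetical sketch in plain Python (the name `ints_to_bools` is not part of pyarrow); it accepts only 0, 1 and None so that unexpected values are not silently coerced.

```python
from typing import Iterable, List, Optional


def ints_to_bools(values: Iterable[Optional[int]]) -> List[Optional[bool]]:
    """Convert 0/1 (and None) to False/True (and None), rejecting anything else."""
    result: List[Optional[bool]] = []
    for value in values:
        if value is None:
            result.append(None)
        elif value in (0, 1):
            result.append(bool(value))
        else:
            raise ValueError(f"expected 0, 1 or None, got {value!r}")
    return result
```

The resulting list of booleans can then be passed to `pa.array(...)`, which is the first of the forms listed above as already working.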






[jira] [Created] (ARROW-11645) [Rust] Interval is out of spec

2021-02-16 Thread Jira
Jorge Leitão created ARROW-11645:


 Summary: [Rust] Interval is out of spec
 Key: ARROW-11645
 URL: https://issues.apache.org/jira/browse/ARROW-11645
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Jorge Leitão


The DataType interval is currently implemented as an i32 or i64. However, that 
is incorrect and does not follow the spec.

See https://github.com/apache/arrow/blob/master/format/Schema.fbs






[jira] [Created] (ARROW-11644) [Python][Parquet] Low-level Python API for Parquet decryption

2021-02-16 Thread Itamar Turner-Trauring (Jira)
Itamar Turner-Trauring created ARROW-11644:
--

 Summary: [Python][Parquet] Low-level Python API for Parquet 
decryption
 Key: ARROW-11644
 URL: https://issues.apache.org/jira/browse/ARROW-11644
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Itamar Turner-Trauring
Assignee: Itamar Turner-Trauring


This will cover decryption only, using the low-level crypto API.





[jira] [Created] (ARROW-11643) [C++] protobuf_ep failure on Xcode 8.3.3 / Apple LLVM 8.1

2021-02-16 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-11643:


 Summary: [C++] protobuf_ep failure on Xcode 8.3.3 / Apple LLVM 8.1
 Key: ARROW-11643
 URL: https://issues.apache.org/jira/browse/ARROW-11643
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Wes McKinney


I randomly decided to see if we can still build and run on a pre-SSE4.2 machine 
(2009-era MacBook), but protobuf_ep fails with

{code}
FAILED: 
CMakeFiles/libprotobuf.dir/Users/wesm/code/arrow/cpp/build/protobuf_ep-prefix/src/protobuf_ep/src/google/protobuf/dynamic_message.cc.o
 
/Applications/Xcode8.3.3.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
  -DGOOGLE_PROTOBUF_CMAKE_BUILD -DHAVE_PTHREAD -DHAVE_ZLIB -I. 
-I/Users/wesm/code/arrow/cpp/build/protobuf_ep-prefix/src/protobuf_ep/src 
-Qunused-arguments -fcolor-diagnostics -O3 -DNDEBUG -O3 -DNDEBUG -fPIC  
-Qunused-arguments -fcolor-diagnostics -O3 -DNDEBUG -O3 -DNDEBUG -fPIC   
-std=c++11 -MD -MT 
CMakeFiles/libprotobuf.dir/Users/wesm/code/arrow/cpp/build/protobuf_ep-prefix/src/protobuf_ep/src/google/protobuf/dynamic_message.cc.o
 -MF 
CMakeFiles/libprotobuf.dir/Users/wesm/code/arrow/cpp/build/protobuf_ep-prefix/src/protobuf_ep/src/google/protobuf/dynamic_message.cc.o.d
 -o 
CMakeFiles/libprotobuf.dir/Users/wesm/code/arrow/cpp/build/protobuf_ep-prefix/src/protobuf_ep/src/google/protobuf/dynamic_message.cc.o
 -c 
/Users/wesm/code/arrow/cpp/build/protobuf_ep-prefix/src/protobuf_ep/src/google/protobuf/dynamic_message.cc
In file included from 
/Users/wesm/code/arrow/cpp/build/protobuf_ep-prefix/src/protobuf_ep/src/google/protobuf/dynamic_message.cc:80:
/Users/wesm/code/arrow/cpp/build/protobuf_ep-prefix/src/protobuf_ep/src/google/protobuf/map_field.h:332:37:
 error: constexpr constructor never produces a constant expression 
[-Winvalid-constexpr]
  explicit PROTOBUF_MAYBE_CONSTEXPR MapFieldBase(ConstantInitialized)
^
/Users/wesm/code/arrow/cpp/build/protobuf_ep-prefix/src/protobuf_ep/src/google/protobuf/map_field.h:335:9:
 note: non-literal type 'internal::WrappedMutex' cannot be used in a constant 
expression
mutex_(GOOGLE_PROTOBUF_LINKER_INITIALIZED),
^
1 error generated.
{code}

Since this error is reported under a warning flag (-Winvalid-constexpr), perhaps 
it can be suppressed.





[jira] [Created] (ARROW-11642) [C++] Incorrect preprocessor directive for Windows

2021-02-16 Thread Markus Silberstein Hont (Jira)
Markus Silberstein Hont created ARROW-11642:
---

 Summary: [C++] Incorrect preprocessor directive for Windows
 Key: ARROW-11642
 URL: https://issues.apache.org/jira/browse/ARROW-11642
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Markus Silberstein Hont


In the HDFS connector library, there are per-platform preprocessor directives 
that determine which libjvm and libhdfs binaries to use for file operations. This 
currently (3.0.0) fails on Windows because of an incorrect preprocessor 
directive. The code refers to "__WIN32", while the macro that is actually 
predefined is "_WIN32" (see 
https://docs.microsoft.com/en-us/cpp/preprocessor/predefined-macros?view=msvc-160).





[jira] [Created] (ARROW-11641) [CI]: Use docker buildkit's inline cache to reuse build cache across different hosts

2021-02-16 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-11641:
---

 Summary: [CI]: Use docker buildkit's inline cache to reuse build 
cache across different hosts
 Key: ARROW-11641
 URL: https://issues.apache.org/jira/browse/ARROW-11641
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs





