[jira] [Created] (ARROW-12501) [CI][Ruby] Remove needless workaround for MinGW build

2021-04-21 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-12501:


 Summary: [CI][Ruby] Remove needless workaround for MinGW build
 Key: ARROW-12501
 URL: https://issues.apache.org/jira/browse/ARROW-12501
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, Ruby
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-12500) [C++][Dataset] Consolidate similar tests for file formats

2021-04-21 Thread David Li (Jira)
David Li created ARROW-12500:


 Summary: [C++][Dataset] Consolidate similar tests for file formats
 Key: ARROW-12500
 URL: https://issues.apache.org/jira/browse/ARROW-12500
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: David Li
Assignee: David Li
 Fix For: 5.0.0


Across CSV, Parquet, and IPC we have a number of very similar, and in some cases 
essentially identical, tests. As we do more refactoring and development, it 
would be nice to consolidate these tests so that we can ensure all formats 
behave consistently and get the same level of testing. For instance, 
ARROW-11772 now adds more comprehensive tests for scanning IPC which don't yet 
apply to Parquet/CSV.

This sort of consolidation may also be nice to do in Python.
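As a rough illustration of what consolidation could look like, the sketch below writes one shared round-trip check and instantiates it for several formats, so each format receives identical coverage. The two formats (`CsvLikeFormat`, `SpaceSeparatedFormat`) and the helper names are hypothetical stand-ins, not Arrow's `FileFormat` classes; in the real C++ test suite, googletest's typed tests (`TYPED_TEST_SUITE`) would likely be the natural home for the same idea.

```cpp
#include <cassert>
#include <initializer_list>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical format stand-in: comma-separated integers.
struct CsvLikeFormat {
  static std::string Write(const std::vector<int>& values) {
    std::ostringstream out;
    for (size_t i = 0; i < values.size(); ++i) {
      if (i > 0) out << ',';
      out << values[i];
    }
    return out.str();
  }
  static std::vector<int> Read(const std::string& data) {
    std::vector<int> values;
    std::istringstream in(data);
    std::string field;
    while (std::getline(in, field, ',')) values.push_back(std::stoi(field));
    return values;
  }
};

// Hypothetical format stand-in: whitespace-separated integers.
struct SpaceSeparatedFormat {
  static std::string Write(const std::vector<int>& values) {
    std::ostringstream out;
    for (int v : values) out << v << ' ';
    return out.str();
  }
  static std::vector<int> Read(const std::string& data) {
    std::vector<int> values;
    std::istringstream in(data);
    int v;
    while (in >> v) values.push_back(v);
    return values;
  }
};

// The single shared check: written once, instantiated per format.
template <typename Format>
bool RoundTripsCleanly(const std::vector<int>& values) {
  return Format::Read(Format::Write(values)) == values;
}

// Runs the shared check for every listed format (C++11 pack expansion).
template <typename... Formats>
bool AllFormatsRoundTrip(const std::vector<int>& values) {
  bool ok = true;
  (void)std::initializer_list<int>{
      (ok = ok && RoundTripsCleanly<Formats>(values), 0)...};
  return ok;
}
```

Covering a new format is then a matter of appending it to the template argument list, which is the "same level of testing for all formats" property the issue asks for.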





[jira] [Created] (ARROW-12499) [C++][Compute] Add ScalarAggregateOptions to Any and All kernels

2021-04-21 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-12499:

 Summary: [C++][Compute] Add ScalarAggregateOptions to Any and All kernels
 Key: ARROW-12499
 URL: https://issues.apache.org/jira/browse/ARROW-12499
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Python, R
Reporter: Rok Mihevc
Assignee: Rok Mihevc


Follow up to ARROW-9054 and ARROW-12185 - see 
[comment|https://github.com/apache/arrow/pull/10032#pullrequestreview-641468079].
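For context, a minimal sketch of what option-aware Any/All kernels might compute, assuming semantics modelled on the `skip_nulls` and `min_count` fields of Arrow's `ScalarAggregateOptions`: fewer than `min_count` non-null inputs yields null, and with `skip_nulls` off, nulls are combined using Kleene logic. `Tri`, `AggregateOptionsSketch`, `Any`, and `All` are hypothetical names; the real kernel behaviour is settled in the linked PR discussion, not by this code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Three-valued boolean: Arrow boolean values can be true, false, or null.
enum class Tri { kFalse, kTrue, kNull };

// Hypothetical options mirroring the two fields named above (defaults assumed).
struct AggregateOptionsSketch {
  explicit AggregateOptionsSketch(bool skip = true, uint32_t min = 1)
      : skip_nulls(skip), min_count(min) {}
  bool skip_nulls;
  uint32_t min_count;
};

// any: true if some non-null value is true. With skip_nulls == false, a null
// makes the result unknown only when no true value was seen (Kleene logic).
Tri Any(const std::vector<Tri>& values, const AggregateOptionsSketch& opts) {
  uint32_t non_null = 0;
  bool saw_true = false, saw_null = false;
  for (Tri v : values) {
    if (v == Tri::kNull) { saw_null = true; continue; }
    ++non_null;
    if (v == Tri::kTrue) saw_true = true;
  }
  if (non_null < opts.min_count) return Tri::kNull;  // too few values
  if (saw_true) return Tri::kTrue;
  if (saw_null && !opts.skip_nulls) return Tri::kNull;
  return Tri::kFalse;
}

// all: false if some non-null value is false; otherwise the mirror of Any.
Tri All(const std::vector<Tri>& values, const AggregateOptionsSketch& opts) {
  uint32_t non_null = 0;
  bool saw_false = false, saw_null = false;
  for (Tri v : values) {
    if (v == Tri::kNull) { saw_null = true; continue; }
    ++non_null;
    if (v == Tri::kFalse) saw_false = true;
  }
  if (non_null < opts.min_count) return Tri::kNull;  // too few values
  if (saw_false) return Tri::kFalse;
  if (saw_null && !opts.skip_nulls) return Tri::kNull;
  return Tri::kTrue;
}
```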





[jira] [Created] (ARROW-12498) cannot bind 'std::unique_ptr'

2021-04-21 Thread Jira
Mauricio 'Pachá' Vargas Sepúlveda created ARROW-12498:

 Summary: cannot bind 'std::unique_ptr'
 Key: ARROW-12498
 URL: https://issues.apache.org/jira/browse/ARROW-12498
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Continuous Integration
Reporter: Mauricio 'Pachá' Vargas Sepúlveda


The centos-7-amd64 job has repeatedly failed in the nightly builds:

2021-04-21: 
[https://github.com/ursacomputing/crossbow/runs/2397521625#step:8:902]

2021-04-20: 
[https://github.com/ursacomputing/crossbow/runs/2387780946#step:8:901]

The error reads:
{code}
/root/rpmbuild/BUILD/apache-arrow-3.1.0.dev705/cpp/src/arrow/adapters/orc/adapter.cc:581:10: error: cannot bind 'std::unique_ptr' lvalue to 'std::unique_ptr&&'
{code}
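The code at adapter.cc:581 is not quoted here, so the following only sketches one common cause of this GCC 4.8 diagnostic, assuming that line returns or passes a named `std::unique_ptr`. A named `unique_ptr` is an lvalue and cannot bind to `unique_ptr&&`. Newer compilers treat `return ptr;` as a move even when a derived-to-base conversion is needed, but GCC 4.8 predates that rule, so an explicit `std::move` is required there. `Type`, `ListType`, `TakeOwnership`, `MakeListType`, and `PassNamedPointer` are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical type hierarchy standing in for the adapter's real types.
struct Type {
  virtual ~Type() = default;
  virtual int id() const { return 0; }
};
struct ListType : Type {
  int id() const override { return 1; }
};

// A sink that takes ownership through an rvalue reference.
int TakeOwnership(std::unique_ptr<Type>&& owned) {
  std::unique_ptr<Type> local(std::move(owned));
  return local->id();
}

std::unique_ptr<Type> MakeListType() {
  std::unique_ptr<ListType> result(new ListType());
  // A plain `return result;` needs a derived-to-base conversion here. Older
  // compilers such as GCC 4.8 do not treat `result` as an rvalue in that case
  // and report "cannot bind ... lvalue to ...&&". The explicit std::move
  // compiles on old and new compilers alike.
  return std::move(result);
}

int PassNamedPointer() {
  std::unique_ptr<Type> ptr = MakeListType();
  // TakeOwnership(ptr);  // would not compile: a named unique_ptr is an lvalue
  return TakeOwnership(std::move(ptr));
}
```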





[jira] [Created] (ARROW-12497) [C++] Implement array expression from R in C++

2021-04-21 Thread Nic Crane (Jira)
Nic Crane created ARROW-12497:

 Summary: [C++] Implement array expression from R in C++
 Key: ARROW-12497
 URL: https://issues.apache.org/jira/browse/ARROW-12497
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Nic Crane


As discussed here: 
https://github.com/apache/arrow/pull/10056#discussion_r616985185

Currently, the R implementation lets users build array expressions that are 
later evaluated within a single call to Filter rather than in multiple 
operations. This functionality should be moved into C++ so that it is handled 
at a lower level.
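To illustrate the idea, here is a minimal sketch assuming a toy expression type rather than Arrow's actual `compute::Expression` API: predicates are composed first and evaluated later inside one `Filter` pass, instead of materializing an intermediate result per comparison. `Expr`, `Greater`, `Less`, `And`, and `Filter` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical deferred predicate: building it does no work on the data.
struct Expr {
  std::function<bool(int)> eval;
};

Expr Greater(int rhs) { return Expr{[rhs](int x) { return x > rhs; }}; }
Expr Less(int rhs) { return Expr{[rhs](int x) { return x < rhs; }}; }

// Combines two predicates into one; still nothing is evaluated yet.
Expr And(Expr a, Expr b) {
  return Expr{[a, b](int x) { return a.eval(x) && b.eval(x); }};
}

// A single Filter call evaluates the whole composed predicate once per row.
std::vector<int> Filter(const std::vector<int>& values, const Expr& predicate) {
  std::vector<int> out;
  for (int v : values) {
    if (predicate.eval(v)) out.push_back(v);
  }
  return out;
}
```

Chaining `Filter(Filter(values, Greater(2)), Less(12))` yields the same rows but builds an intermediate vector; composing the expression first avoids that, which is the behaviour the issue proposes to move into C++.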





[jira] [Created] (ARROW-12496) [C++][Dataset] Ensure Scanner tests fully cover async

2021-04-21 Thread David Li (Jira)
David Li created ARROW-12496:


 Summary: [C++][Dataset] Ensure Scanner tests fully cover async
 Key: ARROW-12496
 URL: https://issues.apache.org/jira/browse/ARROW-12496
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: David Li
Assignee: David Li


Some of the scanner tests don't fully cover the async scanner because they scan 
a single fragment, which AsyncScanner doesn't support.





[jira] [Created] (ARROW-12495) [C++][Python] NumPy buffer sets is_mutable_ to true but does not set mutable_data_ when the NumPy array is writable

2021-04-21 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-12495:


 Summary: [C++][Python] NumPy buffer sets is_mutable_ to true but does not set mutable_data_ when the NumPy array is writable
 Key: ARROW-12495
 URL: https://issues.apache.org/jira/browse/ARROW-12495
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Reporter: Wes McKinney
 Fix For: 4.0.0


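The bug is evident: when the NumPy array is writable, the constructor sets 
is_mutable_ to true but never sets mutable_data_.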

{code}
NumPyBuffer::NumPyBuffer(PyObject* ao) : Buffer(nullptr, 0) {
  PyAcquireGIL lock;
  arr_ = ao;
  Py_INCREF(ao);

  if (PyArray_Check(ao)) {
    PyArrayObject* ndarray = reinterpret_cast<PyArrayObject*>(ao);
    data_ = reinterpret_cast<const uint8_t*>(PyArray_DATA(ndarray));
    size_ = PyArray_SIZE(ndarray) * PyArray_DESCR(ndarray)->elsize;
    capacity_ = size_;

    if (PyArray_FLAGS(ndarray) & NPY_ARRAY_WRITEABLE) {
      is_mutable_ = true;
    }
  }
}
{code}
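A likely fix is to set `mutable_data_` alongside `is_mutable_` in the writable branch (in the real constructor, something like `mutable_data_ = reinterpret_cast<uint8_t*>(PyArray_DATA(ndarray));`). The self-contained sketch below models only that invariant. `BufferSketch`, `FakeNdarray`, and `NumPyBufferSketch` are simplified hypothetical stand-ins for `arrow::Buffer`, `PyArrayObject`, and `NumPyBuffer`, and the accessor behaviour (returning the pointer only when the buffer is mutable) is an assumption noted in the code.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the relevant arrow::Buffer fields (not the real class).
class BufferSketch {
 public:
  bool is_mutable() const { return is_mutable_; }
  // Assumption: callers get mutable_data_ only when the buffer is mutable, so
  // is_mutable_ == true with mutable_data_ == nullptr hands out a null pointer.
  uint8_t* mutable_data() { return is_mutable_ ? mutable_data_ : nullptr; }
  const uint8_t* data() const { return data_; }
  int64_t size() const { return size_; }

 protected:
  const uint8_t* data_ = nullptr;
  uint8_t* mutable_data_ = nullptr;
  int64_t size_ = 0;
  bool is_mutable_ = false;
};

// Hypothetical description of a NumPy array, standing in for PyArrayObject.
struct FakeNdarray {
  void* data;
  int64_t num_elements;
  int64_t elsize;
  bool writeable;
};

class NumPyBufferSketch : public BufferSketch {
 public:
  explicit NumPyBufferSketch(const FakeNdarray& arr) {
    data_ = static_cast<const uint8_t*>(arr.data);
    size_ = arr.num_elements * arr.elsize;
    if (arr.writeable) {
      is_mutable_ = true;
      // The step reported as missing: keep mutable_data_ in sync.
      mutable_data_ = static_cast<uint8_t*>(arr.data);
    }
  }
};
```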





[jira] [Created] (ARROW-12494) [C++] ORC adapter fails to compile on GCC 4.8

2021-04-21 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-12494:

 Summary: [C++] ORC adapter fails to compile on GCC 4.8 
 Key: ARROW-12494
 URL: https://issues.apache.org/jira/browse/ARROW-12494
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs
 Fix For: 4.0.0


The CentOS 7 packaging build failed during the release:
https://github.com/ursacomputing/crossbow/runs/2400255864





[jira] [Created] (ARROW-12493) Support DictionaryArray in CSV and JSON formatters

2021-04-21 Thread Raphael Taylor-Davies (Jira)
Raphael Taylor-Davies created ARROW-12493:

 Summary: Support DictionaryArray in CSV and JSON formatters
 Key: ARROW-12493
 URL: https://issues.apache.org/jira/browse/ARROW-12493
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Raphael Taylor-Davies
Assignee: Raphael Taylor-Davies


Currently the CSV and JSON formatters do not support DictionaryArray.
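A minimal sketch of what dictionary support in a formatter amounts to: resolve each row's index to its dictionary value before writing it. `DictionaryColumn`, `FormatAsCsv`, and `FormatAsJsonArray` are hypothetical; nulls are modelled as negative indices purely for brevity (Arrow itself uses a validity bitmap), and no CSV quoting or JSON escaping is attempted.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical dictionary-encoded string column: each row stores an index into
// `dictionary`; a negative index marks a null (assumption for this sketch).
struct DictionaryColumn {
  std::vector<int32_t> indices;
  std::vector<std::string> dictionary;
};

// One CSV line per row; nulls become empty fields. No quoting is performed.
std::string FormatAsCsv(const DictionaryColumn& col) {
  std::ostringstream out;
  for (int32_t idx : col.indices) {
    if (idx >= 0) out << col.dictionary.at(static_cast<size_t>(idx));
    out << '\n';
  }
  return out.str();
}

// A JSON array of decoded values; nulls become `null`. No escaping is performed.
std::string FormatAsJsonArray(const DictionaryColumn& col) {
  std::ostringstream out;
  out << '[';
  for (size_t i = 0; i < col.indices.size(); ++i) {
    if (i > 0) out << ',';
    int32_t idx = col.indices[i];
    if (idx < 0) {
      out << "null";
    } else {
      out << '"' << col.dictionary.at(static_cast<size_t>(idx)) << '"';
    }
  }
  out << ']';
  return out.str();
}
```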





[jira] [Created] (ARROW-12492) [Python] Add a helper method to decode a DictionaryArray back to a plain Array

2021-04-21 Thread Alessandro Molina (Jira)
Alessandro Molina created ARROW-12492:

 Summary: [Python] Add a helper method to decode a DictionaryArray back to a plain Array
 Key: ARROW-12492
 URL: https://issues.apache.org/jira/browse/ARROW-12492
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Python
Reporter: Alessandro Molina


To create a DictionaryArray, pyarrow currently offers the 
{{Array.dictionary_encode}} helper, but there is no obvious way to do 
the reverse. A {{DictionaryArray.decode}} helper would provide an immediate, 
obvious way to get back a plain (unrolled) Array from the dictionary-encoded 
version.
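Decoding is essentially a "take" of the dictionary by the indices; in pyarrow today, `arr.dictionary.take(arr.indices)` is one way to get the same result. The sketch below shows that operation in C++ under stated assumptions: `PlainColumn` and `Decode` are hypothetical, and a negative index stands for a null row (Arrow uses a validity bitmap instead).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical plain (decoded) string column with explicit validity flags.
struct PlainColumn {
  std::vector<std::string> values;  // empty string placeholder where invalid
  std::vector<bool> valid;
};

// "Takes" dictionary[index] for every row; a negative index means null.
PlainColumn Decode(const std::vector<int32_t>& indices,
                   const std::vector<std::string>& dictionary) {
  PlainColumn out;
  out.values.reserve(indices.size());
  out.valid.reserve(indices.size());
  for (int32_t idx : indices) {
    if (idx < 0) {
      out.values.emplace_back();
      out.valid.push_back(false);
    } else {
      out.values.push_back(dictionary.at(static_cast<size_t>(idx)));
      out.valid.push_back(true);
    }
  }
  return out;
}
```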


