[jira] [Created] (ARROW-1576) [Python] Add utility functions (or a richer type hierarchy) for checking whether data type instances are members of various type classes

2017-09-19 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1576:
---

 Summary: [Python] Add utility functions (or a richer type hierarchy) for checking whether data type instances are members of various type classes
 Key: ARROW-1576
 URL: https://issues.apache.org/jira/browse/ARROW-1576
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Python
Reporter: Wes McKinney
 Fix For: 0.8.0


For example, {{is_integer}} and {{is_unsigned_integer}}. Alternatively, this could be implemented similarly to NumPy's abstract scalar type hierarchy, with checks like {{isinstance(t, pa.FloatingPoint)}}.
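A minimal pure-Python sketch of the class-hierarchy approach. The classes below ({{DataType}}, {{Integer}}, {{SignedInteger}}, {{UnsignedInteger}}, {{FloatingPoint}}) are hypothetical stand-ins, not pyarrow's actual type classes. The point is that once such a hierarchy exists, predicates like {{is_integer}} reduce to {{isinstance}} checks.

```python
# Hypothetical type hierarchy; these are NOT pyarrow's real classes.
class DataType:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


class Integer(DataType):
    def __init__(self, bit_width, signed):
        prefix = "int" if signed else "uint"
        super().__init__(f"{prefix}{bit_width}")
        self.bit_width = bit_width


class SignedInteger(Integer):
    def __init__(self, bit_width):
        super().__init__(bit_width, signed=True)


class UnsignedInteger(Integer):
    def __init__(self, bit_width):
        super().__init__(bit_width, signed=False)


class FloatingPoint(DataType):
    def __init__(self, bit_width):
        super().__init__(f"float{bit_width}")
        self.bit_width = bit_width


# The utility predicates become thin wrappers around isinstance.
def is_integer(t):
    return isinstance(t, Integer)


def is_unsigned_integer(t):
    return isinstance(t, UnsignedInteger)


def is_floating(t):
    return isinstance(t, FloatingPoint)


int8, uint32, float64 = SignedInteger(8), UnsignedInteger(32), FloatingPoint(64)
print(is_integer(int8), is_unsigned_integer(int8), is_unsigned_integer(uint32), is_floating(float64))
# True False True True
```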



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ARROW-1575) [Python] Add pyarrow.column factory function

2017-09-19 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1575:
---

 Summary: [Python] Add pyarrow.column factory function
 Key: ARROW-1575
 URL: https://issues.apache.org/jira/browse/ARROW-1575
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Python
Reporter: Wes McKinney
 Fix For: 0.8.0


This would internally call {{Column.from_array}} as appropriate.





[jira] [Created] (ARROW-1570) [C++] Define API for creating a kernel instance from a function of scalar input and output with a particular signature

2017-09-19 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1570:
---

 Summary: [C++] Define API for creating a kernel instance from a function of scalar input and output with a particular signature
 Key: ARROW-1570
 URL: https://issues.apache.org/jira/browse/ARROW-1570
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney


This could accept a {{std::function}} instance (although calls through {{std::function}} generally cannot be inlined by the C++ compiler), but it should also permit inline-able functions or functors.
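The inlining concern is specific to C++ and cannot be shown in Python, so this sketch only illustrates the shape of such an API. A hypothetical {{make_unary_kernel}} lifts a scalar function with a declared input and output type into a kernel over a whole array, passing nulls (represented as {{None}}) through without calling the function. It is not Arrow's kernel API.

```python
from typing import Callable, List, Optional, Sequence


def make_unary_kernel(func: Callable, in_type: type, out_type: type):
    """Hypothetical helper: lift a scalar function to an array kernel.

    in_type / out_type stand in for the kernel's declared signature.
    """

    def kernel(values: Sequence[Optional[object]]) -> List[Optional[object]]:
        out = []
        for v in values:
            if v is None:  # nulls pass through without calling func
                out.append(None)
                continue
            if not isinstance(v, in_type):
                raise TypeError(f"expected {in_type.__name__}, got {type(v).__name__}")
            result = func(v)
            if not isinstance(result, out_type):
                raise TypeError(
                    f"func returned {type(result).__name__}, expected {out_type.__name__}"
                )
            out.append(result)
        return out

    return kernel


square = make_unary_kernel(lambda x: x * x, int, int)
print(square([1, None, 3]))  # [1, None, 9]
```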





[jira] [Created] (ARROW-1569) [C++] Kernel functions for determining monotonicity (ascending or descending) for well-ordered types

2017-09-19 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1569:
---

 Summary: [C++] Kernel functions for determining monotonicity (ascending or descending) for well-ordered types
 Key: ARROW-1569
 URL: https://issues.apache.org/jira/browse/ARROW-1569
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney


These kernels must offer a stateful variant so that monotonicity can be determined across all the chunks of a chunked array.
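A minimal Python sketch of such a stateful variant. A hypothetical {{MonotonicityChecker}} is fed one chunk at a time and carries the last non-null value across chunk boundaries, so a decrease that straddles two chunks is still detected. It checks non-strict monotonicity (equal neighbours allowed) and skips nulls ({{None}}); both are assumptions the issue leaves open.

```python
class MonotonicityChecker:
    """Hypothetical stateful checker, updated one chunk at a time."""

    def __init__(self):
        self.last = None        # last non-null value seen, across chunks
        self.increasing = True  # all values so far are non-decreasing
        self.decreasing = True  # all values so far are non-increasing

    def update(self, chunk):
        for v in chunk:
            if v is None:  # nulls are skipped
                continue
            if self.last is not None:
                if v < self.last:
                    self.increasing = False
                if v > self.last:
                    self.decreasing = False
            self.last = v
        return self  # allows chaining update() calls


checker = MonotonicityChecker()
for chunk in ([1, 2, 2], [None, 3], [5]):
    checker.update(chunk)
print(checker.increasing, checker.decreasing)  # True False
```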





[jira] [Created] (ARROW-1568) [C++] Implement "drop null" kernels that return an array without nulls

2017-09-19 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1568:
---

 Summary: [C++] Implement "drop null" kernels that return an array without nulls
 Key: ARROW-1568
 URL: https://issues.apache.org/jira/browse/ARROW-1568
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney


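A minimal Python sketch. It assumes an array is modeled as a list of slot values plus a parallel list of validity flags, mirroring Arrow's validity bitmap, where the contents of null slots are unspecified. {{drop_null}} is a hypothetical name.

```python
def drop_null(values, validity):
    """Return only the non-null slots, preserving order.

    values:   slot values (contents of null slots are arbitrary)
    validity: bools, True where the slot is non-null
    The result contains no nulls, so it needs no validity flags.
    """
    if len(values) != len(validity):
        raise ValueError("values and validity must have the same length")
    return [v for v, valid in zip(values, validity) if valid]


print(drop_null([10, 0, 30, 0], [True, False, True, False]))  # [10, 30]
```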






[jira] [Created] (ARROW-1567) [C++] Implement "fill null" kernels that replace null values with some scalar replacement value

2017-09-19 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1567:
---

 Summary: [C++] Implement "fill null" kernels that replace null values with some scalar replacement value
 Key: ARROW-1567
 URL: https://issues.apache.org/jira/browse/ARROW-1567
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney


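A minimal Python sketch, using the same assumed model of an array as slot values plus a parallel list of validity flags. {{fill_null}} is a hypothetical name.

```python
def fill_null(values, validity, replacement):
    """Replace every null slot with the scalar `replacement`.

    values:   slot values (contents of null slots are arbitrary)
    validity: bools, True where the slot is non-null
    The result contains no nulls, so it needs no validity flags.
    """
    if len(values) != len(validity):
        raise ValueError("values and validity must have the same length")
    return [v if valid else replacement for v, valid in zip(values, validity)]


print(fill_null([1, 0, 3, 0], [True, False, True, False], -1))  # [1, -1, 3, -1]
```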






[jira] [Created] (ARROW-1566) [C++] Implement "argsort" kernels that use mergesort to compute sorting indices

2017-09-19 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1566:
---

 Summary: [C++] Implement "argsort" kernels that use mergesort to compute sorting indices
 Key: ARROW-1566
 URL: https://issues.apache.org/jira/browse/ARROW-1566
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney


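A minimal Python sketch of a mergesort-based argsort. Merge sort suits this task because it is stable: equal values keep their original relative order in the returned indices. {{argsort_mergesort}} is a hypothetical name, and the sketch does not handle nulls.

```python
def argsort_mergesort(values):
    """Return indices that stably sort `values` in ascending order."""

    def sort(indices):
        if len(indices) <= 1:
            return indices
        mid = len(indices) // 2
        left, right = sort(indices[:mid]), sort(indices[mid:])
        merged = []
        i = j = 0
        while i < len(left) and j < len(right):
            # "<=" takes from the left half on ties, which keeps the sort stable
            if values[left[i]] <= values[right[j]]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged

    return sort(list(range(len(values))))


print(argsort_mergesort([3, 1, 2, 1]))  # [1, 3, 2, 0]
```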






[jira] [Created] (ARROW-1565) [C++] "argtopk" and "argbottomk" functions for computing indices of largest or smallest elements

2017-09-19 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1565:
---

 Summary: [C++] "argtopk" and "argbottomk" functions for computing indices of largest or smallest elements
 Key: ARROW-1565
 URL: https://issues.apache.org/jira/browse/ARROW-1565
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney


A heap-based top-k algorithm can compute these indices in O(n log k) time.
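A minimal Python sketch of the heap-based approach using the standard {{heapq}} module. A min-heap holds the k largest (value, index) pairs seen so far, so each element costs O(log k). {{argtopk}} and {{argbottomk}} are hypothetical names taken from the summary. The negation trick in {{argbottomk}} only works for numeric values, and the sketch does not handle nulls.

```python
import heapq


def argtopk(values, k):
    """Indices of the k largest values, largest first, in O(n log k) time."""
    if k <= 0:
        return []
    heap = []  # min-heap of (value, index); the root is the smallest of the top k
    for i, v in enumerate(values):
        if len(heap) < k:
            heapq.heappush(heap, (v, i))
        elif v > heap[0][0]:
            heapq.heapreplace(heap, (v, i))  # pop the root and push in one step
    # Sorting the k survivors costs O(k log k), which does not change the bound.
    return [i for v, i in sorted(heap, reverse=True)]


def argbottomk(values, k):
    """Indices of the k smallest numeric values, smallest first."""
    return argtopk([-v for v in values], k)


print(argtopk([5, 1, 9, 3, 7], 2))     # [2, 4]
print(argbottomk([5, 1, 9, 3, 7], 2))  # [1, 3]
```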





[jira] [Created] (ARROW-1564) [C++] Kernel functions for computing minimum and maximum of an array in one pass

2017-09-19 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1564:
---

 Summary: [C++] Kernel functions for computing minimum and maximum of an array in one pass
 Key: ARROW-1564
 URL: https://issues.apache.org/jira/browse/ARROW-1564
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney


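A minimal single-pass sketch in Python. Nulls are represented as {{None}} and skipped, which is an assumption because the issue does not say how nulls should be treated. {{min_max}} is a hypothetical name.

```python
def min_max(values):
    """Return (min, max) of the non-null values in a single pass.

    Returns (None, None) if there are no non-null values.
    """
    lo = hi = None
    for v in values:
        if v is None:  # skip nulls
            continue
        if lo is None:  # first non-null value initializes both
            lo = hi = v
        elif v < lo:
            lo = v
        elif v > hi:
            hi = v
    return lo, hi


print(min_max([4, None, -2, 7, 0]))  # (-2, 7)
```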






[jira] [Created] (ARROW-1563) [C++] Implement logical unary and binary kernels for boolean arrays

2017-09-19 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1563:
---

 Summary: [C++] Implement logical unary and binary kernels for boolean arrays
 Key: ARROW-1563
 URL: https://issues.apache.org/jira/browse/ARROW-1563
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney


AND, OR, NOT (negation), XOR.
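A minimal Python sketch of the four kernels, with nulls as {{None}}. The issue does not specify null semantics. This sketch assumes that any null operand produces a null result. Kleene three-valued logic, where for example {{false AND null}} is {{false}}, would be an alternative. All function names are hypothetical.

```python
def _binary(op):
    """Build an elementwise binary kernel from a scalar boolean operation."""

    def kernel(left, right):
        if len(left) != len(right):
            raise ValueError("arrays must have the same length")
        # Assumption: a null in either operand makes the result null.
        return [None if a is None or b is None else op(a, b)
                for a, b in zip(left, right)]

    return kernel


logical_and = _binary(lambda a, b: a and b)
logical_or = _binary(lambda a, b: a or b)
logical_xor = _binary(lambda a, b: a != b)


def logical_not(values):
    return [None if v is None else not v for v in values]


p = [True, True, False, None]
q = [True, False, False, True]
print(logical_and(p, q))  # [True, False, False, None]
print(logical_xor(p, q))  # [False, True, False, None]
print(logical_not(p))     # [False, False, True, None]
```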





[jira] [Created] (ARROW-1562) [C++] Numeric kernel implementations for add (+)

2017-09-19 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1562:
---

 Summary: [C++] Numeric kernel implementations for add (+)
 Key: ARROW-1562
 URL: https://issues.apache.org/jira/browse/ARROW-1562
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney


This function should apply consistent type promotion rules between integer types of different sizes and between signed and unsigned integers.
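The issue does not spell out the promotion rules. This sketch assumes NumPy-like behavior: for example, int8 + uint8 promotes to int16, and int64 + uint64 falls back to float64 because no integer type holds both ranges. The {{IntType}}, {{promote}} and {{add}} names are hypothetical. Overflow is not checked, since Python integers are unbounded.

```python
from typing import NamedTuple


class IntType(NamedTuple):
    signed: bool
    bits: int  # 8, 16, 32 or 64


def promote(a: IntType, b: IntType):
    """NumPy-style promotion of two integer types (an assumption, not Arrow's rule).

    Returns an IntType, or the string "float64" when no integer type can
    represent every value of both inputs.
    """
    if a.signed == b.signed:
        return IntType(a.signed, max(a.bits, b.bits))
    s, u = (a, b) if a.signed else (b, a)
    if s.bits > u.bits:
        return s  # the wider signed type already covers the unsigned range
    if u.bits * 2 <= 64:
        return IntType(True, u.bits * 2)
    return "float64"


def add(left, left_type, right, right_type):
    """Elementwise add with null (None) propagation; overflow is not checked."""
    if len(left) != len(right):
        raise ValueError("arrays must have the same length")
    out_type = promote(left_type, right_type)
    cast = float if out_type == "float64" else int
    values = [None if x is None or y is None else cast(x) + cast(y)
              for x, y in zip(left, right)]
    return out_type, values


int8, uint8 = IntType(True, 8), IntType(False, 8)
int64, uint64 = IntType(True, 64), IntType(False, 64)
print(promote(int8, uint8))    # IntType(signed=True, bits=16)
print(promote(int64, uint64))  # float64
```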





[jira] [Created] (ARROW-1559) [C++] Kernel implementations for "unique" (compute distinct elements of array)

2017-09-19 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1559:
---

 Summary: [C++] Kernel implementations for "unique" (compute distinct elements of array)
 Key: ARROW-1559
 URL: https://issues.apache.org/jira/browse/ARROW-1559
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney


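A minimal hash-based sketch in Python that returns distinct values in order of first occurrence. Nulls ({{None}}) are treated as a single distinct value, which is an assumption. {{unique}} is a hypothetical name.

```python
def unique(values):
    """Distinct values in order of first occurrence, using a hash set."""
    seen = set()
    out = []
    for v in values:
        if v not in seen:  # O(1) average membership test
            seen.add(v)
            out.append(v)
    return out


print(unique([3, 1, 3, None, 1, None, 2]))  # [3, 1, None, 2]
```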






[jira] [Created] (ARROW-1558) [C++] Implement boolean selection kernels

2017-09-19 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1558:
---

 Summary: [C++] Implement boolean selection kernels
 Key: ARROW-1558
 URL: https://issues.apache.org/jira/browse/ARROW-1558
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney


Select values where a boolean selection array is true. If any values in the selection array are null, the corresponding values in the output array should be null.
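A minimal Python sketch, reading the null rule as applying to nulls in the selection array. Nulls are represented as {{None}}, and {{filter_values}} is a hypothetical name.

```python
def filter_values(values, selection):
    """Keep values[i] where selection[i] is True.

    A null (None) in the selection yields a null in the output instead of
    dropping the slot; False drops the slot.
    """
    if len(values) != len(selection):
        raise ValueError("values and selection must have the same length")
    out = []
    for v, keep in zip(values, selection):
        if keep is None:
            out.append(None)
        elif keep:
            out.append(v)
    return out


print(filter_values([10, 20, 30, 40], [True, False, None, True]))  # [10, None, 40]
```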





[jira] [Created] (ARROW-1557) pyarrow.Table.from_arrays doesn't validate names length

2017-09-19 Thread Tom Augspurger (JIRA)
Tom Augspurger created ARROW-1557:
-

 Summary: pyarrow.Table.from_arrays doesn't validate names length
 Key: ARROW-1557
 URL: https://issues.apache.org/jira/browse/ARROW-1557
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.7.0
Reporter: Tom Augspurger
Priority: Minor


{{pa.Table.from_arrays}} doesn't validate that the lengths of {{arrays}} and {{names}} match. I think this should raise a {{ValueError}}:

{noformat}
In [1]: import pyarrow as pa

In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], names=['a', 'b', 'c'])
Out[2]:
pyarrow.Table
a: int64
b: int64

In [3]: pa.__version__
Out[3]: '0.7.0'
{noformat}

(This is my first time using JIRA; hopefully I didn't mess up too badly.)
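A sketch of the check being requested, written as a standalone hypothetical helper rather than pyarrow code. Plain lists stand in for pyarrow arrays so the example runs without pyarrow installed.

```python
def validate_from_arrays_args(arrays, names):
    """Hypothetical check: raise if arrays and names differ in length."""
    if len(arrays) != len(names):
        raise ValueError(
            f"Length of names ({len(names)}) does not match "
            f"length of arrays ({len(arrays)})"
        )


validate_from_arrays_args([[1, 2], [3, 4]], ["a", "b"])  # passes silently
try:
    validate_from_arrays_args([[1, 2], [3, 4]], ["a", "b", "c"])
except ValueError as exc:
    print(exc)  # Length of names (3) does not match length of arrays (2)
```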





[jira] [Created] (ARROW-1556) [C++] Incorporate AssertArraysEqual function from PARQUET-1100 patch

2017-09-19 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1556:
---

 Summary: [C++] Incorporate AssertArraysEqual function from PARQUET-1100 patch
 Key: ARROW-1556
 URL: https://issues.apache.org/jira/browse/ARROW-1556
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.8.0


See the discussion in https://github.com/apache/parquet-cpp/pull/398





[jira] [Created] (ARROW-1555) PyArrow write_to_dataset on s3

2017-09-19 Thread Young-Jun Ko (JIRA)
Young-Jun Ko created ARROW-1555:
---

 Summary: PyArrow write_to_dataset on s3
 Key: ARROW-1555
 URL: https://issues.apache.org/jira/browse/ARROW-1555
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Young-Jun Ko
Priority: Trivial


When writing an Arrow table to S3, I get a NotImplementedError.
The root cause is in {{_ensure_filesystem}}, and it can be reproduced as follows:

{noformat}
import pyarrow
import pyarrow.parquet as pqa
import s3fs

s3 = s3fs.S3FileSystem()

pqa._ensure_filesystem(s3).exists("anything")
{noformat}

It appears that the {{S3FSWrapper}} instantiated in {{_ensure_filesystem}} does not expose the {{exists}} method of the underlying {{s3fs}} filesystem.
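A minimal sketch of the delegation the reporter expects, in plain Python. {{FileSystemWrapper}} and {{FakeS3}} are hypothetical stand-ins (not pyarrow's actual {{S3FSWrapper}} or s3fs) so the example runs without network access. The wrapper forwards {{exists}} explicitly, and any other attribute falls through to the wrapped object.

```python
class FileSystemWrapper:
    """Hypothetical wrapper that forwards calls to an fsspec-style object."""

    def __init__(self, fs):
        self.fs = fs

    def exists(self, path):
        # Explicitly forward the call that the dataset writer needs.
        return self.fs.exists(path)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails: delegate to fs.
        return getattr(self.fs, name)


class FakeS3:
    """Stand-in for s3fs.S3FileSystem so the sketch runs offline."""

    def __init__(self, paths):
        self.paths = set(paths)

    def exists(self, path):
        return path in self.paths

    def ls(self, path):
        return sorted(p for p in self.paths if p.startswith(path))


wrapped = FileSystemWrapper(FakeS3({"bucket/a.parquet"}))
print(wrapped.exists("bucket/a.parquet"), wrapped.exists("anything"))  # True False
print(wrapped.ls("bucket/"))  # ['bucket/a.parquet']
```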





[jira] [Created] (ARROW-1554) "ImportError: DLL load failed: The specified module could not be found" on Windows 10

2017-09-19 Thread Dima Ryazanov (JIRA)
Dima Ryazanov created ARROW-1554:


 Summary: "ImportError: DLL load failed: The specified module could not be found" on Windows 10
 Key: ARROW-1554
 URL: https://issues.apache.org/jira/browse/ARROW-1554
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.7.0, 0.6.0, 0.5.0
 Environment: Windows 10 (x64)
Python 3.6.2 (x64)
Reporter: Dima Ryazanov


I just tried pyarrow on Windows 10, and it fails to import for me:

>>> import pyarrow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python36\lib\site-packages\pyarrow\__init__.py", line 32, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
ImportError: DLL load failed: The specified module could not be found.

Not sure which DLL is failing, but I do see some DLLs in the pyarrow folder:

C:\Users\dima\Documents>dir "C:\Program Files\Python36\lib\site-packages\pyarrow\"
 Volume in drive C has no label.
 Volume Serial Number is 4CE9-CC3C

 Directory of C:\Program Files\Python36\lib\site-packages\pyarrow

09/19/2017  01:14 AM    <DIR>          .
09/19/2017  01:14 AM    <DIR>          ..
09/19/2017  01:14 AM         2,382,336 arrow.dll
09/19/2017  01:14 AM           604,160 arrow_python.dll
09/19/2017  01:14 AM             3,402 compat.py
...

However, I cannot load them using {{ctypes.cdll}}, even though they can be opened as regular files. Perhaps some dependency is missing?

>>> open('C:\\Program Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll', 'rb')
<_io.BufferedReader name='C:\\Program Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll'>
>>>
>>> cdll.LoadLibrary('C:\\Program Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python36\lib\ctypes\__init__.py", line 426, in LoadLibrary
    return self._dlltype(name)
  File "C:\Program Files\Python36\lib\ctypes\__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] The specified module could not be found
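A small diagnostic sketch, assuming the goal is to find which of the bundled DLLs fails to load. It tries {{ctypes.CDLL}} on each {{.dll}} in the pyarrow package directory and reports the errors. The helper name {{try_load_libraries}} is hypothetical. {{importlib.util.find_spec}} locates the package without executing {{pyarrow/__init__.py}}, which is the import that fails here.

```python
import ctypes
import glob
import importlib.util
import os


def try_load_libraries(paths):
    """Try to load each shared library; return {path: error message} for failures.

    On Windows, "[WinError 126]" for a file that exists usually means one of
    that DLL's own dependencies could not be found.
    """
    failures = {}
    for path in paths:
        try:
            ctypes.CDLL(path)
        except OSError as exc:
            failures[path] = str(exc)
    return failures


if __name__ == "__main__":
    # Locate the installed package without importing (executing) it.
    spec = importlib.util.find_spec("pyarrow")
    if spec is not None and spec.submodule_search_locations:
        pkg_dir = list(spec.submodule_search_locations)[0]
        dlls = glob.glob(os.path.join(pkg_dir, "*.dll"))
        for path, err in try_load_libraries(dlls).items():
            print(path, "->", err)
```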


