[jira] [Created] (ARROW-1576) [Python] Add utility functions (or a richer type hierarchy) for checking whether data type instances are members of various type classes
Wes McKinney created ARROW-1576:
---

Summary: [Python] Add utility functions (or a richer type hierarchy) for checking whether data type instances are members of various type classes
Key: ARROW-1576
URL: https://issues.apache.org/jira/browse/ARROW-1576
Project: Apache Arrow
Issue Type: New Feature
Components: Python
Reporter: Wes McKinney
Fix For: 0.8.0

E.g. {{is_integer}}, {{is_unsigned_integer}}. This could also be implemented similarly to NumPy ({{isinstance(t, pa.FloatingPoint)}} or something like it).

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
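A minimal sketch of the NumPy-style approach in plain Python. The class names (DataType, Integer, SignedInteger, UnsignedInteger, FloatingPoint) and the is_* helpers are hypothetical illustrations, not pyarrow's actual API; NumPy's analogous abstract classes are np.integer, np.signedinteger, np.unsignedinteger and np.floating.

```python
# Hypothetical type hierarchy for illustration only (not pyarrow's API).
class DataType:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


class Integer(DataType):
    """Abstract parent of all integer types."""


class SignedInteger(Integer):
    def __init__(self, bit_width):
        super().__init__(f"int{bit_width}")
        self.bit_width = bit_width


class UnsignedInteger(Integer):
    def __init__(self, bit_width):
        super().__init__(f"uint{bit_width}")
        self.bit_width = bit_width


class FloatingPoint(DataType):
    def __init__(self, bit_width):
        super().__init__(f"float{bit_width}")
        self.bit_width = bit_width


# Utility predicates are thin isinstance checks against the hierarchy.
def is_integer(t):
    return isinstance(t, Integer)


def is_signed_integer(t):
    return isinstance(t, SignedInteger)


def is_unsigned_integer(t):
    return isinstance(t, UnsignedInteger)


def is_floating(t):
    return isinstance(t, FloatingPoint)


int32 = SignedInteger(32)
uint8 = UnsignedInteger(8)
float64 = FloatingPoint(64)
```

With a hierarchy like this, both styles from the description coexist: callers can write is_unsigned_integer(t) or isinstance(t, UnsignedInteger) and get the same answer.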
[jira] [Created] (ARROW-1575) [Python] Add pyarrow.column factory function
Wes McKinney created ARROW-1575:
---

Summary: [Python] Add pyarrow.column factory function
Key: ARROW-1575
URL: https://issues.apache.org/jira/browse/ARROW-1575
Project: Apache Arrow
Issue Type: New Feature
Components: Python
Reporter: Wes McKinney
Fix For: 0.8.0

This would internally call {{Column.from_array}} as appropriate.
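A sketch of how such a factory might dispatch on its input, using a hypothetical stand-in Column class in plain Python. The real {{Column.from_array}} signature in pyarrow may differ; the point is only that column() picks the right construction path so callers do not have to.

```python
class Column:
    """Hypothetical stand-in for a column: a name plus a list of values."""

    def __init__(self, name, data):
        self.name = name
        self.data = data

    @classmethod
    def from_array(cls, name, data):
        return cls(name, list(data))


def column(name, data):
    """Hypothetical factory: accept an existing Column or raw array data."""
    if isinstance(data, Column):
        # Reuse the existing column's values under the requested name.
        return Column.from_array(name, data.data)
    return Column.from_array(name, data)
```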
[jira] [Created] (ARROW-1570) [C++] Define API for creating a kernel instance from function of scalar input and output with a particular signature
Wes McKinney created ARROW-1570:
---

Summary: [C++] Define API for creating a kernel instance from function of scalar input and output with a particular signature
Key: ARROW-1570
URL: https://issues.apache.org/jira/browse/ARROW-1570
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Wes McKinney

This could accept a {{std::function}} instance, but because calls through {{std::function}} cannot be inlined by the C++ compiler, the API should also permit use with inline-able functions or functors.
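The general shape of such an API can be sketched in Python as a helper that lifts a scalar function into a whole-array kernel. make_unary_kernel is a hypothetical name and nulls are represented as None (an assumption). Python cannot show the inlining concern itself; in C++ that is typically addressed by templating the kernel on the functor type rather than taking a std::function.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def make_unary_kernel(
    fn: Callable[[T], U]
) -> Callable[[List[Optional[T]]], List[Optional[U]]]:
    """Lift a scalar function into an array kernel; nulls (None) pass through."""

    def kernel(values: List[Optional[T]]) -> List[Optional[U]]:
        # Apply fn only to non-null slots; null slots stay null.
        return [None if v is None else fn(v) for v in values]

    return kernel


square = make_unary_kernel(lambda x: x * x)
```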
[jira] [Created] (ARROW-1569) [C++] Kernel functions for determining monotonicity (ascending or descending) for well-ordered types
Wes McKinney created ARROW-1569:
---

Summary: [C++] Kernel functions for determining monotonicity (ascending or descending) for well-ordered types
Key: ARROW-1569
URL: https://issues.apache.org/jira/browse/ARROW-1569
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Wes McKinney

These kernels must offer a stateful variant so that monotonicity can be determined across all the chunks of a chunked array.
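Why the state matters: each chunk can be sorted on its own while the chunked array as a whole is not, because the boundary between chunks must also be compared. A minimal Python sketch of a stateful accumulator, assuming nulls (None) are skipped; MonotonicityState is a hypothetical name.

```python
class MonotonicityState:
    """Tracks monotonicity across chunks; nulls (None) are skipped (assumption)."""

    def __init__(self):
        self.last = None          # last non-null value seen, across chunks
        self.increasing = True    # non-decreasing so far
        self.decreasing = True    # non-increasing so far

    def update(self, chunk):
        for v in chunk:
            if v is None:
                continue
            if self.last is not None:
                if v < self.last:
                    self.increasing = False
                if v > self.last:
                    self.decreasing = False
            self.last = v
        return self
```

For example, chunks [1, 3] and [2, 4] are each ascending, but the carried-over last value 3 makes the comparison with 2 fail, so the whole array is correctly reported as not ascending.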
[jira] [Created] (ARROW-1568) [C++] Implement "drop null" kernels that return array without nulls
Wes McKinney created ARROW-1568:
---

Summary: [C++] Implement "drop null" kernels that return array without nulls
Key: ARROW-1568
URL: https://issues.apache.org/jira/browse/ARROW-1568
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Wes McKinney
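A minimal Python sketch of the semantics. It assumes a simplified Arrow-like layout in which a list of booleans stands in for the validity bitmap (True = non-null); a real kernel would read packed bits. The values stored under null slots are arbitrary, which is why the validity flags, not the values, decide what is dropped.

```python
def drop_null(values, validity):
    """Return only the values whose validity flag is True.

    `validity` is a simplified stand-in for Arrow's validity bitmap.
    """
    return [v for v, is_valid in zip(values, validity) if is_valid]
```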
[jira] [Created] (ARROW-1567) [C++] Implement "fill null" kernels that replace null values with some scalar replacement value
Wes McKinney created ARROW-1567:
---

Summary: [C++] Implement "fill null" kernels that replace null values with some scalar replacement value
Key: ARROW-1567
URL: https://issues.apache.org/jira/browse/ARROW-1567
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Wes McKinney
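A minimal Python sketch, again assuming a list of booleans stands in for the validity bitmap (True = non-null). Unlike "drop null", the output has the same length as the input, and every slot in it is valid.

```python
def fill_null(values, validity, replacement):
    """Replace every null slot with `replacement`; the result has no nulls."""
    return [v if is_valid else replacement for v, is_valid in zip(values, validity)]
```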
[jira] [Created] (ARROW-1566) [C++] Implement "argsort" kernels that use mergesort to compute sorting indices
Wes McKinney created ARROW-1566:
---

Summary: [C++] Implement "argsort" kernels that use mergesort to compute sorting indices
Key: ARROW-1566
URL: https://issues.apache.org/jira/browse/ARROW-1566
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Wes McKinney
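Mergesort is a natural fit here because it is stable: equal values keep their original relative order in the output indices. A top-down sketch in Python that sorts indices rather than values; placing nulls (None) last is an assumption, not something the issue specifies.

```python
def argsort_mergesort(values):
    """Stable argsort via top-down mergesort over indices; nulls sort last."""

    def less(i, j):
        a, b = values[i], values[j]
        if a is None:
            return False      # null is never less than anything
        if b is None:
            return True       # any non-null value sorts before null
        return a < b

    def sort(idx):
        if len(idx) <= 1:
            return idx
        mid = len(idx) // 2
        left, right = sort(idx[:mid]), sort(idx[mid:])
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            # Take from the right only when strictly smaller: keeps stability.
            if less(right[j], left[i]):
                merged.append(right[j])
                j += 1
            else:
                merged.append(left[i])
                i += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged

    return sort(list(range(len(values))))
```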
[jira] [Created] (ARROW-1565) [C++] "argtopk" and "argbottomk" functions for computing indices of largest or smallest elements
Wes McKinney created ARROW-1565:
---

Summary: [C++] "argtopk" and "argbottomk" functions for computing indices of largest or smallest elements
Key: ARROW-1565
URL: https://issues.apache.org/jira/browse/ARROW-1565
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Wes McKinney

A heap-based top-k algorithm can compute these indices in O(n log k) time.
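The O(n log k) bound comes from keeping a min-heap of at most k candidates: each of the n elements costs at most one O(log k) heap operation, and the heap root is always the weakest candidate to evict. A Python sketch assuming no nulls; argbottomk here leans on the standard library's heapq.nsmallest, which also maintains a bounded heap.

```python
import heapq


def argtopk(values, k):
    """Indices of the k largest values, largest first (ties: lower index first)."""
    if k <= 0:
        return []
    heap = []  # min-heap of (value, -index); root is the weakest kept item
    for i, v in enumerate(values):
        item = (v, -i)  # -i makes lower indices win ties
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)  # evict the weakest candidate
    return [-neg_i for _, neg_i in sorted(heap, reverse=True)]


def argbottomk(values, k):
    """Indices of the k smallest values, smallest first (ties: lower index first)."""
    return heapq.nsmallest(k, range(len(values)), key=values.__getitem__)
```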
[jira] [Created] (ARROW-1564) [C++] Kernel functions for computing minimum and maximum of an array in one pass
Wes McKinney created ARROW-1564:
---

Summary: [C++] Kernel functions for computing minimum and maximum of an array in one pass
Key: ARROW-1564
URL: https://issues.apache.org/jira/browse/ARROW-1564
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Wes McKinney
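Computing both extremes in a single loop avoids scanning the array twice. A minimal Python sketch, assuming nulls (None) are skipped and that an empty or all-null input yields (None, None).

```python
def min_max(values):
    """Return (min, max) of the non-null values in one pass, or (None, None)."""
    lo = hi = None
    for v in values:
        if v is None:
            continue
        if lo is None:
            lo = hi = v          # first non-null value seeds both extremes
        elif v < lo:
            lo = v
        elif v > hi:
            hi = v
    return lo, hi
```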
[jira] [Created] (ARROW-1563) [C++] Implement logical unary and binary kernels for boolean arrays
Wes McKinney created ARROW-1563:
---

Summary: [C++] Implement logical unary and binary kernels for boolean arrays
Key: ARROW-1563
URL: https://issues.apache.org/jira/browse/ARROW-1563
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Wes McKinney

Operations needed: and, or, not (negate), and xor.
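A Python sketch of the four operations over boolean arrays with nulls as None. It assumes the simplest null convention, where a null in any input makes the output slot null; Kleene three-valued logic (in which null AND false is false) is a common alternative that the issue does not decide between.

```python
def _binary(op, left, right):
    # Null-propagating convention (assumption): any null input -> null output.
    return [None if a is None or b is None else op(a, b) for a, b in zip(left, right)]


def and_(left, right):
    return _binary(lambda a, b: a and b, left, right)


def or_(left, right):
    return _binary(lambda a, b: a or b, left, right)


def xor(left, right):
    return _binary(lambda a, b: a != b, left, right)


def invert(values):
    # Unary "not" (negate); null stays null.
    return [None if a is None else not a for a in values]
```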
[jira] [Created] (ARROW-1562) [C++] Numeric kernel implementations for add (+)
Wes McKinney created ARROW-1562:
---

Summary: [C++] Numeric kernel implementations for add (+)
Key: ARROW-1562
URL: https://issues.apache.org/jira/browse/ARROW-1562
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Wes McKinney

This function should apply consistent type promotion rules when the operands are integers of different sizes, or a mix of signed and unsigned integers.
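One possible set of promotion rules, sketched in Python and modeled on NumPy's integer promotion (an assumption; Arrow may choose differently): same signedness takes the wider width, and mixed signedness needs a signed type wide enough to hold the unsigned operand's range. uint64 mixed with a signed type has no lossless integer result, so this sketch follows NumPy in falling back to float64. The add helper is hypothetical and uses Python ints, so it does not model overflow in the output width.

```python
def _parse(name):
    """Split 'int16' / 'uint32' into (signed, bit_width)."""
    if name.startswith("uint"):
        return False, int(name[4:])
    if name.startswith("int"):
        return True, int(name[3:])
    raise ValueError(f"not an integer type: {name}")


def promote(a, b):
    sa, wa = _parse(a)
    sb, wb = _parse(b)
    if sa == sb:
        return ("int" if sa else "uint") + str(max(wa, wb))
    # Mixed signedness: find a signed type that holds the unsigned range.
    signed_w, unsigned_w = (wa, wb) if sa else (wb, wa)
    if unsigned_w < signed_w:
        return f"int{signed_w}"
    if 2 * unsigned_w <= 64:
        return f"int{2 * unsigned_w}"
    return "float64"  # NumPy's choice for uint64 + signed; lossy for large values


def add(left, left_type, right, right_type):
    """Hypothetical add kernel: promoted output type plus null-propagating sums."""
    out_type = promote(left_type, right_type)
    values = [None if a is None or b is None else a + b for a, b in zip(left, right)]
    return out_type, values
```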
[jira] [Created] (ARROW-1559) [C++] Kernel implementations for "unique" (compute distinct elements of array)
Wes McKinney created ARROW-1559:
---

Summary: [C++] Kernel implementations for "unique" (compute distinct elements of array)
Key: ARROW-1559
URL: https://issues.apache.org/jira/browse/ARROW-1559
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Wes McKinney
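A hash-based approach computes distinct elements in one pass. The Python sketch below returns them in order of first appearance and treats null (None) as a single distinct value; both choices are assumptions, since the issue does not specify output order or null handling.

```python
def unique(values):
    """Distinct values in order of first appearance (hash-based, one pass)."""
    seen = set()
    out = []
    for v in values:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out
```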
[jira] [Created] (ARROW-1558) [C++] Implement boolean selection kernels
Wes McKinney created ARROW-1558:
---

Summary: [C++] Implement boolean selection kernels
Key: ARROW-1558
URL: https://issues.apache.org/jira/browse/ARROW-1558
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Wes McKinney

Select the values where a boolean selection array is true. If any values in the selection array are null, then the corresponding values in the output array should be null.
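A Python sketch of these semantics, with nulls as None: a true slot copies the value, a false slot drops it, and a null slot emits a null into the output. filter_values is a hypothetical name.

```python
def filter_values(values, selection):
    """Keep values where selection is True; a null selection slot yields null."""
    out = []
    for v, keep in zip(values, selection):
        if keep is None:
            out.append(None)   # null in the selection -> null in the output
        elif keep:
            out.append(v)      # selected: copy the value (which may itself be null)
    return out
```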
[jira] [Created] (ARROW-1557) pyarrow.Table.from_arrays doesn't validate names length
Tom Augspurger created ARROW-1557:
---

Summary: pyarrow.Table.from_arrays doesn't validate names length
Key: ARROW-1557
URL: https://issues.apache.org/jira/browse/ARROW-1557
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.7.0
Reporter: Tom Augspurger
Priority: Minor

pa.Table.from_arrays doesn't validate that the lengths of {{arrays}} and {{names}} match; in the example below, the extra name 'c' is silently ignored. I think this should raise a {{ValueError}}:

{{
In [1]: import pyarrow as pa

In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], names=['a', 'b', 'c'])
Out[2]:
pyarrow.Table
a: int64
b: int64

In [3]: pa.__version__
Out[3]: '0.7.0'
}}

(This is my first time using JIRA, hopefully I didn't mess up too badly)
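The requested check is a simple length comparison performed before the table is built. A plain-Python sketch; validate_names is a hypothetical helper, not part of pyarrow, and the error message wording is illustrative.

```python
def validate_names(arrays, names):
    """Hypothetical check: raise if the number of names differs from the arrays."""
    if len(names) != len(arrays):
        raise ValueError(
            f"Length of names ({len(names)}) does not match "
            f"length of arrays ({len(arrays)})"
        )
```

With the inputs from the report (two arrays, three names), this raises a ValueError instead of silently dropping 'c'.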
[jira] [Created] (ARROW-1556) [C++] Incorporate AssertArraysEqual function from PARQUET-1100 patch
Wes McKinney created ARROW-1556:
---

Summary: [C++] Incorporate AssertArraysEqual function from PARQUET-1100 patch
Key: ARROW-1556
URL: https://issues.apache.org/jira/browse/ARROW-1556
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Wes McKinney
Fix For: 0.8.0

See the discussion in https://github.com/apache/parquet-cpp/pull/398.
[jira] [Created] (ARROW-1555) PyArrow write_to_dataset on s3
Young-Jun Ko created ARROW-1555:
---

Summary: PyArrow write_to_dataset on s3
Key: ARROW-1555
URL: https://issues.apache.org/jira/browse/ARROW-1555
Project: Apache Arrow
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Young-Jun Ko
Priority: Trivial

When writing an Arrow table to S3, I get a NotImplemented exception. The root cause is in _ensure_filesystem, and it can be reproduced as follows:

import pyarrow
import pyarrow.parquet as pqa
import s3fs

s3 = s3fs.S3FileSystem()
# Fails with the NotImplemented exception described above
pqa._ensure_filesystem(s3).exists("anything")

It appears that the S3FSWrapper instantiated in _ensure_filesystem does not expose the exists method of the underlying s3fs filesystem.
[jira] [Created] (ARROW-1554) "ImportError: DLL load failed: The specified module could not be found" on Windows 10
Dima Ryazanov created ARROW-1554:
---

Summary: "ImportError: DLL load failed: The specified module could not be found" on Windows 10
Key: ARROW-1554
URL: https://issues.apache.org/jira/browse/ARROW-1554
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.7.0, 0.6.0, 0.5.0
Environment: Windows 10 (x64), Python 3.6.2 (x64)
Reporter: Dima Ryazanov

I just tried pyarrow on Windows 10, and it fails to import for me:

>>> import pyarrow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python36\lib\site-packages\pyarrow\__init__.py", line 32, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
ImportError: DLL load failed: The specified module could not be found.

Not sure which DLL is failing, but I do see some DLLs in the pyarrow folder:

C:\Users\dima\Documents>dir "C:\Program Files\Python36\lib\site-packages\pyarrow\"
 Volume in drive C has no label.
 Volume Serial Number is 4CE9-CC3C

 Directory of C:\Program Files\Python36\lib\site-packages\pyarrow

09/19/2017  01:14 AM    <DIR>          .
09/19/2017  01:14 AM    <DIR>          ..
09/19/2017  01:14 AM         2,382,336 arrow.dll
09/19/2017  01:14 AM           604,160 arrow_python.dll
09/19/2017  01:14 AM             3,402 compat.py
...

However, I cannot open them using ctypes.cdll. I wonder if some dependency is missing?

>>> open('C:\\Program Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll', 'rb')
<_io.BufferedReader name='C:\\Program Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll'>
>>>
>>> cdll.LoadLibrary('C:\\Program Files\\Python36\\Lib\\site-packages\\pyarrow\\parquet.dll')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python36\lib\ctypes\__init__.py", line 426, in LoadLibrary
    return self._dlltype(name)
  File "C:\Program Files\Python36\lib\ctypes\__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] The specified module could not be found