[jira] [Commented] (ARROW-4717) [C#] Consider exposing ValueTask instead of Task
[ https://issues.apache.org/jira/browse/ARROW-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825532#comment-16825532 ] Mani Gandham commented on ARROW-4717: - [~jthelin] Since this is a performance-focused project, using ValueTask seems like the right call. .NET Core 2.0 is already [end of life as of October 2018|https://dotnet.microsoft.com/platform/support/policy/dotnet-core] and .NET Core 2.1 is the current LTS release. > [C#] Consider exposing ValueTask instead of Task > > > Key: ARROW-4717 > URL: https://issues.apache.org/jira/browse/ARROW-4717 > Project: Apache Arrow > Issue Type: Improvement > Components: C# >Reporter: Eric Erhardt >Assignee: Eric Erhardt >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > See [https://github.com/apache/arrow/pull/3736#pullrequestreview-207169204] > for the discussion and > [https://devblogs.microsoft.com/dotnet/understanding-the-whys-whats-and-whens-of-valuetask/] > for the reasoning. > Using `Task` in public API requires that a new Task instance be allocated > on every call. When returning synchronously, using ValueTask will allow the > method to not allocate. > In order to do this, we will need to take a new dependency on > {{System.Threading.Tasks.Extensions}} NuGet package. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
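The allocation argument can be illustrated outside C# as well. Below is a rough Python transliteration of the ValueTask idea (an illustrative sketch only — this is not Arrow code and not the `System.Threading.Tasks.Extensions` API): a wrapper that awaits to an already-available value without scheduling anything, and only falls back to a real awaitable for the asynchronous path.

```python
import asyncio

# Rough analogue of ValueTask<T> (illustrative only): hold either an
# immediately available value or a real awaitable for the slow path.
class ValueResult:
    __slots__ = ("value", "awaitable")

    def __init__(self, value=None, awaitable=None):
        self.value = value
        self.awaitable = awaitable

    def __await__(self):
        if self.awaitable is not None:
            # Asynchronous path: delegate to the underlying awaitable.
            return (yield from self.awaitable.__await__())
        # Synchronous path: complete immediately, nothing is scheduled.
        return self.value

async def demo():
    fast = await ValueResult(value=b"buffered bytes")            # no task allocated
    slow = await ValueResult(awaitable=asyncio.sleep(0, b"io bytes"))
    return fast, slow

print(asyncio.run(demo()))
```

In C# the synchronous path of a {{ValueTask}}-returning method similarly skips the per-call {{Task}} allocation, which is the whole point of the change discussed here.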
[jira] [Commented] (ARROW-5210) [Python] editable install (pip install -e .) is failing
[ https://issues.apache.org/jira/browse/ARROW-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825170#comment-16825170 ] Joris Van den Bossche commented on ARROW-5210: -- It is currently failing because we don't list numpy as a build requirement (not in {{setup_requires}} and not in {{pyproject.toml}}). This also seems to indicate that the current {{pyproject.toml}} is actually not tested (because building a wheel in an isolated environment based on the build dependencies specified in the file should fail with missing numpy). Patch by [~pitrou]: {code:none} diff --git a/python/pyproject.toml b/python/pyproject.toml index 712647e4f..a6c51ec20 100644 --- a/python/pyproject.toml +++ b/python/pyproject.toml @@ -16,4 +16,4 @@ # under the License. [build-system] -requires = ["setuptools", "wheel", "setuptools_scm", "cython >= 0.29"] +requires = ["setuptools", "wheel", "setuptools_scm", "cython >= 0.29", "numpy >= 1.14"] diff --git a/python/setup.py b/python/setup.py index 907524a60..63014a80a 100755 --- a/python/setup.py +++ b/python/setup.py @@ -542,19 +542,20 @@ class BinaryDistribution(Distribution): return True +numpy_requires = 'numpy >= 1.14' + install_requires = ( - 'numpy >= 1.14', + numpy_requires, 'six >= 1.0.0', 'futures; python_version < "3.2"', 'enum34 >= 1.1.6; python_version < "3.4"', ) +setup_requires = ['setuptools_scm', 'cython >= 0.29', numpy_requires] # Only include pytest-runner in setup_requires if we're invoking tests if {'pytest', 'test', 'ptr'}.intersection(sys.argv): - setup_requires = ['pytest-runner'] -else: - setup_requires = [] + setup_requires.append('pytest-runner') setup( @@ -581,7 +582,7 @@ setup( 'write_to': os.path.join(scm_version_write_to_prefix, 'pyarrow/_generated_version.py') }, - setup_requires=['setuptools_scm', 'cython >= 0.29'] + setup_requires, + setup_requires=setup_requires, install_requires=install_requires, tests_require=['pytest', 'pandas', 'hypothesis', 'pathlib2; 
python_version < "3.4"'],{code} With that patch, one still needs {{pip install -e . --no-use-pep517}} (for the latest pip 19.1 release) to tell pip that we _do_ want to do an editable install. But I would argue that even if the above is fixed, doing {{pip install -e . --no-use-pep517 --no-build-isolation}} is better: when doing an editable install, you don't need the build isolation feature for numpy, you just want to build pyarrow against your existing development environment. > [Python] editable install (pip install -e .) is failing > > > Key: ARROW-5210 > URL: https://issues.apache.org/jira/browse/ARROW-5210 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Joris Van den Bossche >Priority: Minor > > Following the python development documentation on building arrow and pyarrow > ([https://arrow.apache.org/docs/developers/python.html#build-and-test]), > building pyarrow inplace with {{python setup.py build_ext --inplace}} works > fine. > > But if you want to also install this inplace version in the current python > environment (editable install / development install) using pip ({{pip install > -e .}}), this fails during the {{build_ext}} / cmake phase: > {code:none} > > -- Looking for python3.7m > -- Found Python lib > /home/joris/miniconda3/envs/arrow-dev/lib/libpython3.7m.so > CMake Error at cmake_modules/FindNumPy.cmake:62 (message): > NumPy import failure: > Traceback (most recent call last): > File "", line 1, in > ModuleNotFoundError: No module named 'numpy' > Call Stack (most recent call first): > CMakeLists.txt:186 (find_package) > -- Configuring incomplete, errors occurred! > See also > "/home/joris/scipy/repos/arrow/python/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeOutput.log". > See also > "/home/joris/scipy/repos/arrow/python/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeError.log". > error: command 'cmake' failed with exit status 1 > Cleaning up... 
> {code} > > Alternatively, doing {{python setup.py develop}} to achieve the same still > works. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
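Read in isolation, the setup.py side of the patch above amounts to the following logic (a sketch using the requirement strings from the diff; the surrounding {{setup()}} call and the other dependencies are omitted):

```python
import sys

# Single source of truth for the numpy requirement, shared between
# install_requires and setup_requires (as in the proposed patch).
numpy_requires = 'numpy >= 1.14'

install_requires = [
    numpy_requires,
    'six >= 1.0.0',
]

# numpy and cython are needed at build time because the extension
# modules compile against their headers.
setup_requires = ['setuptools_scm', 'cython >= 0.29', numpy_requires]

# pytest-runner is only added when setup.py is invoked to run tests.
if {'pytest', 'test', 'ptr'}.intersection(sys.argv):
    setup_requires.append('pytest-runner')
```

Listing numpy in {{setup_requires}} (and in {{pyproject.toml}}) is what lets an isolated PEP 517 build find it before cmake runs.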
[jira] [Commented] (ARROW-4935) [C++] Errors from jemalloc when building pyarrow from source on OSX and Debian
[ https://issues.apache.org/jira/browse/ARROW-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825517#comment-16825517 ] Ian Mateus Vieira Manor commented on ARROW-4935: Solved the jemalloc problem on my machine by installing macOS SDK headers. {code:java} cd /Library/Developer/CommandLineTools/Packages/ open macOS_SDK_headers_for_macOS_10.14.pkg{code} > [C++] Errors from jemalloc when building pyarrow from source on OSX and Debian > -- > > Key: ARROW-4935 > URL: https://issues.apache.org/jira/browse/ARROW-4935 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.12.1 > Environment: OSX, Debian, Python==3.6.7 >Reporter: Gregory Hayes >Priority: Critical > Labels: build, newbie > > My attempts to build pyarrow from source are failing. I've set up the conda > environment using the instructions provided in the Develop instructions, and > have tried this on both Debian and OSX. When I run CMAKE in debug mode on > OSX, the output is: > {code:java} > -- Building using CMake version: 3.14.0 > -- Arrow version: 0.13.0 (full: '0.13.0-SNAPSHOT') > -- clang-tidy not found > -- clang-format not found > -- infer found at /usr/local/bin/infer > -- Using ccache: /usr/local/bin/ccache > -- Found cpplint executable at > /Users/Greg/documents/repos/arrow/cpp/build-support/cpplint.py > -- Compiler command: env LANG=C > /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ > -v > -- Compiler version: Apple LLVM version 10.0.0 (clang-1000.11.45.5) > Target: x86_64-apple-darwin18.2.0 > Thread model: posix > InstalledDir: > /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin > -- Compiler id: AppleClang > Selected compiler clang 4.1.0svn > -- Arrow build warning level: CHECKIN > Configured for DEBUG build (set with cmake > -DCMAKE_BUILD_TYPE={release,debug,...}) > -- Build Type: DEBUG > -- BOOST_VERSION: 1.67.0 > -- BROTLI_VERSION: v0.6.0 > -- 
CARES_VERSION: 1.15.0 > -- DOUBLE_CONVERSION_VERSION: v3.1.1 > -- FLATBUFFERS_VERSION: v1.10.0 > -- GBENCHMARK_VERSION: v1.4.1 > -- GFLAGS_VERSION: v2.2.0 > -- GLOG_VERSION: v0.3.5 > -- GRPC_VERSION: v1.18.0 > -- GTEST_VERSION: 1.8.1 > -- JEMALLOC_VERSION: 17c897976c60b0e6e4f4a365c751027244dada7a > -- LZ4_VERSION: v1.8.3 > -- ORC_VERSION: 1.5.4 > -- PROTOBUF_VERSION: v3.6.1 > -- RAPIDJSON_VERSION: v1.1.0 > -- RE2_VERSION: 2018-10-01 > -- SNAPPY_VERSION: 1.1.3 > -- THRIFT_VERSION: 0.11.0 > -- ZLIB_VERSION: 1.2.8 > -- ZSTD_VERSION: v1.3.7 > -- Boost version: 1.68.0 > -- Found the following Boost libraries: > -- regex > -- system > -- filesystem > -- Boost include dir: /Users/Greg/anaconda3/envs/pyarrow-dev/include > -- Boost libraries: > /Users/Greg/anaconda3/envs/pyarrow-dev/lib/libboost_regex.dylib/Users/Greg/anaconda3/envs/pyarrow-dev/lib/libboost_system.dylib/Users/Greg/anaconda3/envs/pyarrow-dev/lib/libboost_filesystem.dylib > Added shared library dependency boost_system_shared: > /Users/Greg/anaconda3/envs/pyarrow-dev/lib/libboost_system.dylib > Added shared library dependency boost_filesystem_shared: > /Users/Greg/anaconda3/envs/pyarrow-dev/lib/libboost_filesystem.dylib > Added shared library dependency boost_regex_shared: > /Users/Greg/anaconda3/envs/pyarrow-dev/lib/libboost_regex.dylib > Added static library dependency double-conversion_static: > /Users/Greg/anaconda3/envs/pyarrow-dev/lib/libdouble-conversion.a > -- double-conversion include dir: > /Users/Greg/anaconda3/envs/pyarrow-dev/include > -- double-conversion static library: > /Users/Greg/anaconda3/envs/pyarrow-dev/lib/libdouble-conversion.a > -- GFLAGS_HOME: /Users/Greg/anaconda3/envs/pyarrow-dev > -- GFlags include dir: /Users/Greg/anaconda3/envs/pyarrow-dev/include > -- GFlags static library: > /Users/Greg/anaconda3/envs/pyarrow-dev/lib/libgflags.a > Added static library dependency gflags_static: > /Users/Greg/anaconda3/envs/pyarrow-dev/lib/libgflags.a > -- RapidJSON include dir: 
/Users/Greg/anaconda3/envs/pyarrow-dev/include > -- Found the Flatbuffers library: > /Users/Greg/anaconda3/envs/pyarrow-dev/lib/libflatbuffers.a > -- Flatbuffers include dir: /Users/Greg/anaconda3/envs/pyarrow-dev/include > -- Flatbuffers compiler: /Users/Greg/anaconda3/envs/pyarrow-dev/bin/flatc > Added static library dependency jemalloc_static: > /Users/Greg/documents/repos/arrow/cpp/build/jemalloc_ep-prefix/src/jemalloc_ep/dist//lib/libjemalloc_pic.a > Added shared library dependency jemalloc_shared: > /Users/Greg/documents/repos/arrow/cpp/build/jemalloc_ep-prefix/src/jemalloc_ep/dist//lib/libjemalloc.dylib > -- Found hdfs.h at: > /Users/Greg/documents/repos/arrow/cpp/thirdparty/hadoop/include/hdfs.h > -- Found the ZLIB shared library: >
[jira] [Commented] (ARROW-5130) [Python] Segfault when importing TensorFlow after Pyarrow
[ https://issues.apache.org/jira/browse/ARROW-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825554#comment-16825554 ] Alexander Sergeev commented on ARROW-5130: -- One workaround we found is to LD_PRELOAD /usr/lib/x86_64-linux-gnu/libstdc++.so.6. Wes, is there a reason PyArrow re-exports a bunch of C++ std library symbols? > [Python] Segfault when importing TensorFlow after Pyarrow > - > > Key: ARROW-5130 > URL: https://issues.apache.org/jira/browse/ARROW-5130 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.13.0 >Reporter: Travis Addair >Priority: Major > > This issue is similar to https://jira.apache.org/jira/browse/ARROW-2657 which > was fixed in v0.10.0. > When we import TensorFlow after Pyarrow in Linux Debian Jessie, we get a > segfault. To reproduce: > {code:java} > import pyarrow > import tensorflow{code} > Here's the backtrace from gdb: > {code:java} > Program terminated with signal SIGSEGV, Segmentation fault. > #0 0x in ?? () > (gdb) bt > #0 0x in ?? 
() > #1 0x7f529ee04410 in pthread_once () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_once.S:103 > #2 0x7f5229a74efa in void std::call_once(std::once_flag&, > void (&)()) () from > /usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so > #3 0x7f5229a74f3e in > tensorflow::port::TestCPUFeature(tensorflow::port::CPUFeature) () from > /usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so > #4 0x7f522978b561 in tensorflow::port::(anonymous > namespace)::CheckFeatureOrDie(tensorflow::port::CPUFeature, std::string > const&) () > from > /usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so > #5 0x7f522978b5b4 in _GLOBAL__sub_I_cpu_feature_guard.cc () from > /usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so > #6 0x7f529f224bea in call_init (l=, argc=argc@entry=9, > argv=argv@entry=0x7ffc6d8c1488, env=env@entry=0x294c0c0) at dl-init.c:78 > #7 0x7f529f224cd3 in call_init (env=0x294c0c0, argv=0x7ffc6d8c1488, > argc=9, l=) at dl-init.c:36 > #8 _dl_init (main_map=main_map@entry=0x2e4aff0, argc=9, argv=0x7ffc6d8c1488, > env=0x294c0c0) at dl-init.c:126 > #9 0x7f529f228e38 in dl_open_worker (a=a@entry=0x7ffc6d8bebb8) at > dl-open.c:577 > #10 0x7f529f224aa4 in _dl_catch_error > (objname=objname@entry=0x7ffc6d8beba8, > errstring=errstring@entry=0x7ffc6d8bebb0, > mallocedp=mallocedp@entry=0x7ffc6d8beba7, > operate=operate@entry=0x7f529f228b60 , > args=args@entry=0x7ffc6d8bebb8) at dl-error.c:187 > #11 0x7f529f22862b in _dl_open (file=0x7f5248178b54 > "/usr/local/lib/python2.7/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so", > mode=-2147483646, caller_dlopen=, > nsid=-2, argc=9, argv=0x7ffc6d8c1488, env=0x294c0c0) at dl-open.c:661 > #12 0x7f529ebf402b in dlopen_doit (a=a@entry=0x7ffc6d8bedd0) at > dlopen.c:66 > #13 0x7f529f224aa4 in _dl_catch_error (objname=0x2950fc0, > errstring=0x2950fc8, mallocedp=0x2950fb8, 
operate=0x7f529ebf3fd0 > , args=0x7ffc6d8bedd0) at dl-error.c:187 > #14 0x7f529ebf45dd in _dlerror_run (operate=operate@entry=0x7f529ebf3fd0 > , args=args@entry=0x7ffc6d8bedd0) at dlerror.c:163 > #15 0x7f529ebf40c1 in __dlopen (file=, mode= out>) at dlopen.c:87 > #16 0x00540859 in _PyImport_GetDynLoadFunc () > #17 0x0054024c in _PyImport_LoadDynamicModule () > #18 0x005f2bcb in ?? () > #19 0x004ca235 in PyEval_EvalFrameEx () > #20 0x004ca9c2 in PyEval_EvalFrameEx () > #21 0x004c8c39 in PyEval_EvalCodeEx () > #22 0x004c84e6 in PyEval_EvalCode () > #23 0x004c6e5c in PyImport_ExecCodeModuleEx () > #24 0x004c3272 in ?? () > #25 0x004b19e2 in ?? () > #26 0x004b13d7 in ?? () > #27 0x004b42f6 in ?? () > #28 0x004d1aab in PyEval_CallObjectWithKeywords () > #29 0x004ccdb3 in PyEval_EvalFrameEx () > #30 0x004c8c39 in PyEval_EvalCodeEx () > #31 0x004c84e6 in PyEval_EvalCode () > #32 0x004c6e5c in PyImport_ExecCodeModuleEx () > #33 0x004c3272 in ?? () > #34 0x004b1d3f in ?? () > #35 0x004b6b2b in ?? () > #36 0x004b0d82 in ?? () > #37 0x004b42f6 in ?? () > #38 0x004d1aab in PyEval_CallObjectWithKeywords () > #39 0x004ccdb3 in PyEval_EvalFrameEx (){code} > It looks like the code changes that fixed the previous issue were recently > removed in > [https://github.com/apache/arrow/commit/b766bff34b7d85034d26cebef5b3aeef1eb2fd82#diff-16806bcebc1df2fae432db426905b9f0]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
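The LD_PRELOAD workaround mentioned in the comment above can be applied when launching the Python process. A sketch (the library path is the Debian one from the comment and may differ on other systems; the helper name is hypothetical):

```python
import os

# Force the dynamic loader to resolve C++ runtime symbols from the system
# libstdc++ before any copies re-exported by other shared libraries.
LIBSTDCXX = "/usr/lib/x86_64-linux-gnu/libstdc++.so.6"  # Debian path

def preload_env(base=None):
    """Return a copy of the environment with LD_PRELOAD set."""
    env = dict(os.environ if base is None else base)
    # Prepend rather than overwrite, in case something is already preloaded.
    existing = env.get("LD_PRELOAD")
    env["LD_PRELOAD"] = f"{LIBSTDCXX} {existing}" if existing else LIBSTDCXX
    return env

# Usage: subprocess.run([sys.executable, "app.py"], env=preload_env())
```

LD_PRELOAD must be set before the process starts; setting it from inside an already-running interpreter has no effect on libraries loaded afterwards via dlopen's normal symbol resolution order.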
[jira] [Updated] (ARROW-4139) [Python] Cast Parquet column statistics to unicode if UTF8 ConvertedType is set
[ https://issues.apache.org/jira/browse/ARROW-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-4139: -- Labels: parquet pull-request-available python (was: parquet python) > [Python] Cast Parquet column statistics to unicode if UTF8 ConvertedType is > set > --- > > Key: ARROW-4139 > URL: https://issues.apache.org/jira/browse/ARROW-4139 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Matthew Rocklin >Priority: Minor > Labels: parquet, pull-request-available, python > Fix For: 0.14.0 > > > When writing Pandas data to Parquet format and reading it back again I find > that the statistics of text columns are stored as byte arrays rather than as > unicode text. > I'm not sure if this is a bug in Arrow, PyArrow, or just in my understanding > of how best to manage statistics. (I'd be quite happy to learn that it was > the latter). > Here is a minimal example > {code:python} > import pandas as pd > df = pd.DataFrame({'x': ['a']}) > df.to_parquet('df.parquet') > import pyarrow.parquet as pq > pf = pq.ParquetDataset('df.parquet') > piece = pf.pieces[0] > rg = piece.row_group(0) > md = piece.get_metadata(pq.ParquetFile) > rg = md.row_group(0) > c = rg.column(0) > >>> c > > file_offset: 63 > file_path: > physical_type: BYTE_ARRAY > num_values: 1 > path_in_schema: x > is_stats_set: True > statistics: > > has_min_max: True > min: b'a' > max: b'a' > null_count: 0 > distinct_count: 0 > num_values: 1 > physical_type: BYTE_ARRAY > compression: SNAPPY > encodings: ('PLAIN_DICTIONARY', 'PLAIN', 'RLE') > has_dictionary_page: True > dictionary_page_offset: 4 > data_page_offset: 25 > total_compressed_size: 59 > total_uncompressed_size: 55 > >>> type(c.statistics.min) > bytes > {code} > My guess is that we would want to store a logical type in the statistics like > UNICODE, though I don't have enough experience with Parquet data types to > know if this is a good idea or possible. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
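Until the statistics carry a logical type, callers can decode the raw min/max themselves when the column is known to hold UTF-8 data. A sketch of that workaround (the helper below is hypothetical, not a pyarrow API; the byte values mirror the example in the report above):

```python
# Hypothetical helper (not part of pyarrow): decode BYTE_ARRAY statistics
# to text only for columns whose ConvertedType is UTF8.
def stat_to_text(value, is_utf8):
    if is_utf8 and isinstance(value, bytes):
        return value.decode("utf-8")
    return value

# Mirroring the example above, where c.statistics.min == b'a':
print(stat_to_text(b"a", is_utf8=True))          # UTF8 column -> 'a'
print(stat_to_text(b"\x00\x01", is_utf8=False))  # non-text column stays bytes
```

The caveat is exactly the one the report raises: whether a BYTE_ARRAY column is text is only knowable from the schema's ConvertedType, not from the statistics object itself.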
[jira] [Commented] (ARROW-5130) [Python] Segfault when importing TensorFlow after Pyarrow
[ https://issues.apache.org/jira/browse/ARROW-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825562#comment-16825562 ] Wes McKinney commented on ARROW-5130: - We aren't doing so on purpose > [Python] Segfault when importing TensorFlow after Pyarrow > - > > Key: ARROW-5130 > URL: https://issues.apache.org/jira/browse/ARROW-5130 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.13.0 >Reporter: Travis Addair >Priority: Major > > This issue is similar to https://jira.apache.org/jira/browse/ARROW-2657 which > was fixed in v0.10.0. > When we import TensorFlow after Pyarrow in Linux Debian Jessie, we get a > segfault. To reproduce: > {code:java} > import pyarrow > import tensorflow{code} > Here's the backtrace from gdb: > {code:java} > Program terminated with signal SIGSEGV, Segmentation fault. > #0 0x in ?? () > (gdb) bt > #0 0x in ?? () > #1 0x7f529ee04410 in pthread_once () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_once.S:103 > #2 0x7f5229a74efa in void std::call_once(std::once_flag&, > void (&)()) () from > /usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so > #3 0x7f5229a74f3e in > tensorflow::port::TestCPUFeature(tensorflow::port::CPUFeature) () from > /usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so > #4 0x7f522978b561 in tensorflow::port::(anonymous > namespace)::CheckFeatureOrDie(tensorflow::port::CPUFeature, std::string > const&) () > from > /usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so > #5 0x7f522978b5b4 in _GLOBAL__sub_I_cpu_feature_guard.cc () from > /usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so > #6 0x7f529f224bea in call_init (l=, argc=argc@entry=9, > argv=argv@entry=0x7ffc6d8c1488, env=env@entry=0x294c0c0) at dl-init.c:78 > #7 0x7f529f224cd3 in call_init (env=0x294c0c0, argv=0x7ffc6d8c1488, > argc=9, l=) at dl-init.c:36 > #8 
_dl_init (main_map=main_map@entry=0x2e4aff0, argc=9, argv=0x7ffc6d8c1488, > env=0x294c0c0) at dl-init.c:126 > #9 0x7f529f228e38 in dl_open_worker (a=a@entry=0x7ffc6d8bebb8) at > dl-open.c:577 > #10 0x7f529f224aa4 in _dl_catch_error > (objname=objname@entry=0x7ffc6d8beba8, > errstring=errstring@entry=0x7ffc6d8bebb0, > mallocedp=mallocedp@entry=0x7ffc6d8beba7, > operate=operate@entry=0x7f529f228b60 , > args=args@entry=0x7ffc6d8bebb8) at dl-error.c:187 > #11 0x7f529f22862b in _dl_open (file=0x7f5248178b54 > "/usr/local/lib/python2.7/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so", > mode=-2147483646, caller_dlopen=, > nsid=-2, argc=9, argv=0x7ffc6d8c1488, env=0x294c0c0) at dl-open.c:661 > #12 0x7f529ebf402b in dlopen_doit (a=a@entry=0x7ffc6d8bedd0) at > dlopen.c:66 > #13 0x7f529f224aa4 in _dl_catch_error (objname=0x2950fc0, > errstring=0x2950fc8, mallocedp=0x2950fb8, operate=0x7f529ebf3fd0 > , args=0x7ffc6d8bedd0) at dl-error.c:187 > #14 0x7f529ebf45dd in _dlerror_run (operate=operate@entry=0x7f529ebf3fd0 > , args=args@entry=0x7ffc6d8bedd0) at dlerror.c:163 > #15 0x7f529ebf40c1 in __dlopen (file=, mode= out>) at dlopen.c:87 > #16 0x00540859 in _PyImport_GetDynLoadFunc () > #17 0x0054024c in _PyImport_LoadDynamicModule () > #18 0x005f2bcb in ?? () > #19 0x004ca235 in PyEval_EvalFrameEx () > #20 0x004ca9c2 in PyEval_EvalFrameEx () > #21 0x004c8c39 in PyEval_EvalCodeEx () > #22 0x004c84e6 in PyEval_EvalCode () > #23 0x004c6e5c in PyImport_ExecCodeModuleEx () > #24 0x004c3272 in ?? () > #25 0x004b19e2 in ?? () > #26 0x004b13d7 in ?? () > #27 0x004b42f6 in ?? () > #28 0x004d1aab in PyEval_CallObjectWithKeywords () > #29 0x004ccdb3 in PyEval_EvalFrameEx () > #30 0x004c8c39 in PyEval_EvalCodeEx () > #31 0x004c84e6 in PyEval_EvalCode () > #32 0x004c6e5c in PyImport_ExecCodeModuleEx () > #33 0x004c3272 in ?? () > #34 0x004b1d3f in ?? () > #35 0x004b6b2b in ?? () > #36 0x004b0d82 in ?? () > #37 0x004b42f6 in ?? 
() > #38 0x004d1aab in PyEval_CallObjectWithKeywords () > #39 0x004ccdb3 in PyEval_EvalFrameEx (){code} > It looks like the code changes that fixed the previous issue were recently > removed in > [https://github.com/apache/arrow/commit/b766bff34b7d85034d26cebef5b3aeef1eb2fd82#diff-16806bcebc1df2fae432db426905b9f0]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-5186) [Plasma] Crash on deleting CUDA memory
[ https://issues.apache.org/jira/browse/ARROW-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-5186: --- Assignee: shengjun.li > [Plasma] Crash on deleting CUDA memory > -- > > Key: ARROW-5186 > URL: https://issues.apache.org/jira/browse/ARROW-5186 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.13.0 >Reporter: shengjun.li >Assignee: shengjun.li >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > cpp/CMakeLists.txt > option(ARROW_CUDA "Build the Arrow CUDA extensions (requires CUDA toolkit)" > ON) > option(ARROW_PLASMA "Build the plasma object store along with Arrow" ON) > [sample sequence] > (1) call PlasmaClient::Create(id_object, data_size, 0, 0, , 1) // where > device_num != 0 > (2) call PlasmaClient::Seal(id_object) > (3) call PlasmaClient::Release(id_object) > (4) call PlasmaClient::Delete(id_object) // server crash! > *** Aborted at 1555645923 (unix time) try "date -d @1555645923" if you are > using GNU date *** > PC: @ 0x7f65bcfa1428 gsignal > *** SIGABRT (@0x3e86d67) received by PID 28007 (TID 0x7f65bf225740) from > PID 28007; stack trace: *** > @ 0x7f65bd347390 (unknown) > @ 0x7f65bcfa1428 gsignal > @ 0x7f65bcfa302a abort > @ 0x4a56cd dlfree > @ 0x4b4bc2 plasma::PlasmaAllocator::Free() > @ 0x4b7da3 plasma::PlasmaStore::EraseFromObjectTable() > @ 0x4b87d2 plasma::PlasmaStore::DeleteObject() > @ 0x4bb3d2 plasma::PlasmaStore::ProcessMessage() > @ 0x4b9195 _ZZN6plasma11PlasmaStore13ConnectClientEiENKUliE_clEi > @ 0x4bd752 > _ZNSt17_Function_handlerIFviEZN6plasma11PlasmaStore13ConnectClientEiEUliE_E9_M_invokeERKSt9_Any_dataOi > @ 0x4ab998 std::function<>::operator()() > @ 0x4aaea7 plasma::EventLoop::FileEventCallback() > @ 0x4dbd8f aeProcessEvents > @ 0x4dbf50 aeMain > @ 0x4ab19b plasma::EventLoop::Start() > @ 0x4bfc93 plasma::PlasmaStoreRunner::Start() > @ 0x4bc34d plasma::StartServer() > @ 0x4bcfbd main > @ 0x7f65bcf8c830 
__libc_start_main > @ 0x49e939 _start > @ 0x0 (unknown) > Aborted (core dumped) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5186) [Plasma] Crash on deleting CUDA memory
[ https://issues.apache.org/jira/browse/ARROW-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-5186: Summary: [Plasma] Crash on deleting CUDA memory (was: [plasma] carsh on delete gpu memory) > [Plasma] Crash on deleting CUDA memory > -- > > Key: ARROW-5186 > URL: https://issues.apache.org/jira/browse/ARROW-5186 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.13.0 >Reporter: shengjun.li >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > cpp/CMakeLists.txt > option(ARROW_CUDA "Build the Arrow CUDA extensions (requires CUDA toolkit)" > ON) > option(ARROW_PLASMA "Build the plasma object store along with Arrow" ON) > [sample sequence] > (1) call PlasmaClient::Create(id_object, data_size, 0, 0, , 1) // where > device_num != 0 > (2) call PlasmaClient::Seal(id_object) > (3) call PlasmaClient::Release(id_object) > (4) call PlasmaClient::Delete(id_object) // server crash! 
> *** Aborted at 1555645923 (unix time) try "date -d @1555645923" if you are > using GNU date *** > PC: @ 0x7f65bcfa1428 gsignal > *** SIGABRT (@0x3e86d67) received by PID 28007 (TID 0x7f65bf225740) from > PID 28007; stack trace: *** > @ 0x7f65bd347390 (unknown) > @ 0x7f65bcfa1428 gsignal > @ 0x7f65bcfa302a abort > @ 0x4a56cd dlfree > @ 0x4b4bc2 plasma::PlasmaAllocator::Free() > @ 0x4b7da3 plasma::PlasmaStore::EraseFromObjectTable() > @ 0x4b87d2 plasma::PlasmaStore::DeleteObject() > @ 0x4bb3d2 plasma::PlasmaStore::ProcessMessage() > @ 0x4b9195 _ZZN6plasma11PlasmaStore13ConnectClientEiENKUliE_clEi > @ 0x4bd752 > _ZNSt17_Function_handlerIFviEZN6plasma11PlasmaStore13ConnectClientEiEUliE_E9_M_invokeERKSt9_Any_dataOi > @ 0x4ab998 std::function<>::operator()() > @ 0x4aaea7 plasma::EventLoop::FileEventCallback() > @ 0x4dbd8f aeProcessEvents > @ 0x4dbf50 aeMain > @ 0x4ab19b plasma::EventLoop::Start() > @ 0x4bfc93 plasma::PlasmaStoreRunner::Start() > @ 0x4bc34d plasma::StartServer() > @ 0x4bcfbd main > @ 0x7f65bcf8c830 __libc_start_main > @ 0x49e939 _start > @ 0x0 (unknown) > Aborted (core dumped) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3873) [C++] Build shared libraries consistently with -fvisibility=hidden
[ https://issues.apache.org/jira/browse/ARROW-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825571#comment-16825571 ] Wes McKinney commented on ARROW-3873: - I just closed https://github.com/apache/arrow/pull/2437 and will plan to return to this once the Parquet symbol visibility issue is dealt with > [C++] Build shared libraries consistently with -fvisibility=hidden > -- > > Key: ARROW-3873 > URL: https://issues.apache.org/jira/browse/ARROW-3873 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > See https://github.com/apache/arrow/pull/2437 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5130) [Python] Segfault when importing TensorFlow after Pyarrow
[ https://issues.apache.org/jira/browse/ARROW-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825601#comment-16825601 ] Alexander Sergeev commented on ARROW-5130: -- Wes, would you take a PR that cleans these things up? {code:java} # for f in $(ls -1 /usr/local/lib/python2.7/dist-packages/pyarrow/*.so*); do echo $f; nm -D $f | c++filt | grep std::_Hash_bytes; done /usr/local/lib/python2.7/dist-packages/pyarrow/_csv.so U std::_Hash_bytes(void const*, unsigned long, unsigned long) /usr/local/lib/python2.7/dist-packages/pyarrow/libarrow_boost_filesystem.so /usr/local/lib/python2.7/dist-packages/pyarrow/libarrow_boost_filesystem.so.1.66.0 /usr/local/lib/python2.7/dist-packages/pyarrow/libarrow_boost_regex.so /usr/local/lib/python2.7/dist-packages/pyarrow/libarrow_boost_regex.so.1.66.0 /usr/local/lib/python2.7/dist-packages/pyarrow/libarrow_boost_system.so /usr/local/lib/python2.7/dist-packages/pyarrow/libarrow_boost_system.so.1.66.0 /usr/local/lib/python2.7/dist-packages/pyarrow/libarrow_python.so 000e2250 T std::_Hash_bytes(void const*, unsigned long, unsigned long) /usr/local/lib/python2.7/dist-packages/pyarrow/libarrow_python.so.13 000e2250 T std::_Hash_bytes(void const*, unsigned long, unsigned long) /usr/local/lib/python2.7/dist-packages/pyarrow/libarrow.so /usr/local/lib/python2.7/dist-packages/pyarrow/libarrow.so.13 /usr/local/lib/python2.7/dist-packages/pyarrow/libparquet.so 001ce380 T std::_Hash_bytes(void const*, unsigned long, unsigned long) /usr/local/lib/python2.7/dist-packages/pyarrow/libparquet.so.13 001ce380 T std::_Hash_bytes(void const*, unsigned long, unsigned long) /usr/local/lib/python2.7/dist-packages/pyarrow/libplasma.so /usr/local/lib/python2.7/dist-packages/pyarrow/libplasma.so.13 /usr/local/lib/python2.7/dist-packages/pyarrow/lib.so U std::_Hash_bytes(void const*, unsigned long, unsigned long) /usr/local/lib/python2.7/dist-packages/pyarrow/libz-7f57503f.so.1.2.11 
/usr/local/lib/python2.7/dist-packages/pyarrow/_orc.so /usr/local/lib/python2.7/dist-packages/pyarrow/_parquet.so U std::_Hash_bytes(void const*, unsigned long, unsigned long) /usr/local/lib/python2.7/dist-packages/pyarrow/_plasma.so {code} > [Python] Segfault when importing TensorFlow after Pyarrow > - > > Key: ARROW-5130 > URL: https://issues.apache.org/jira/browse/ARROW-5130 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.13.0 >Reporter: Travis Addair >Priority: Major > > This issue is similar to https://jira.apache.org/jira/browse/ARROW-2657 which > was fixed in v0.10.0. > When we import TensorFlow after Pyarrow in Linux Debian Jessie, we get a > segfault. To reproduce: > {code:java} > import pyarrow > import tensorflow{code} > Here's the backtrace from gdb: > {code:java} > Program terminated with signal SIGSEGV, Segmentation fault. > #0 0x in ?? () > (gdb) bt > #0 0x in ?? () > #1 0x7f529ee04410 in pthread_once () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_once.S:103 > #2 0x7f5229a74efa in void std::call_once(std::once_flag&, > void (&)()) () from > /usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so > #3 0x7f5229a74f3e in > tensorflow::port::TestCPUFeature(tensorflow::port::CPUFeature) () from > /usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so > #4 0x7f522978b561 in tensorflow::port::(anonymous > namespace)::CheckFeatureOrDie(tensorflow::port::CPUFeature, std::string > const&) () > from > /usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so > #5 0x7f522978b5b4 in _GLOBAL__sub_I_cpu_feature_guard.cc () from > /usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so > #6 0x7f529f224bea in call_init (l=, argc=argc@entry=9, > argv=argv@entry=0x7ffc6d8c1488, env=env@entry=0x294c0c0) at dl-init.c:78 > #7 0x7f529f224cd3 in call_init (env=0x294c0c0, argv=0x7ffc6d8c1488, > argc=9, l=) 
at dl-init.c:36 > #8 _dl_init (main_map=main_map@entry=0x2e4aff0, argc=9, argv=0x7ffc6d8c1488, > env=0x294c0c0) at dl-init.c:126 > #9 0x7f529f228e38 in dl_open_worker (a=a@entry=0x7ffc6d8bebb8) at > dl-open.c:577 > #10 0x7f529f224aa4 in _dl_catch_error > (objname=objname@entry=0x7ffc6d8beba8, > errstring=errstring@entry=0x7ffc6d8bebb0, > mallocedp=mallocedp@entry=0x7ffc6d8beba7, > operate=operate@entry=0x7f529f228b60 , > args=args@entry=0x7ffc6d8bebb8) at dl-error.c:187 > #11 0x7f529f22862b in _dl_open (file=0x7f5248178b54 > "/usr/local/lib/python2.7/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so", > mode=-2147483646, caller_dlopen=, > nsid=-2, argc=9, argv=0x7ffc6d8c1488, env=0x294c0c0) at
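The symbol listing in the comment above separates libraries that merely reference {{std::_Hash_bytes}} (nm type {{U}}, undefined) from those that define and export it (type {{T}}). That classification can be sketched programmatically (an illustrative script, not part of Arrow; the sample lines mirror the listing):

```python
# Classify `nm -D <lib> | c++filt` output for one symbol: "U name" lines are
# undefined references, "ADDR T name" lines are exported definitions.
def classify_symbol(nm_output, symbol):
    defined, undefined = [], []
    for line in nm_output.splitlines():
        if symbol not in line:
            continue
        fields = line.split()
        if fields and fields[0] == "U":
            undefined.append(line.strip())
        elif len(fields) >= 2 and fields[1] == "T":
            defined.append(line.strip())
    return defined, undefined

# Sample mirroring the listing above: libarrow_python.so exports the symbol,
# while lib.so only references it.
sample = """\
         U std::_Hash_bytes(void const*, unsigned long, unsigned long)
000e2250 T std::_Hash_bytes(void const*, unsigned long, unsigned long)
"""
defined, undefined = classify_symbol(sample, "std::_Hash_bytes")
print(len(defined), len(undefined))  # 1 1
```

Exported ({{T}}) copies of C++ runtime symbols are what can shadow TensorFlow's expectations at load time, which is why the list of defining libraries is the interesting part.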
[jira] [Comment Edited] (ARROW-5210) [Python] editable install (pip install -e .) is failing
[ https://issues.apache.org/jira/browse/ARROW-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825155#comment-16825155 ] Joris Van den Bossche edited comment on ARROW-5210 at 4/24/19 1:42 PM: --- With pip 19.1 (released yesterday), one needs to do {{pip install -e . --no-use-pep517 --no-build-isolation}} to get it running with our current set-up. was (Author: jorisvandenbossche): With pip 19.1 (released yesterday), one needs to do pip install -e . --no-use-pep517 --no-build-isolation to get it running with our current set-up. > [Python] editable install (pip install -e .) is failing > > > Key: ARROW-5210 > URL: https://issues.apache.org/jira/browse/ARROW-5210 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Joris Van den Bossche >Priority: Minor > > Following the python development documentation on building arrow and pyarrow > ([https://arrow.apache.org/docs/developers/python.html#build-and-test]), > building pyarrow inplace with {{python setup.py build_ext --inplace}} works > fine. > > But if you want to also install this inplace version in the current python > environment (editable install / development install) using pip ({{pip install > -e .}}), this fails during the {{build_ext}} / cmake phase: > {code:none} > > -- Looking for python3.7m > -- Found Python lib > /home/joris/miniconda3/envs/arrow-dev/lib/libpython3.7m.so > CMake Error at cmake_modules/FindNumPy.cmake:62 (message): > NumPy import failure: > Traceback (most recent call last): > File "", line 1, in > ModuleNotFoundError: No module named 'numpy' > Call Stack (most recent call first): > CMakeLists.txt:186 (find_package) > -- Configuring incomplete, errors occurred! > See also > "/home/joris/scipy/repos/arrow/python/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeOutput.log". > See also > "/home/joris/scipy/repos/arrow/python/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeError.log". > error: command 'cmake' failed with exit status 1 > Cleaning up... 
> {code} > > Alternatively, doing {{python setup.py develop}} to achieve the same still > works. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (ARROW-3176) [Python] Overflow in Date32 column conversion to pandas
[ https://issues.apache.org/jira/browse/ARROW-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824911#comment-16824911 ] Joris Van den Bossche edited comment on ARROW-3176 at 4/24/19 2:02 PM: --- Note that the default type changed: it now gives back datetime.date objects, instead of datetime64[D] (https://issues.apache.org/jira/browse/ARROW-3910). So by default you no longer have this problem. But, setting {{date_as_object=False}} (to have back the old behaviour), you still have the same overflow issue. Updated the original bug report to add this keyword, to keep it a reproducible example. was (Author: jorisvandenbossche): Note that the default type changed: it now gives back datetime.date objects, instead of datetime64[D]. Do by default you no longer have this problem. But, setting {{date_as_object=False}} (to have back the old behaviour), you still have the same overflow issue. Updated the original bug report to add this keyword, to keep it a reproducible example. > [Python] Overflow in Date32 column conversion to pandas > --- > > Key: ARROW-3176 > URL: https://issues.apache.org/jira/browse/ARROW-3176 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.10.0 >Reporter: Florian Jetter >Priority: Minor > Fix For: 0.14.0 > > > When converting an arrow column holding a {{Date32Array}} to {{pandas}} there > seems to be an overflow at the date {{2262-04-12}} such that the type and > value are wrong. The issue only occurs for columns, not for arrays. 
> Running on debian 9.5 w/ python2 gives > > {code} > In [1]: import numpy as np > In [2]: import datetime > In [3]: import pyarrow as pa > In [4]: pa.__version__ > Out[4]: '0.10.0' > In [5]: arr = pa.array(np.array([datetime.date(2262, 4, 12)], > dtype='datetime64[D]')) > In [6]: arr.to_pandas(date_as_object=False) > Out[6]: array(['2262-04-12'], dtype='datetime64[D]') > In [7]: pa.column('name', arr).to_pandas(date_as_object=False) > Out[7]: > 0 1677-09-21 00:25:26.290448384 > Name: name, dtype: datetime64[ns] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2835) [C++] ReadAt/WriteAt are inconsistent with moving the files position
[ https://issues.apache.org/jira/browse/ARROW-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825258#comment-16825258 ] Antoine Pitrou commented on ARROW-2835: --- I see two other ways around this: 1) As soon as ReadAt or WriteAt is called, change the internal file state so that any implicitly-positioning operation (such as Read, Write or Tell) fails until Seek is called first. or 2) Have an internal "positioning" lock that ensures that we can have several ReadAt or WriteAt calls simultaneously, but that implicitly-positioning operations wait for the last *At call to end and restore the file pointer. I'm not sure how easy #2 is, but it should be doable. > [C++] ReadAt/WriteAt are inconsistent with moving the files position > > > Key: ARROW-2835 > URL: https://issues.apache.org/jira/browse/ARROW-2835 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Dimitri Vorona >Priority: Major > Fix For: 0.14.0 > > > Right now, there is inconsistent behaviour regarding moving the file's > position pointer after calling ReadAt or WriteAt. For example, the default > implementation of ReadAt seeks to the desired offset and calls Read, which > moves the position pointer. MemoryMappedFile::ReadAt, however, doesn't change > the position. WriteableFile::WriteAt seems to move the position in the current > implementation, but there is no docstring which prescribes this behaviour. > Antoine suggested that *At methods shouldn't touch the position, which makes > more sense, IMHO. The change isn't huge and doesn't seem to break anything > internally, but it might break existing user code. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
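A rough Python sketch of option 2 (illustrative only — the real implementation would be in C++; `os.pread` stands in here for a positioned read that leaves the shared file pointer untouched, and is POSIX-only; the class and method names are hypothetical):

```python
import os
import threading

class PositionSafeFile:
    """Sketch of option 2: ReadAt calls may run concurrently and never
    touch the file position, while implicitly-positioning operations
    (Read, Tell) wait for any in-flight *At call to finish."""

    def __init__(self, path):
        self._fd = os.open(path, os.O_RDONLY)
        self._cond = threading.Condition()
        self._inflight = 0  # number of active read_at calls

    def read_at(self, offset, nbytes):
        with self._cond:
            self._inflight += 1
        try:
            # pread reads at an absolute offset; the file position is untouched
            return os.pread(self._fd, nbytes, offset)
        finally:
            with self._cond:
                self._inflight -= 1
                self._cond.notify_all()

    def read(self, nbytes):
        with self._cond:
            # wait for the last *At call to end before using the position
            while self._inflight:
                self._cond.wait()
            return os.read(self._fd, nbytes)

    def close(self):
        os.close(self._fd)
```

With a true positioned read like this, read_at calls need no lock among themselves; the condition variable only serializes them against the position-dependent operations.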
[jira] [Commented] (ARROW-5208) [Python] Inconsistent resulting type during casting in pa.array() when mask is present
[ https://issues.apache.org/jira/browse/ARROW-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825402#comment-16825402 ] Wes McKinney commented on ARROW-5208: - Seems reasonable. Would you like to submit a pull request? > [Python] Inconsistent resulting type during casting in pa.array() when mask > is present > -- > > Key: ARROW-5208 > URL: https://issues.apache.org/jira/browse/ARROW-5208 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.13.0 >Reporter: Artem KOZHEVNIKOV >Priority: Major > Fix For: 0.14.0 > > > I would expect Int64Array type in all cases below : > {code:java} > >>> pa.array([4, None, 4, None], mask=np.array([False, True, False, True])) > >>> > >>> > [4, null, 4, null ] > >>> pa.array([4, None, 4, 'rer'], mask=np.array([False, True, False, True])) > >>> > >>> > [4, null, 4, null ] > >>> pa.array([4, None, 4, 3.], mask=np.array([False, True, False, True])) > >>> > >>> [ 4, null, > >>> 4, null ]{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
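The requested behaviour amounts to excluding masked-out entries from type inference, so entries like 'rer' or 3.0 above never promote the result type. A toy illustration in plain Python (`infer_masked_type` is a hypothetical helper, not pyarrow API):

```python
def infer_masked_type(values, mask):
    """Infer a result type from the unmasked values only.

    `mask` follows the pa.array() convention: True marks an entry as
    null, so it should not participate in type inference.
    """
    kept = [v for v, is_null in zip(values, mask) if not is_null and v is not None]
    if all(isinstance(v, int) for v in kept):
        return "int64"
    if all(isinstance(v, (int, float)) for v in kept):
        return "double"
    return "object"
```

Under this rule, all three examples in the report would infer int64, since only the two unmasked 4s are consulted.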
[jira] [Updated] (ARROW-5208) [Python] Inconsistent resulting type during casting in pa.array() when mask is present
[ https://issues.apache.org/jira/browse/ARROW-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-5208: Fix Version/s: 0.14.0 > [Python] Inconsistent resulting type during casting in pa.array() when mask > is present > -- > > Key: ARROW-5208 > URL: https://issues.apache.org/jira/browse/ARROW-5208 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.13.0 >Reporter: Artem KOZHEVNIKOV >Priority: Major > Fix For: 0.14.0 > > > I would expect Int64Array type in all cases below : > {code:java} > >>> pa.array([4, None, 4, None], mask=np.array([False, True, False, True])) > >>> > >>> > [4, null, 4, null ] > >>> pa.array([4, None, 4, 'rer'], mask=np.array([False, True, False, True])) > >>> > >>> > [4, null, 4, null ] > >>> pa.array([4, None, 4, 3.], mask=np.array([False, True, False, True])) > >>> > >>> [ 4, null, > >>> 4, null ]{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3176) [Python] Overflow in Date32 column conversion to pandas
[ https://issues.apache.org/jira/browse/ARROW-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825407#comment-16825407 ] Joris Van den Bossche commented on ARROW-3176: -- Yes, I think, ideally, arrow should be responsible for checking that the values fit in the range supported by pandas. Of the two remaining options, I agree raising is probably the best option. > [Python] Overflow in Date32 column conversion to pandas > --- > > Key: ARROW-3176 > URL: https://issues.apache.org/jira/browse/ARROW-3176 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.10.0 >Reporter: Florian Jetter >Priority: Minor > Fix For: 0.14.0 > > > When converting an arrow column holding a {{Date32Array}} to {{pandas}} there > seems to be an overflow at the date {{2262-04-12}} such that the type and > value are wrong. The issue only occurs for columns, not for arrays. > Running on debian 9.5 w/ python2 gives > > {code} > In [1]: import numpy as np > In [2]: import datetime > In [3]: import pyarrow as pa > In [4]: pa.__version__ > Out[4]: '0.10.0' > In [5]: arr = pa.array(np.array([datetime.date(2262, 4, 12)], > dtype='datetime64[D]')) > In [6]: arr.to_pandas(date_as_object=False) > Out[6]: array(['2262-04-12'], dtype='datetime64[D]') > In [7]: pa.column('name', arr).to_pandas(date_as_object=False) > Out[7]: > 0 1677-09-21 00:25:26.290448384 > Name: name, dtype: datetime64[ns] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3176) [Python] Overflow in Date32 column conversion to pandas
[ https://issues.apache.org/jira/browse/ARROW-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825426#comment-16825426 ] Joris Van den Bossche commented on ARROW-3176: -- This seems to be a pandas regression: https://github.com/pandas-dev/pandas/issues/26206 > [Python] Overflow in Date32 column conversion to pandas > --- > > Key: ARROW-3176 > URL: https://issues.apache.org/jira/browse/ARROW-3176 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.10.0 >Reporter: Florian Jetter >Priority: Minor > Fix For: 0.14.0 > > > When converting an arrow column holding a {{Date32Array}} to {{pandas}} there > seems to be an overflow at the date {{2262-04-12}} such that the type and > value are wrong. The issue only occurs for columns, not for arrays. > Running on debian 9.5 w/ python2 gives > > {code} > In [1]: import numpy as np > In [2]: import datetime > In [3]: import pyarrow as pa > In [4]: pa.__version__ > Out[4]: '0.10.0' > In [5]: arr = pa.array(np.array([datetime.date(2262, 4, 12)], > dtype='datetime64[D]')) > In [6]: arr.to_pandas(date_as_object=False) > Out[6]: array(['2262-04-12'], dtype='datetime64[D]') > In [7]: pa.column('name', arr).to_pandas(date_as_object=False) > Out[7]: > 0 1677-09-21 00:25:26.290448384 > Name: name, dtype: datetime64[ns] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5211) Missing documentation under `Dictionary encoding` section on MetaData page
Lennox Stevenson created ARROW-5211: --- Summary: Missing documentation under `Dictionary encoding` section on MetaData page Key: ARROW-5211 URL: https://issues.apache.org/jira/browse/ARROW-5211 Project: Apache Arrow Issue Type: Improvement Reporter: Lennox Stevenson First time opening an issue here, so let me know if there's anything I missed / more details I can provide. Just going through the arrow documentation at [https://arrow.apache.org/docs/python/] and I noticed that there's a section that is currently blank. From what I can tell the section [https://arrow.apache.org/docs/format/Metadata.html#dictionary-encoding] currently contains nothing in it. Is that intended? It was confusing to see a blank section, but that is just my opinion so it may not be worth changing. If this is something worth fixing / improving, then it's probably worth either filling out that section or simply removing the header to avoid future confusion. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5212) Array BinaryBuilder in Go library has no access to resize the values buffer
Jonathan A Sternberg created ARROW-5212: --- Summary: Array BinaryBuilder in Go library has no access to resize the values buffer Key: ARROW-5212 URL: https://issues.apache.org/jira/browse/ARROW-5212 Project: Apache Arrow Issue Type: Improvement Reporter: Jonathan A Sternberg When you are dealing with a binary builder, there are three buffers: the null bitmap, the offset indexes, and the values buffer which contains the actual data. When {{Reserve}} or {{Resize}} are used, the null bitmap and the offsets are modified to allow for additional appends to function. This seems correct to me: from the number of values alone, there's no way to know how much the values buffer should be resized until the values are actually appended. But, when you are then appending a bunch of string values, there's no additional API to preallocate the size of that last buffer. That means that batch appending a large number of strings will constantly allocate even if you know the size ahead of time. There should be some additional API to modify this last buffer, such as maybe {{ReserveBytes}} and {{ResizeBytes}}, that would correspond with the {{Reserve}} and {{Resize}} methods but relate to the values buffer. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3176) [Python] Overflow in Date32 column conversion to pandas
[ https://issues.apache.org/jira/browse/ARROW-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825405#comment-16825405 ] Wes McKinney commented on ARROW-3176: - This is a limitation with pandas's {{datetime64[ns]}}. One could argue for overflow checking on the to_pandas code path. There are three options * Current behavior (not that big of a deal now since we return {{datetime.date}} by default now) * Raise on overflow * Return NULL on overflow None of these options are great but maybe option 2 is the best? > [Python] Overflow in Date32 column conversion to pandas > --- > > Key: ARROW-3176 > URL: https://issues.apache.org/jira/browse/ARROW-3176 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.10.0 >Reporter: Florian Jetter >Priority: Minor > Fix For: 0.14.0 > > > When converting an arrow column holding a {{Date32Array}} to {{pandas}} there > seems to be an overflow at the date {{2262-04-12}} such that the type and > value are wrong. The issue only occurs for columns, not for arrays. > Running on debian 9.5 w/ python2 gives > > {code} > In [1]: import numpy as np > In [2]: import datetime > In [3]: import pyarrow as pa > In [4]: pa.__version__ > Out[4]: '0.10.0' > In [5]: arr = pa.array(np.array([datetime.date(2262, 4, 12)], > dtype='datetime64[D]')) > In [6]: arr.to_pandas(date_as_object=False) > Out[6]: array(['2262-04-12'], dtype='datetime64[D]') > In [7]: pa.column('name', arr).to_pandas(date_as_object=False) > Out[7]: > 0 1677-09-21 00:25:26.290448384 > Name: name, dtype: datetime64[ns] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
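Option 2 would amount to a bounds check before widening the Date32 day count to nanoseconds, since pandas's datetime64[ns] can only represent values that fit in a signed 64-bit nanosecond count. A minimal sketch of the idea (the helper name is illustrative, not Arrow API):

```python
NS_PER_DAY = 86_400_000_000_000  # 24 * 60 * 60 * 10**9
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)

def date32_to_ns_checked(days):
    """Widen a Date32 value (days since the UNIX epoch) to nanoseconds
    since epoch, raising on overflow instead of silently wrapping into
    a bogus 1677-09-21 timestamp."""
    ns = days * NS_PER_DAY
    if not (INT64_MIN <= ns <= INT64_MAX):
        raise OverflowError(
            f"{days} days since epoch does not fit in datetime64[ns]")
    return ns
```

The cutoff this check enforces is exactly the one observed in the report: 2262-04-11 still fits, while 2262-04-12 overflows.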
[jira] [Commented] (ARROW-3176) [Python] Overflow in Date32 column conversion to pandas
[ https://issues.apache.org/jira/browse/ARROW-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825413#comment-16825413 ] Joris Van den Bossche commented on ARROW-3176: -- Actually, I take that back. It seems that it is pandas that is not doing a proper check (assuming that arrow passes datetime64[D] data, similarly as what the Array.to_pandas returns), and it is pandas that converts the datetime64[D] to incorrect datetime64[ns]: {code} In [22]: pd.Series(np.array(['2262-04-12'], dtype='datetime64[D]')) Out[22]: 0 1677-09-21 00:25:26.290448384 dtype: datetime64[ns]{code} Of course, you still get the "wrong" behaviour when using arrow's {{to_pandas}}, but I might consider this a bug on the pandas side. > [Python] Overflow in Date32 column conversion to pandas > --- > > Key: ARROW-3176 > URL: https://issues.apache.org/jira/browse/ARROW-3176 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.10.0 >Reporter: Florian Jetter >Priority: Minor > Fix For: 0.14.0 > > > When converting an arrow column holding a {{Date32Array}} to {{pandas}} there > seems to be an overflow at the date {{2262-04-12}} such that the type and > value are wrong. The issue only occurs for columns, not for arrays. > Running on debian 9.5 w/ python2 gives > > {code} > In [1]: import numpy as np > In [2]: import datetime > In [3]: import pyarrow as pa > In [4]: pa.__version__ > Out[4]: '0.10.0' > In [5]: arr = pa.array(np.array([datetime.date(2262, 4, 12)], > dtype='datetime64[D]')) > In [6]: arr.to_pandas(date_as_object=False) > Out[6]: array(['2262-04-12'], dtype='datetime64[D]') > In [7]: pa.column('name', arr).to_pandas(date_as_object=False) > Out[7]: > 0 1677-09-21 00:25:26.290448384 > Name: name, dtype: datetime64[ns] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3978) [C++] Implement hashing, dictionary-encoding for StructArray
[ https://issues.apache.org/jira/browse/ARROW-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825451#comment-16825451 ] Jacques Nadeau commented on ARROW-3978: --- Here is some info about what we found worked well. Note that it doesn't go into a lot of detail about the pivot algorithm beyond the basic concepts of fixed and variable vectors. [https://docs.google.com/document/d/1Yk6IvDL28IzEjqcqSkFdevRyMrC8_kwzEatHvcOnawM/edit] Main idea around pivot: * separate fixed and variable and have each contiguous * coalesce bits for nullability and values together at the start of the data structure (save space, increase likelihood of mismatch early) * include the length of the variable part in the fixed container to reduce the likelihood of jumping to the variable container. * Have specialized cases that look at the actual existence of nulls for each word and fork behavior based on that, to improve performance of the common case where things are mostly null or not null. The latest code for the Arrow pivot algorithms specifically that we use can be found here: Pivots: [https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/sabot/op/common/ht2/Pivots.java] Unpivots: [https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/sabot/op/common/ht2/Unpivots.java] Hash Table: [https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/sabot/op/common/ht2/LBlockHashTable.java] We'd be happy to donate this code/algo to the community as it would probably serve as a good foundation. Note the doc is probably somewhat out of date with the actual implementation, as it was written early on in development. 
> [C++] Implement hashing, dictionary-encoding for StructArray > > > Key: ARROW-3978 > URL: https://issues.apache.org/jira/browse/ARROW-3978 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > This is a central requirement for hash-aggregations such as > {code} > SELECT AGG_FUNCTION(expr) > FROM table > GROUP BY expr1, expr2, ... > {code} > The materialized keys in the GROUP BY section form a struct, which can be > incrementally hashed to produce dictionary codes suitable for computing > aggregates or any other purpose. > There are a few subtasks related to this, such as efficiently constructing a > record (that can be hashed quickly) to identify each "row" in the struct. > Maybe we should start with that first -- This message was sent by Atlassian JIRA (v7.6.3#76005)
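As a toy illustration of the pivot concept described in the comment above — nullability bits coalesced at the front of a contiguous fixed-width key, so that two rows differing only in null-ness diverge in the very first word — here is a sketch (a simplification for fixed-width int64 columns only, not the Dremio implementation):

```python
import struct

def pivot_row(values):
    """Pack one row of nullable int64 key columns into a contiguous
    byte string: one validity word first, then the fixed-width values.

    Putting the validity bits up front means keys that differ in
    null-ness compare unequal within the first 8 bytes."""
    valid_bits = 0
    for i, v in enumerate(values):
        if v is not None:
            valid_bits |= 1 << i
    parts = [struct.pack("<Q", valid_bits)]
    for v in values:
        parts.append(struct.pack("<q", 0 if v is None else v))
    return b"".join(parts)
```

Packed keys like this can be hashed and compared directly, e.g. as hash-table keys when grouping, which is the role the Pivots/LBlockHashTable pair plays above.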
[jira] [Assigned] (ARROW-4717) [C#] Consider exposing ValueTask instead of Task
[ https://issues.apache.org/jira/browse/ARROW-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Erhardt reassigned ARROW-4717: --- Assignee: Eric Erhardt > [C#] Consider exposing ValueTask instead of Task > > > Key: ARROW-4717 > URL: https://issues.apache.org/jira/browse/ARROW-4717 > Project: Apache Arrow > Issue Type: Improvement > Components: C# >Reporter: Eric Erhardt >Assignee: Eric Erhardt >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > See [https://github.com/apache/arrow/pull/3736#pullrequestreview-207169204] > for the discussion and > [https://devblogs.microsoft.com/dotnet/understanding-the-whys-whats-and-whens-of-valuetask/] > for the reasoning. > Using `Task` in public API requires that a new Task instance be allocated > on every call. When returning synchronously, using ValueTask will allow the > method to not allocate. > In order to do this, we will need to take a new dependency on > {{System.Threading.Tasks.Extensions}} NuGet package. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4935) [C++] Errors from jemalloc when building pyarrow from source on OSX and Debian
[ https://issues.apache.org/jira/browse/ARROW-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825464#comment-16825464 ] Ian Mateus Vieira Manor commented on ARROW-4935: Having the same problem running {code:java} cmake install{code} on OSX, but have no {code:java} /build/jemalloc_ep-prefix/src/jemalloc_ep/dist{code} directory to delete. > [C++] Errors from jemalloc when building pyarrow from source on OSX and Debian > -- > > Key: ARROW-4935 > URL: https://issues.apache.org/jira/browse/ARROW-4935 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.12.1 > Environment: OSX, Debian, Python==3.6.7 >Reporter: Gregory Hayes >Priority: Critical > Labels: build, newbie > > My attempts to build pyarrow from source are failing. I've set up the conda > environment using the instructions provided in the Develop instructions, and > have tried this on both Debian and OSX. When I run CMAKE in debug mode on > OSX, the output is: > {code:java} > -- Building using CMake version: 3.14.0 > -- Arrow version: 0.13.0 (full: '0.13.0-SNAPSHOT') > -- clang-tidy not found > -- clang-format not found > -- infer found at /usr/local/bin/infer > -- Using ccache: /usr/local/bin/ccache > -- Found cpplint executable at > /Users/Greg/documents/repos/arrow/cpp/build-support/cpplint.py > -- Compiler command: env LANG=C > /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ > -v > -- Compiler version: Apple LLVM version 10.0.0 (clang-1000.11.45.5) > Target: x86_64-apple-darwin18.2.0 > Thread model: posix > InstalledDir: > /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin > -- Compiler id: AppleClang > Selected compiler clang 4.1.0svn > -- Arrow build warning level: CHECKIN > Configured for DEBUG build (set with cmake > -DCMAKE_BUILD_TYPE={release,debug,...}) > -- Build Type: DEBUG > -- BOOST_VERSION: 1.67.0 > -- BROTLI_VERSION: v0.6.0 > -- CARES_VERSION: 1.15.0 > 
-- DOUBLE_CONVERSION_VERSION: v3.1.1 > -- FLATBUFFERS_VERSION: v1.10.0 > -- GBENCHMARK_VERSION: v1.4.1 > -- GFLAGS_VERSION: v2.2.0 > -- GLOG_VERSION: v0.3.5 > -- GRPC_VERSION: v1.18.0 > -- GTEST_VERSION: 1.8.1 > -- JEMALLOC_VERSION: 17c897976c60b0e6e4f4a365c751027244dada7a > -- LZ4_VERSION: v1.8.3 > -- ORC_VERSION: 1.5.4 > -- PROTOBUF_VERSION: v3.6.1 > -- RAPIDJSON_VERSION: v1.1.0 > -- RE2_VERSION: 2018-10-01 > -- SNAPPY_VERSION: 1.1.3 > -- THRIFT_VERSION: 0.11.0 > -- ZLIB_VERSION: 1.2.8 > -- ZSTD_VERSION: v1.3.7 > -- Boost version: 1.68.0 > -- Found the following Boost libraries: > -- regex > -- system > -- filesystem > -- Boost include dir: /Users/Greg/anaconda3/envs/pyarrow-dev/include > -- Boost libraries: > /Users/Greg/anaconda3/envs/pyarrow-dev/lib/libboost_regex.dylib/Users/Greg/anaconda3/envs/pyarrow-dev/lib/libboost_system.dylib/Users/Greg/anaconda3/envs/pyarrow-dev/lib/libboost_filesystem.dylib > Added shared library dependency boost_system_shared: > /Users/Greg/anaconda3/envs/pyarrow-dev/lib/libboost_system.dylib > Added shared library dependency boost_filesystem_shared: > /Users/Greg/anaconda3/envs/pyarrow-dev/lib/libboost_filesystem.dylib > Added shared library dependency boost_regex_shared: > /Users/Greg/anaconda3/envs/pyarrow-dev/lib/libboost_regex.dylib > Added static library dependency double-conversion_static: > /Users/Greg/anaconda3/envs/pyarrow-dev/lib/libdouble-conversion.a > -- double-conversion include dir: > /Users/Greg/anaconda3/envs/pyarrow-dev/include > -- double-conversion static library: > /Users/Greg/anaconda3/envs/pyarrow-dev/lib/libdouble-conversion.a > -- GFLAGS_HOME: /Users/Greg/anaconda3/envs/pyarrow-dev > -- GFlags include dir: /Users/Greg/anaconda3/envs/pyarrow-dev/include > -- GFlags static library: > /Users/Greg/anaconda3/envs/pyarrow-dev/lib/libgflags.a > Added static library dependency gflags_static: > /Users/Greg/anaconda3/envs/pyarrow-dev/lib/libgflags.a > -- RapidJSON include dir: 
/Users/Greg/anaconda3/envs/pyarrow-dev/include > -- Found the Flatbuffers library: > /Users/Greg/anaconda3/envs/pyarrow-dev/lib/libflatbuffers.a > -- Flatbuffers include dir: /Users/Greg/anaconda3/envs/pyarrow-dev/include > -- Flatbuffers compiler: /Users/Greg/anaconda3/envs/pyarrow-dev/bin/flatc > Added static library dependency jemalloc_static: > /Users/Greg/documents/repos/arrow/cpp/build/jemalloc_ep-prefix/src/jemalloc_ep/dist//lib/libjemalloc_pic.a > Added shared library dependency jemalloc_shared: > /Users/Greg/documents/repos/arrow/cpp/build/jemalloc_ep-prefix/src/jemalloc_ep/dist//lib/libjemalloc.dylib > -- Found hdfs.h at: > /Users/Greg/documents/repos/arrow/cpp/thirdparty/hadoop/include/hdfs.h > -- Found the ZLIB shared library: >
[jira] [Assigned] (ARROW-5207) [Java] add APIs to support vector reuse
[ https://issues.apache.org/jira/browse/ARROW-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu reassigned ARROW-5207: - Assignee: Ji Liu > [Java] add APIs to support vector reuse > --- > > Key: ARROW-5207 > URL: https://issues.apache.org/jira/browse/ARROW-5207 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Minor > > In some scenarios, we hope that ValueVector could be reused to reduce creation > overhead. This is very common in the shuffle stage: there is no need to create a > ValueVector or realloc buffers every time. Suppose the recordCount of a > ValueVector and the capacity of its buffers are written to the stream; when we > deserialize it, we can simply judge from the data length whether a realloc is needed. > My proposal is to add APIs in ValueVector to handle this logic; otherwise > users who want reuse have to implement it themselves, which is not > user-friendly. > If you agree with this, I would like to take this ticket. Thanks -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-5206) [Java] Add APIs in MessageSerializer to directly serialize/deserialize ArrowBuf
[ https://issues.apache.org/jira/browse/ARROW-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ji Liu reassigned ARROW-5206: - Assignee: Ji Liu > [Java] Add APIs in MessageSerializer to directly serialize/deserialize > ArrowBuf > --- > > Key: ARROW-5206 > URL: https://issues.apache.org/jira/browse/ARROW-5206 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Minor > > It seems there are no APIs to directly write ArrowBuf to an OutputStream or read > ArrowBuf from an InputStream. These APIs may be helpful when users use Vectors > directly instead of RecordBatch; in this case, providing APIs to > serialize/deserialize the dataBuffer/validityBuffer/offsetBuffer is necessary. > I would like to work on this and make it my first contribution to Arrow. What > do you think? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5071) [Benchmarking] Performs a benchmark run with archery
[ https://issues.apache.org/jira/browse/ARROW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Saint-Jacques updated ARROW-5071: -- Description: Run all regression benchmarks, consume output and re-format according to the format required by dev/benchmarking specification and/or push to upstream database. This would be implemented as `archery benchmark run`. Provide facility to save/load results as a StaticRunner (such that it can be re-used in comparison without running the benchmark again). was: Run all regression benchmarks, consume output and re-format according to the format required by dev/benchmarking specification. This would be implemented as `archery benchmark run`. Provide facility to save/load results as a StaticRunner (such that it can be re-used in comparison without running the benchmark again). > [Benchmarking] Performs a benchmark run with archery > > > Key: ARROW-5071 > URL: https://issues.apache.org/jira/browse/ARROW-5071 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Francois Saint-Jacques >Assignee: Francois Saint-Jacques >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Run all regression benchmarks, consume output and re-format according to the > format required by dev/benchmarking specification and/or push to upstream > database. > This would be implemented as `archery benchmark run`. Provide facility to > save/load results as a StaticRunner (such that it can be re-used in > comparison without running the benchmark again). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5125) [Python] Cannot roundtrip extreme dates through pyarrow
[ https://issues.apache.org/jira/browse/ARROW-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5125: - Labels: parquet windows (was: parquet) > [Python] Cannot roundtrip extreme dates through pyarrow > --- > > Key: ARROW-5125 > URL: https://issues.apache.org/jira/browse/ARROW-5125 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.13.0 > Environment: Windows 10, Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 > 2019, 22:22:05) >Reporter: Max Bolingbroke >Priority: Major > Labels: parquet, windows > Fix For: 0.14.0 > > > You can roundtrip many dates through a pyarrow array: > > {noformat} > >>> pa.array([datetime.date(1980, 1, 1)], type=pa.date32())[0] > datetime.date(1980, 1, 1){noformat} > > But (on Windows at least), not extreme ones: > > {noformat} > >>> pa.array([datetime.date(1960, 1, 1)], type=pa.date32())[0] > Traceback (most recent call last): > File "", line 1, in > File "pyarrow\scalar.pxi", line 74, in pyarrow.lib.ArrayValue.__repr__ > File "pyarrow\scalar.pxi", line 226, in pyarrow.lib.Date32Value.as_py > OSError: [Errno 22] Invalid argument > >>> pa.array([datetime.date(3200, 1, 1)], type=pa.date32())[0] > Traceback (most recent call last): > File "", line 1, in > File "pyarrow\scalar.pxi", line 74, in pyarrow.lib.ArrayValue.__repr__ > File "pyarrow\scalar.pxi", line 226, in pyarrow.lib.Date32Value.as_py > {noformat} > This is because datetime.utcfromtimestamp and datetime.timestamp fail on > these dates, but it seems we should be able to totally avoid invoking this > function when deserializing dates. Ideally we would be able to roundtrip > these as datetimes too, of course, but it's less clear that this will be > easy. For some context on this see [https://bugs.python.org/issue29097]. > This may be related to ARROW-3176 and ARROW-4746 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
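Since Date32 is just a day count, deserialization can avoid `utcfromtimestamp` entirely, as the report suggests; a sketch of such a path using only the stdlib (the function name is illustrative, not pyarrow API):

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

def date32_to_py(days):
    """Convert a Date32 value (days since the UNIX epoch) directly to
    datetime.date via date arithmetic, without going through a
    timestamp. This works across the full datetime.date range,
    including the pre-1970 and far-future dates on which
    utcfromtimestamp fails on Windows."""
    return EPOCH + timedelta(days=days)
```

Converting the reverse way, `(d - EPOCH).days`, is equally timestamp-free, so the round trip never touches the problematic C-level time functions.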
[jira] [Created] (ARROW-5209) [Java] Add performance benchmarks from SQL workloads
Liya Fan created ARROW-5209: --- Summary: [Java] Add performance benchmarks from SQL workloads Key: ARROW-5209 URL: https://issues.apache.org/jira/browse/ARROW-5209 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Liya Fan Assignee: Liya Fan To improve the performance of Arrow implementations, some performance benchmarks must be set up first. In this issue, we want to provide some performance benchmarks extracted from our SQL engine, which is going to be made open source soon. The workloads are obtained by running the open SQL benchmark TPC-H. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5208) [Python] Inconsistent resulting type during casting in pa.array() when mask is present
[ https://issues.apache.org/jira/browse/ARROW-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5208: -- Summary: [Python] Inconsistent resulting type during casting in pa.array() when mask is present (was: Inconsistent resulting type during casting in pa.array() when mask is present) > [Python] Inconsistent resulting type during casting in pa.array() when mask > is present > -- > > Key: ARROW-5208 > URL: https://issues.apache.org/jira/browse/ARROW-5208 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.13.0 >Reporter: Artem KOZHEVNIKOV >Priority: Major > > I would expect Int64Array type in all cases below : > {code:java} > >>> pa.array([4, None, 4, None], mask=np.array([False, True, False, True])) > >>> > >>> > [4, null, 4, null ] > >>> pa.array([4, None, 4, 'rer'], mask=np.array([False, True, False, True])) > >>> > >>> > [4, null, 4, null ] > >>> pa.array([4, None, 4, 3.], mask=np.array([False, True, False, True])) > >>> > >>> [ 4, null, > >>> 4, null ]{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3176) [Python] Overflow in Date32 column conversion to pandas
[ https://issues.apache.org/jira/browse/ARROW-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-3176: - Description: When converting an arrow column holding a {{Date32Array}} to {{pandas}} there seems to be an overflow at the date {{2262-04-12}} such that the type and value are wrong. The issue only occurs for columns, not for arrays. Running on debian 9.5 w/ python2 gives {code} In [1]: import numpy as np In [2]: import datetime In [3]: import pyarrow as pa In [4]: pa.__version__ Out[4]: '0.10.0' In [5]: arr = pa.array(np.array([datetime.date(2262, 4, 12)], dtype='datetime64[D]')) In [6]: arr.to_pandas(date_as_object=False) Out[6]: array(['2262-04-12'], dtype='datetime64[D]') In [7]: pa.column('name', arr).to_pandas(date_as_object=False) Out[7]: 0 1677-09-21 00:25:26.290448384 Name: name, dtype: datetime64[ns] {code} was: When converting an arrow column holding a {{Date32Array}} to {{pandas}} there seems to be an overflow at the date {{2262-04-12}} such that the type and value are wrong. The issue only occurs for columns, not for arrays. 
Running on debian 9.5 w/ python2 gives {code} In [1]: import numpy as np In [2]: import datetime In [3]: import pyarrow as pa In [4]: pa.__version__ Out[4]: '0.10.0' In [5]: arr = pa.array(np.array([datetime.date(2262, 4, 12)], dtype='datetime64[D]')) In [6]: arr.to_pandas() Out[6]: array(['2262-04-12'], dtype='datetime64[D]') In [7]: pa.column('name', arr).to_pandas() Out[7]: 0 1677-09-21 00:25:26.290448384 Name: name, dtype: datetime64[ns] {code} > [Python] Overflow in Date32 column conversion to pandas > --- > > Key: ARROW-3176 > URL: https://issues.apache.org/jira/browse/ARROW-3176 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.10.0 >Reporter: Florian Jetter >Priority: Minor > Fix For: 0.14.0 > > > When converting an arrow column holding a {{Date32Array}} to {{pandas}} there > seems to be an overflow at the date {{2262-04-12}} such that the type and > value are wrong. The issue only occurs for columns, not for arrays. > Running on debian 9.5 w/ python2 gives > > {code} > In [1]: import numpy as np > In [2]: import datetime > In [3]: import pyarrow as pa > In [4]: pa.__version__ > Out[4]: '0.10.0' > In [5]: arr = pa.array(np.array([datetime.date(2262, 4, 12)], > dtype='datetime64[D]')) > In [6]: arr.to_pandas(date_as_object=False) > Out[6]: array(['2262-04-12'], dtype='datetime64[D]') > In [7]: pa.column('name', arr).to_pandas(date_as_object=False) > Out[7]: > 0 1677-09-21 00:25:26.290448384 > Name: name, dtype: datetime64[ns] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
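The {{2262-04-12}} cutoff matches the signed 64-bit nanosecond range of pandas' datetime64[ns]; a small back-of-the-envelope check (plain Python, no pyarrow needed) shows why that date overflows, and the wrapped-around value is consistent with the {{1677-09-21}} output in the report:

```python
from datetime import date

# datetime64[ns] stores nanoseconds since 1970-01-01 in a signed int64,
# so the last fully representable date is 2262-04-11.
EPOCH = date(1970, 1, 1)
INT64_MAX = 2**63 - 1

def days_to_ns(days):
    # Nanoseconds since the epoch for midnight of the given day offset.
    return days * 86400 * 10**9

assert days_to_ns((date(2262, 4, 11) - EPOCH).days) <= INT64_MAX
assert days_to_ns((date(2262, 4, 12) - EPOCH).days) > INT64_MAX  # overflows
```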
[jira] [Assigned] (ARROW-5165) [Python][Documentation] Build docs don't suggest assigning $ARROW_BUILD_TYPE
[ https://issues.apache.org/jira/browse/ARROW-5165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reassigned ARROW-5165: - Assignee: Joris Van den Bossche (was: Rok Mihevc) > [Python][Documentation] Build docs don't suggest assigning $ARROW_BUILD_TYPE > > > Key: ARROW-5165 > URL: https://issues.apache.org/jira/browse/ARROW-5165 > Project: Apache Arrow > Issue Type: Improvement > Components: Developer Tools, Documentation, Python >Affects Versions: 0.14.0 >Reporter: Rok Mihevc >Assignee: Joris Van den Bossche >Priority: Minor > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > [Build documentation|https://arrow.apache.org/docs/developers/python.html] is > great. However it does not explicitly suggest assigning a value to > `ARROW_BUILD_TYPE` and the error thrown is not obvious: > {code:bash} > ... > [100%] Built target _parquet > – Finished cmake --build for pyarrow > Bundling includes: include > error: [Errno 2] No such file or directory: 'include' > {code} > This cost me a couple of hours to debug. > Could we include a note in [build > documentation|https://arrow.apache.org/docs/developers/python.html] > suggesting devs to run: > {code:bash} > export ARROW_BUILD_TYPE=release > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-5165) [Python][Documentation] Build docs don't suggest assigning $ARROW_BUILD_TYPE
[ https://issues.apache.org/jira/browse/ARROW-5165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-5165. --- Resolution: Fixed Fix Version/s: 0.14.0 Issue resolved by pull request 4192 [https://github.com/apache/arrow/pull/4192] > [Python][Documentation] Build docs don't suggest assigning $ARROW_BUILD_TYPE > > > Key: ARROW-5165 > URL: https://issues.apache.org/jira/browse/ARROW-5165 > Project: Apache Arrow > Issue Type: Improvement > Components: Developer Tools, Documentation, Python >Affects Versions: 0.14.0 >Reporter: Rok Mihevc >Assignee: Rok Mihevc >Priority: Minor > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 20m > Remaining Estimate: 0h > > [Build documentation|https://arrow.apache.org/docs/developers/python.html] is > great. However it does not explicitly suggest assigning a value to > `ARROW_BUILD_TYPE` and the error thrown is not obvious: > {code:bash} > ... > [100%] Built target _parquet > – Finished cmake --build for pyarrow > Bundling includes: include > error: [Errno 2] No such file or directory: 'include' > {code} > This cost me a couple of hours to debug. > Could we include a note in [build > documentation|https://arrow.apache.org/docs/developers/python.html] > suggesting devs to run: > {code:bash} > export ARROW_BUILD_TYPE=release > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5210) [Python] editable install (pip install -e .) is failing
Joris Van den Bossche created ARROW-5210: Summary: [Python] editable install (pip install -e .) is failing Key: ARROW-5210 URL: https://issues.apache.org/jira/browse/ARROW-5210 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Joris Van den Bossche Following the Python development documentation on building arrow and pyarrow ([https://arrow.apache.org/docs/developers/python.html#build-and-test]), building pyarrow inplace with {{python setup.py build_ext --inplace}} works fine. But if you want to also install this inplace version in the current Python environment (editable install / development install) using pip ({{pip install -e .}}), this fails during the {{build_ext}} / cmake phase: {code:none} -- Looking for python3.7m -- Found Python lib /home/joris/miniconda3/envs/arrow-dev/lib/libpython3.7m.so CMake Error at cmake_modules/FindNumPy.cmake:62 (message): NumPy import failure: Traceback (most recent call last): File "", line 1, in ModuleNotFoundError: No module named 'numpy' Call Stack (most recent call first): CMakeLists.txt:186 (find_package) -- Configuring incomplete, errors occurred! See also "/home/joris/scipy/repos/arrow/python/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeOutput.log". See also "/home/joris/scipy/repos/arrow/python/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeError.log". error: command 'cmake' failed with exit status 1 Cleaning up... {code} Alternatively, doing `python setup.py develop` to achieve the same does work. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
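The failure points at numpy missing from the declared build requirements, which suggests a fix along these lines (a sketch of the {{pyproject.toml}} build-system table; the version pins are illustrative):

```toml
[build-system]
requires = ["setuptools", "wheel", "setuptools_scm", "cython >= 0.29", "numpy >= 1.14"]
```

With numpy listed here (and mirrored in {{setup_requires}}), pip's isolated build environment can satisfy the {{FindNumPy.cmake}} check during the editable install.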
[jira] [Commented] (ARROW-5200) [Java] Provide light-weight arrow APIs
[ https://issues.apache.org/jira/browse/ARROW-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824900#comment-16824900 ] Liya Fan commented on ARROW-5200: - Sounds reasonable. Thanks a lot for your comments. We have opened a new Jira (ARROW-5209) to set up some performance benchmarks from our SQL engine, which is going to be made open source. The benchmarks are extracted by running the open SQL benchmark TPC-H. > [Java] Provide light-weight arrow APIs > -- > > Key: ARROW-5200 > URL: https://issues.apache.org/jira/browse/ARROW-5200 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Attachments: image-2019-04-23-15-19-34-187.png > > > We are trying to incorporate Apache Arrow into the Apache Flink runtime. We find > Arrow an amazing library, which greatly simplifies the support of columnar > data formats. > However, for many scenarios, we find the performance unacceptable. Our > investigation shows the reason is that there are too many redundant checks > and computations in the Arrow APIs. > For example, the following figure shows that a single call to the > Float8Vector.get(int) method (this is one of the most frequently used APIs in > Flink computation) involves 20+ method invocations. > !image-2019-04-23-15-19-34-187.png! > > There are many other APIs with similar problems. We believe that these checks > ensure the integrity of the program. However, they also impact > performance severely. In our evaluation, performance may degrade by two > or three orders of magnitude compared to accessing data on heap memory. > We think at least for some scenarios, we can give the responsibility of > integrity checking to application owners. If they can be sure all the checks > have been passed, we can provide them some light-weight APIs and the inherent high > performance. > In the light-weight APIs, we only provide minimal checks, or avoid checks > altogether.
The application owner can still develop and debug their code using the > original heavy-weight APIs. Once all bugs have been fixed, they can switch to > light-weight APIs in their products and enjoy the consequent high performance. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
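The checked-versus-unchecked trade-off described above can be illustrated with a small Python analogue; {{Float8Buffer}}, {{get_checked}}, and {{get_unchecked}} are hypothetical names, not Arrow APIs, and in the real Java vectors the per-call overhead spans several delegating methods (null-bit checks, reader-index checks, bounds checks), which is where the 20+ invocations come from:

```python
import struct

class Float8Buffer:
    """Hypothetical flat buffer of float64 values, loosely mimicking Float8Vector."""

    def __init__(self, values):
        self.count = len(values)
        self.data = struct.pack("<%dd" % len(values), *values)

    def get_checked(self, index):
        # "Heavy-weight" access: validate the index on every call.
        if not 0 <= index < self.count:
            raise IndexError(index)
        return struct.unpack_from("<d", self.data, index * 8)[0]

    def get_unchecked(self, index):
        # "Light-weight" access: the caller guarantees 0 <= index < count,
        # so no per-call validation is performed.
        return struct.unpack_from("<d", self.data, index * 8)[0]
```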
[jira] [Commented] (ARROW-3176) [Python] Overflow in Date32 column conversion to pandas
[ https://issues.apache.org/jira/browse/ARROW-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824911#comment-16824911 ] Joris Van den Bossche commented on ARROW-3176: -- Note that the default type changed: it now gives back datetime.date objects, instead of datetime64[D]. So by default you no longer have this problem. But when setting {{date_as_object=False}} (to get back the old behaviour), you still have the same overflow issue. I updated the original bug report to add this keyword, to keep it a reproducible example. > [Python] Overflow in Date32 column conversion to pandas > --- > > Key: ARROW-3176 > URL: https://issues.apache.org/jira/browse/ARROW-3176 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.10.0 >Reporter: Florian Jetter >Priority: Minor > Fix For: 0.14.0 > > > When converting an arrow column holding a {{Date32Array}} to {{pandas}} there > seems to be an overflow at the date {{2262-04-12}} such that the type and > value are wrong. The issue only occurs for columns, not for arrays. > Running on debian 9.5 w/ python2 gives > > {code} > In [1]: import numpy as np > In [2]: import datetime > In [3]: import pyarrow as pa > In [4]: pa.__version__ > Out[4]: '0.10.0' > In [5]: arr = pa.array(np.array([datetime.date(2262, 4, 12)], > dtype='datetime64[D]')) > In [6]: arr.to_pandas(date_as_object=False) > Out[6]: array(['2262-04-12'], dtype='datetime64[D]') > In [7]: pa.column('name', arr).to_pandas(date_as_object=False) > Out[7]: > 0 1677-09-21 00:25:26.290448384 > Name: name, dtype: datetime64[ns] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-5201) [Python] Import ABCs from collections is deprecated in Python 3.7
[ https://issues.apache.org/jira/browse/ARROW-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-5201. --- Resolution: Fixed Fix Version/s: 0.14.0 Issue resolved by pull request 4187 [https://github.com/apache/arrow/pull/4187] > [Python] Import ABCs from collections is deprecated in Python 3.7 > - > > Key: ARROW-5201 > URL: https://issues.apache.org/jira/browse/ARROW-5201 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Joris Van den Bossche >Assignee: Joris Van den Bossche >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 50m > Remaining Estimate: 0h > > From running the tests, I see a few deprecation warnings related to the fact > that on Python 3, abstract base classes should be imported from > `collections.abc` instead of `collections`: > {code:none} > pyarrow/tests/test_array.py:808 > /home/joris/scipy/repos/arrow/python/pyarrow/tests/test_array.py:808: > DeprecationWarning: Using or importing the ABCs from 'collections' instead of > from 'collections.abc' is deprecated, and in 3.8 it will stop working > pa.struct([pa.field('a', pa.int64()), pa.field('b', pa.string())])) > pyarrow/tests/test_table.py:18 > /home/joris/scipy/repos/arrow/python/pyarrow/tests/test_table.py:18: > DeprecationWarning: Using or importing the ABCs from 'collections' instead of > from 'collections.abc' is deprecated, and in 3.8 it will stop working > from collections import OrderedDict, Iterable > pyarrow/tests/test_feather.py::TestFeatherReader::test_non_string_columns > /home/joris/scipy/repos/arrow/python/pyarrow/pandas_compat.py:294: > DeprecationWarning: Using or importing the ABCs from 'collections' instead of > from 'collections.abc' is deprecated, and in 3.8 it will stop working > elif isinstance(name, collections.Sequence):{code} > Those could be imported depending on python 2/3 in the ``pyarrow.compat`` > module. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
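The suggested compat approach could look like this minimal sketch (pyarrow.compat's actual contents may differ):

```python
# Import ABCs from collections.abc on Python 3, falling back to the old
# location on Python 2, so callers never trigger the DeprecationWarning.
try:
    from collections.abc import Iterable, Sequence  # Python 3.3+
except ImportError:  # Python 2
    from collections import Iterable, Sequence
```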
[jira] [Resolved] (ARROW-4934) [Python] Address deprecation notice that will be a bug in Python 3.8
[ https://issues.apache.org/jira/browse/ARROW-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche resolved ARROW-4934. -- Resolution: Fixed Apparently https://issues.apache.org/jira/browse/ARROW-5201 (which was just fixed) was a duplicate of this. > [Python] Address deprecation notice that will be a bug in Python 3.8 > - > > Key: ARROW-4934 > URL: https://issues.apache.org/jira/browse/ARROW-4934 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > originally reported as https://github.com/apache/arrow/issues/3839 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-5204) [C++] Improve BufferBuilder performance
[ https://issues.apache.org/jira/browse/ARROW-5204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-5204. --- Resolution: Fixed Fix Version/s: 0.14.0 Issue resolved by pull request 4193 [https://github.com/apache/arrow/pull/4193] > [C++] Improve BufferBuilder performance > --- > > Key: ARROW-5204 > URL: https://issues.apache.org/jira/browse/ARROW-5204 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 0.13.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 10m > Remaining Estimate: 0h > > BufferBuilder makes a spurious memset() when extending the buffer size. > We could also tweak the overallocation strategy in Reserve(). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
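For context on the Reserve() tweak mentioned above, a typical overallocation policy looks like this Python sketch ({{next_capacity}} is an illustration of the general technique, not Arrow's actual C++ implementation): grow geometrically so repeated appends amortize to O(1), and skip zeroing the newly reserved bytes.

```python
def next_capacity(needed, current):
    # Double the capacity until it covers the request; the 64-byte floor
    # matches Arrow's buffer padding/alignment convention.
    cap = max(current, 64)
    while cap < needed:
        cap *= 2
    return cap

assert next_capacity(100, 64) == 128
assert next_capacity(300, 64) == 512
```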
[jira] [Created] (ARROW-5208) Inconsistent resulting type during casting in pa.array() when mask is present
Artem KOZHEVNIKOV created ARROW-5208: Summary: Inconsistent resulting type during casting in pa.array() when mask is present Key: ARROW-5208 URL: https://issues.apache.org/jira/browse/ARROW-5208 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 0.13.0 Reporter: Artem KOZHEVNIKOV I would expect Int64Array type in all cases below : {code:java} pa.array([4, None, 4, None], mask=np.array([False, True, False, True])) [ 4, null, 4, null ] pa.array([4, None, 4, 'rer'], mask=np.array([False, True, False, True])) [ 4, null, 4, null ] pa.array([4, None, 4, 3.], mask=np.array([False, True, False, True])) [ 4, null, 4, null ]{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5208) Inconsistent resulting type during casting in pa.array() when mask is present
[ https://issues.apache.org/jira/browse/ARROW-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem KOZHEVNIKOV updated ARROW-5208: - Description: I would expect Int64Array type in all cases below : {code:java} >>> pa.array([4, None, 4, None], mask=np.array([False, True, False, True])) >>> >>> [4, null, 4, null ] >>> pa.array([4, None, 4, 'rer'], mask=np.array([False, True, False, True])) >>> >>> [4, null, 4, null ] >>> pa.array([4, None, 4, 3.], mask=np.array([False, True, False, True])) >>> >>> [ 4, null, 4, >>> null ]{code} was: I would expect Int64Array type in all cases below : {code:java} pa.array([4, None, 4, None], mask=np.array([False, True, False, True])) [ 4, null, 4, null ] pa.array([4, None, 4, 'rer'], mask=np.array([False, True, False, True])) [ 4, null, 4, null ] pa.array([4, None, 4, 3.], mask=np.array([False, True, False, True])) [ 4, null, 4, null ]{code} > Inconsistent resulting type during casting in pa.array() when mask is present > - > > Key: ARROW-5208 > URL: https://issues.apache.org/jira/browse/ARROW-5208 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.13.0 >Reporter: Artem KOZHEVNIKOV >Priority: Major > > I would expect Int64Array type in all cases below : > {code:java} > >>> pa.array([4, None, 4, None], mask=np.array([False, True, False, True])) > >>> > >>> > [4, null, 4, null ] > >>> pa.array([4, None, 4, 'rer'], mask=np.array([False, True, False, True])) > >>> > >>> > [4, null, 4, null ] > >>> pa.array([4, None, 4, 3.], mask=np.array([False, True, False, True])) > >>> > >>> [ 4, null, > >>> 4, null ]{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
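The expectation in this report is that masked-out slots should not participate in type inference; a sketch of that rule in plain Python ({{unmasked_values}} is an illustrative helper, not a pyarrow function) shows all three inputs reducing to the same integers, hence the same Int64Array type:

```python
def unmasked_values(values, mask):
    # Slots whose mask bit is True become null, so only the remaining
    # values should drive type inference.
    return [v for v, m in zip(values, mask) if not m]

mask = [False, True, False, True]
assert unmasked_values([4, None, 4, None], mask) == [4, 4]
assert unmasked_values([4, None, 4, "rer"], mask) == [4, 4]
assert unmasked_values([4, None, 4, 3.0], mask) == [4, 4]
```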
[jira] [Updated] (ARROW-5210) [Python] editable install (pip install -e .) is failing
[ https://issues.apache.org/jira/browse/ARROW-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5210: - Description: Following the Python development documentation on building arrow and pyarrow ([https://arrow.apache.org/docs/developers/python.html#build-and-test]), building pyarrow inplace with {{python setup.py build_ext --inplace}} works fine. But if you want to also install this inplace version in the current Python environment (editable install / development install) using pip ({{pip install -e .}}), this fails during the {{build_ext}} / cmake phase: {code:none} -- Looking for python3.7m -- Found Python lib /home/joris/miniconda3/envs/arrow-dev/lib/libpython3.7m.so CMake Error at cmake_modules/FindNumPy.cmake:62 (message): NumPy import failure: Traceback (most recent call last): File "", line 1, in ModuleNotFoundError: No module named 'numpy' Call Stack (most recent call first): CMakeLists.txt:186 (find_package) -- Configuring incomplete, errors occurred! See also "/home/joris/scipy/repos/arrow/python/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeOutput.log". See also "/home/joris/scipy/repos/arrow/python/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeError.log". error: command 'cmake' failed with exit status 1 Cleaning up... {code} Alternatively, doing {{python setup.py develop}} to achieve the same still works. was: Following the Python development documentation on building arrow and pyarrow ([https://arrow.apache.org/docs/developers/python.html#build-and-test]), building pyarrow inplace with {{python setup.py build_ext --inplace}} works fine. 
But if you want to also install this inplace version in the current Python environment (editable install / development install) using pip ({{pip install -e .}}), this fails during the {{build_ext}} / cmake phase: {code:none} -- Looking for python3.7m -- Found Python lib /home/joris/miniconda3/envs/arrow-dev/lib/libpython3.7m.so CMake Error at cmake_modules/FindNumPy.cmake:62 (message): NumPy import failure: Traceback (most recent call last): File "", line 1, in ModuleNotFoundError: No module named 'numpy' Call Stack (most recent call first): CMakeLists.txt:186 (find_package) -- Configuring incomplete, errors occurred! See also "/home/joris/scipy/repos/arrow/python/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeOutput.log". See also "/home/joris/scipy/repos/arrow/python/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeError.log". error: command 'cmake' failed with exit status 1 Cleaning up... {code} Alternatively, doing `python setup.py develop` to achieve the same does work. > [Python] editable install (pip install -e .) is failing > > > Key: ARROW-5210 > URL: https://issues.apache.org/jira/browse/ARROW-5210 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Joris Van den Bossche >Priority: Minor > > Following the Python development documentation on building arrow and pyarrow > ([https://arrow.apache.org/docs/developers/python.html#build-and-test]), > building pyarrow inplace with {{python setup.py build_ext --inplace}} works > fine. 
> > But if you want to also install this inplace version in the current Python > environment (editable install / development install) using pip ({{pip install > -e .}}), this fails during the {{build_ext}} / cmake phase: > {code:none} > > -- Looking for python3.7m > -- Found Python lib > /home/joris/miniconda3/envs/arrow-dev/lib/libpython3.7m.so > CMake Error at cmake_modules/FindNumPy.cmake:62 (message): > NumPy import failure: > Traceback (most recent call last): > File "", line 1, in > ModuleNotFoundError: No module named 'numpy' > Call Stack (most recent call first): > CMakeLists.txt:186 (find_package) > -- Configuring incomplete, errors occurred! > See also > "/home/joris/scipy/repos/arrow/python/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeOutput.log". > See also > "/home/joris/scipy/repos/arrow/python/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeError.log". > error: command 'cmake' failed with exit status 1 > Cleaning up... > {code} > > Alternatively, doing {{python setup.py develop}} to achieve the same still > works. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3767) [C++] Add cast for Null to any type
[ https://issues.apache.org/jira/browse/ARROW-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reassigned ARROW-3767: - Assignee: Antoine Pitrou > [C++] Add cast for Null to any type > --- > > Key: ARROW-3767 > URL: https://issues.apache.org/jira/browse/ARROW-3767 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Uwe L. Korn >Assignee: Antoine Pitrou >Priority: Major > Fix For: 0.14.0 > > > Casting a column from NullType to any other type is possible as the resulting > array will also be all-null but simply with a different type annotation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3767) [C++] Add cast for Null to any type
[ https://issues.apache.org/jira/browse/ARROW-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3767: -- Labels: pull-request-available (was: ) > [C++] Add cast for Null to any type > --- > > Key: ARROW-3767 > URL: https://issues.apache.org/jira/browse/ARROW-3767 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Uwe L. Korn >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > > Casting a column from NullType to any other type is possible as the resulting > array will also be all-null but simply with a different type annotation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
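For context on why this cast is cheap, a sketch assuming Arrow's validity-bitmap layout ({{all_null_validity}} is a hypothetical helper): an all-null array of any type is just a zeroed validity bitmap plus an unused data buffer, so casting from NullType only needs to materialize that bitmap and re-label the type.

```python
def all_null_validity(length):
    # One validity bit per slot, packed least-significant-bit first into
    # bytes; all-zero bits mean every slot is null.
    return bytes((length + 7) // 8)

assert all_null_validity(10) == b"\x00\x00"  # 10 slots fit in 2 bytes
```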