[jira] [Updated] (ARROW-5217) [Rust] [CI] DataFusion test failure
[ https://issues.apache.org/jira/browse/ARROW-5217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-5217: -- Labels: pull-request-available (was: ) > [Rust] [CI] DataFusion test failure > --- > > Key: ARROW-5217 > URL: https://issues.apache.org/jira/browse/ARROW-5217 > Project: Apache Arrow > Issue Type: Bug > Components: Continuous Integration, Rust, Rust - DataFusion >Reporter: Antoine Pitrou >Assignee: Andy Grove >Priority: Blocker > Labels: pull-request-available > > Travis-CI Rust jobs have started failing consistently with a DataFusion test > failure. > Example here: > https://travis-ci.org/apache/arrow/jobs/524542965 > {code} > > execution::aggregate::tests::test_min_max_sum_count_avg_f64_group_by_uint32 > stdout > thread > 'execution::aggregate::tests::test_min_max_sum_count_avg_f64_group_by_uint32' > panicked at 'assertion failed: `(left == right)` > left: `2`, > right: `5`', datafusion/src/execution/aggregate.rs:1437:9 > note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace. > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5176) [Python] Automate formatting of python files
[ https://issues.apache.org/jira/browse/ARROW-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827299#comment-16827299 ] Neal Richardson commented on ARROW-5176: FTR running black the first time would touch a lot of files, so this would require a bit of work resolving merge conflicts with some open PRs: {code:java} All done! ✨ ✨ 62 files reformatted, 11 files left unchanged. {code} > [Python] Automate formatting of python files > > > Key: ARROW-5176 > URL: https://issues.apache.org/jira/browse/ARROW-5176 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Benjamin Kietzman >Priority: Minor > > [Black](https://github.com/ambv/black) is a tool for automatically formatting > python code in ways which flake8 and our other linters approve of. Adding it > to the project will allow more reliably formatted python code and fill a > similar role to {{clang-format}} for c++ and {{cmake-format}} for cmake -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4694) [CI] detect-changes.py is inconsistent
[ https://issues.apache.org/jira/browse/ARROW-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827271#comment-16827271 ] Neal Richardson commented on ARROW-4694: I encountered this too. [This PR|https://github.com/apache/arrow/pull/4210] altered two files in the `python` directory; however, all Travis builds ran, and the Rust build failed. According to the [Rust job log|https://travis-ci.org/apache/arrow/jobs/524779520], it thought that two other files were modified: {code:java} $ eval `python $TRAVIS_BUILD_DIR/ci/detect-changes.py` Affected files: [u'ci/conda_env_sphinx.yml', u'cpp/cmake_modules/ThirdpartyToolchain.cmake', u'python/pyarrow/error.pxi', u'python/pyarrow/tests/test_csv.py'] Affected topics: {'c_glib': True, 'cpp': True, 'csharp': True, 'dev': True, 'docs': True, 'go': True, 'integration': True, 'java': True, 'js': True, 'python': True, 'r': True, 'ruby': True, 'rust': True, 'site': True} {code} > [CI] detect-changes.py is inconsistent > -- > > Key: ARROW-4694 > URL: https://issues.apache.org/jira/browse/ARROW-4694 > Project: Apache Arrow > Issue Type: Bug > Components: Continuous Integration >Affects Versions: 0.12.1 >Reporter: Francois Saint-Jacques >Priority: Major > Labels: travis-ci > Fix For: 0.14.0 > > > Some examples of pull-requests with wrong affected files: > - [pr-3762|https://github.com/apache/arrow/pull/3762/files] shouldn't > trigger [javascript|https://travis-ci.org/apache/arrow/jobs/498805479#L217] > - [pr-3767|https://github.com/apache/arrow/pull/3767/files] shouldn't > affect files found in > [rust|https://travis-ci.org/apache/arrow/jobs/499122044] and > [javascript|https://travis-ci.org/apache/arrow/jobs/499122041#L217] > In > [get_travis_commit_range|https://github.com/apache/arrow/blob/master/ci/detect-changes.py#L63-L67] > , it references the following > [comment|https://github.com/travis-ci/travis-ci/issues/4596#issuecomment-139811122]. 
> If you read further down in the > [thread|https://github.com/travis-ci/travis-ci/issues/4596#issuecomment-434532772], > you'll note that it can misbehave due to shallow clones and the commit at > branch creation. I'm not sure if this is the issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
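The file-to-topic mapping that detect-changes.py performs can be sketched roughly as below. This is a simplified illustration, not the real script's logic, and the directory set is illustrative; the point is that the mapping itself is deterministic, so the inconsistency has to come from the commit range used to compute `changed_files`.

```python
# Simplified sketch (not the actual detect-changes.py logic) of mapping
# changed file paths to affected build topics by top-level directory.
# The set of topic directories here is illustrative.

TOPIC_DIRS = {"cpp", "python", "rust", "go", "js", "java", "ruby"}

def affected_topics(changed_files):
    """Return the topics whose top-level directory contains a changed file."""
    tops = {path.split("/", 1)[0] for path in changed_files}
    return tops & TOPIC_DIRS
```

In the real script, changes under shared directories such as ci/ or cpp/ mark many topics as affected, which is why a wrongly computed commit range that pulls in cpp/cmake_modules/ThirdpartyToolchain.cmake ends up setting every topic to True, as in the quoted job log.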
[jira] [Commented] (ARROW-5176) [Python] Automate formatting of python files
[ https://issues.apache.org/jira/browse/ARROW-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827264#comment-16827264 ] Neal Richardson commented on ARROW-5176: +1 for this, and I'll go further and propose a pre-commit hook to run `black` so that developers don't have to waste energy thinking about linting. At a minimum we should add a pre-commit hook that runs flake8 (per the [dev instructions|https://github.com/apache/arrow/blob/master/docs/source/developers/python.rst#coding-style]). I got a Travis failure for linting, and IMO that should never happen. > [Python] Automate formatting of python files > > > Key: ARROW-5176 > URL: https://issues.apache.org/jira/browse/ARROW-5176 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Benjamin Kietzman >Priority: Minor > > [Black](https://github.com/ambv/black) is a tool for automatically formatting > python code in ways which flake8 and our other linters approve of. Adding it > to the project will allow more reliably formatted python code and fill a > similar role to {{clang-format}} for c++ and {{cmake-format}} for cmake -- This message was sent by Atlassian JIRA (v7.6.3#76005)
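A pre-commit hook along those lines could be sketched as follows. This is a hypothetical illustration, not an existing Arrow script: the helper names and the choice to lint only staged `.py` files with flake8 are assumptions.

```python
# Hypothetical sketch of a git pre-commit hook (saved as
# .git/hooks/pre-commit) that runs flake8 on staged Python files and
# blocks the commit when linting fails.
import subprocess

def staged_python_files(runner=subprocess.run):
    """List staged .py files via `git diff --cached --name-only`."""
    result = runner(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True,
    )
    return [f for f in result.stdout.splitlines() if f.endswith(".py")]

def lint_ok(paths, runner=subprocess.run):
    """Return True when there is nothing to lint or flake8 passes."""
    if not paths:
        return True
    return runner(["flake8", *paths]).returncode == 0
```

The hook body would then be `sys.exit(0 if lint_ok(staged_python_files()) else 1)`; swapping `flake8` for `black --check` would give the formatting-only variant proposed in this issue.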
[jira] [Updated] (ARROW-4963) [C++] MSVC build invokes CMake repeatedly
[ https://issues.apache.org/jira/browse/ARROW-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4963: Fix Version/s: (was: 0.14.0) > [C++] MSVC build invokes CMake repeatedly > - > > Key: ARROW-4963 > URL: https://issues.apache.org/jira/browse/ARROW-4963 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Priority: Major > > I'm doing a pretty vanilla out of source build with Visual Studio 2015 and I > am finding that it's re-running CMake many times throughout the build. I will > try to produce a complete log when I can to illustrate. I am using this > command: > {code} >cmake -G "Visual Studio 14 2015 Win64" ^ > -DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^ > -DARROW_CXXFLAGS="/WX /MP" ^ > -DARROW_GANDIVA=on ^ > -DARROW_ORC=on ^ > -DARROW_PARQUET=on ^ > -DARROW_PYTHON=on .. >cmake --build . --target INSTALL --config Release > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4993) [C++] Display summary at the end of CMake configuration
[ https://issues.apache.org/jira/browse/ARROW-4993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827257#comment-16827257 ] Wes McKinney commented on ARROW-4993: - Here's how they implement https://github.com/apache/thrift/blob/71afec0ea3fc700d5f0d1c46512723963bf1e2f7/build/cmake/DefineOptions.cmake#L145 https://github.com/apache/thrift/blob/master/CMakeLists.txt#L130 > [C++] Display summary at the end of CMake configuration > --- > > Key: ARROW-4993 > URL: https://issues.apache.org/jira/browse/ARROW-4993 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 0.12.1 >Reporter: Antoine Pitrou >Priority: Minor > Fix For: 0.14.0 > > > Some third-party projects like Thrift display a nice and useful summary of > the build configuration at the end of the CMake configuration run: > https://ci.appveyor.com/project/pitrou/arrow/build/job/mgi68rvk0u5jf2s4?fullLog=true#L2325 > It may be good to have a similar thing in Arrow as well. Bonus points if, for > each configuration item, it says which CMake variable can be used to > influence it. > Something like: > {code} > -- Build ZSTD support: ON [change using ARROW_WITH_ZSTD] > -- Build BZ2 support: OFF [change using ARROW_WITH_BZ2] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
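The proposed summary amounts to formatting one line per option from a table of (value, controlling variable) pairs. In Arrow this would be a loop of CMake message() calls; the shape can be mocked in Python, with an illustrative option table:

```python
# Mock of the proposed configure-time summary: one line per option,
# naming the CMake variable that controls it. The options passed in
# below are illustrative examples from the issue description.
def config_summary(options):
    """options maps name -> (value, cmake_variable); returns summary lines."""
    return [
        "-- Build {} support: {} [change using {}]".format(name, value, var)
        for name, (value, var) in options.items()
    ]

for line in config_summary({"ZSTD": ("ON", "ARROW_WITH_ZSTD"),
                            "BZ2": ("OFF", "ARROW_WITH_BZ2")}):
    print(line)
```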
[jira] [Updated] (ARROW-5085) [Python/C++] Conversion of dict encoded null column fails in parquet writing when using RowGroups
[ https://issues.apache.org/jira/browse/ARROW-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5085: - Labels: parquet (was: ) > [Python/C++] Conversion of dict encoded null column fails in parquet writing > when using RowGroups > - > > Key: ARROW-5085 > URL: https://issues.apache.org/jira/browse/ARROW-5085 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.13.0 >Reporter: Florian Jetter >Priority: Minor > Labels: parquet > > Conversion of dict encoded null column fails in parquet writing when using > RowGroups > {code:python} > import pyarrow.parquet as pq > import pandas as pd > import pyarrow as pa > df = pd.DataFrame({"col": [None] * 100, "int": [1.0] * 100}) > df = df.astype({"col": "category"}) > table = pa.Table.from_pandas(df) > buf = pa.BufferOutputStream() > pq.write_table( > table, > buf, > version="2.0", > chunk_size=10, > ) > {code} > fails with > {{pyarrow.lib.ArrowIOError: Column 2 had 100 while previous column had 10}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5089) [C++/Python] Writing dictionary encoded columns to parquet is extremely slow when using chunk size
[ https://issues.apache.org/jira/browse/ARROW-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5089: - Labels: parquet performance (was: performance) > [C++/Python] Writing dictionary encoded columns to parquet is extremely slow > when using chunk size > -- > > Key: ARROW-5089 > URL: https://issues.apache.org/jira/browse/ARROW-5089 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.13.0 >Reporter: Florian Jetter >Priority: Major > Labels: parquet, performance > > Currently, there is a workaround for dict encoded columns in place to handle > writing dict encoded columns to parquet. > The workaround converts the dict encoded array to its plain version before > writing to parquet. This is painfully slow since for every row group the > entire array is converted over and over again. > The following example is orders of magnitude slower than the non-dict encoded > version: > {code} > import pyarrow as pa > import pyarrow.parquet as pq > import pandas as pd > df = pd.DataFrame({"col": ["A", "B"] * 10}).astype("category") > table = pa.Table.from_pandas(df) > buf = pa.BufferOutputStream() > pq.write_table( > table, > buf, > chunk_size=100, > ) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5222) [Python] Issues with installing pyarrow for development on MacOS
Neal Richardson created ARROW-5222: -- Summary: [Python] Issues with installing pyarrow for development on MacOS Key: ARROW-5222 URL: https://issues.apache.org/jira/browse/ARROW-5222 Project: Apache Arrow Issue Type: Improvement Components: Documentation, Python Reporter: Neal Richardson Fix For: 0.14.0 I tried following the [instructions|https://github.com/apache/arrow/blob/master/docs/source/developers/python.rst] for installing pyarrow for developers on macOS, and I ran into quite a bit of difficulty. I'm hoping we can improve our documentation and/or tooling to make this a smoother process. I know we can't anticipate every quirk of everyone's dev environment, but in my case I was getting set up on a new machine, so this was from a clean slate. I'm also new to contributing to the project, a "clean slate" in that regard too, and my ignorance may be exposing other assumptions in the docs. # The instructions recommend using conda, but as this [Stack Overflow question|https://stackoverflow.com/questions/55798166/cmake-fails-with-when-attempting-to-compile-simple-test-program] notes, cmake fails. Uwe helpfully suggested installing an older macOS SDK from [here|https://github.com/phracker/MacOSX-SDKs/releases]. That may work, but I'm personally wary of installing binaries from an unofficial GitHub account, let alone recording that in our docs as an official recommendation. Either way, we should update the docs either to note this necessity or to recommend against installing with conda on macOS. # After that, I tried the Homebrew path. Ultimately this did succeed, but it was rough. It seemed that I had to `brew install` a lot of packages that weren't included in the arrow/python/Brewfile (i.e. run `cmake`, see which missing dependency it failed on, `brew install` it, retry `cmake`, and repeat). Among the libs I installed this way were double-conversion, snappy, brotli, protobuf, gtest, rapidjson, flatbuffers, lz4, zstd, c-ares, and boost.
It's not clear how many of these extra dependencies were needed only because I'd installed the Xcode command-line tools and not the full Xcode from the App Store; regardless, the Brewfile should be complete if we want to use it. # In searching Jira for the double-conversion issue (the first one I hit), I found [this issue/PR|https://github.com/apache/arrow/pull/4132/files], which added double-conversion to a different Brewfile, in c_glib. So I tried `brew bundle` installing that Brewfile. It would probably be good to have a common Brewfile for the C++ setup, which the python and glib ones could load and then extend with any extra dependencies, if necessary. That way, there's one place to add common dependencies. # I got close here but still had issues with `BOOST_HOME` not being found, even though I had brew-installed it. From the console output, it appeared that even though I was not using conda and did not have an active conda environment (I'd even done `conda env remove --name pyarrow-dev`), the cmake configuration script detected that conda existed and decided to use conda to resolve dependencies. I tried setting lots of different environment variables to tell cmake not to use conda, but ultimately I was only able to get past this by deleting conda from my system entirely. # This let me get to the point of being able to `import pyarrow`. But then running the tests failed because the `hypothesis` package was not installed. I see that it is included in requirements-test.txt and in setup.py under tests_require, but I followed the installation instructions and this package did not end up in my virtualenv. `pip install hypothesis` resolved it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
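For the last point, a quick preflight check that test-only dependencies are importable before running the suite can be sketched like this (the module names passed in are examples, not the full requirements-test.txt list):

```python
# Sketch: report test-only dependencies that are not importable in the
# current environment, so a missing package like hypothesis is caught
# before the test suite fails on import.
import importlib.util

def missing_modules(names):
    """Return the module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# e.g. pip install whatever missing_modules(["hypothesis", "pytest"]) reports
```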
[jira] [Resolved] (ARROW-5212) [Go] Array BinaryBuilder in Go library has no access to resize the values buffer
[ https://issues.apache.org/jira/browse/ARROW-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastien Binet resolved ARROW-5212. Resolution: Fixed Fix Version/s: 0.14.0 Issue resolved by pull request 4204 [https://github.com/apache/arrow/pull/4204] > [Go] Array BinaryBuilder in Go library has no access to resize the values > buffer > > > Key: ARROW-5212 > URL: https://issues.apache.org/jira/browse/ARROW-5212 > Project: Apache Arrow > Issue Type: Improvement > Components: Go >Reporter: Jonathan A Sternberg >Priority: Minor > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 1h > Remaining Estimate: 0h > > When you are dealing with a binary builder, there are three buffers: the null > bitmap, the offset indexes, and the values buffer which contains the actual > data. > When {{Reserve}} or {{Resize}} are used, the null bitmap and the offsets are > modified to allow for additional appends to function. This seems correct to > me. With just the number of values, there's no way to know how much the values > buffer should be resized until the values are actually appended. > But when you are then appending a bunch of string values, there's no > additional API to preallocate the size of that last buffer. That means that > batch appending a large number of strings will constantly allocate even if > you know the size ahead of time. > There should be some additional API to modify this last buffer, such as > {{ReserveBytes}} and {{ResizeBytes}} methods that would correspond to the > {{Reserve}} and {{Resize}} methods but would relate to the values buffer. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-5214) [C++] Offline dependency downloader misses some libraries
[ https://issues.apache.org/jira/browse/ARROW-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-5214. - Resolution: Fixed Issue resolved by pull request 4214 [https://github.com/apache/arrow/pull/4214] > [C++] Offline dependency downloader misses some libraries > - > > Key: ARROW-5214 > URL: https://issues.apache.org/jira/browse/ARROW-5214 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Francois Saint-Jacques >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Not sure yet but maybe this was introduced by > https://github.com/apache/arrow/commit/f913d8f0adff71c288a10f6c1b0ad2d1ab3e9e32 > {code} > $ thirdparty/download_dependencies.sh /home/wesm/arrow-thirdparty > # Environment variables for offline Arrow build > export ARROW_BOOST_URL=/home/wesm/arrow-thirdparty/boost-1.67.0.tar.gz > export ARROW_BROTLI_URL=/home/wesm/arrow-thirdparty/brotli-v1.0.7.tar.gz > export ARROW_CARES_URL=/home/wesm/arrow-thirdparty/cares-1.15.0.tar.gz > export > ARROW_DOUBLE_CONVERSION_URL=/home/wesm/arrow-thirdparty/double-conversion-v3.1.4.tar.gz > export > ARROW_FLATBUFFERS_URL=/home/wesm/arrow-thirdparty/flatbuffers-v1.10.0.tar.gz > export > ARROW_GBENCHMARK_URL=/home/wesm/arrow-thirdparty/gbenchmark-v1.4.1.tar.gz > export ARROW_GFLAGS_URL=/home/wesm/arrow-thirdparty/gflags-v2.2.0.tar.gz > export ARROW_GLOG_URL=/home/wesm/arrow-thirdparty/glog-v0.3.5.tar.gz > export ARROW_GRPC_URL=/home/wesm/arrow-thirdparty/grpc-v1.20.0.tar.gz > export ARROW_GTEST_URL=/home/wesm/arrow-thirdparty/gtest-1.8.1.tar.gz > export ARROW_LZ4_URL=/home/wesm/arrow-thirdparty/lz4-v1.8.3.tar.gz > export ARROW_ORC_URL=/home/wesm/arrow-thirdparty/orc-1.5.5.tar.gz > export ARROW_PROTOBUF_URL=/home/wesm/arrow-thirdparty/protobuf-v3.7.1.tar.gz > export > ARROW_RAPIDJSON_URL=/home/wesm/arrow-thirdparty/rapidjson-2bbd33b33217ff4a73434ebf10cdac41e2ef5e34.tar.gz 
> export ARROW_RE2_URL=/home/wesm/arrow-thirdparty/re2-2019-04-01.tar.gz > {code} > The 5 dependencies listed after RE2 are not downloaded -- This message was sent by Atlassian JIRA (v7.6.3#76005)
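One way to catch this class of failure is to diff the export lines the script actually printed against the full expected dependency list. The sketch below parses the `export ARROW_<NAME>_URL=` lines from output like the one quoted above; the expected list passed to it is illustrative, not the script's real list.

```python
# Sketch: find dependencies missing from download_dependencies.sh output
# by parsing its 'export ARROW_<NAME>_URL=...' lines. The expected
# dependency list used by a caller is illustrative, not the real one.
import re

def downloaded_deps(script_output):
    """Extract lower-cased dependency names from export lines."""
    return {
        m.group(1).lower()
        for m in re.finditer(r"export ARROW_(\w+)_URL=", script_output)
    }

def missing_deps(script_output, expected):
    """Return the expected dependencies absent from the script output."""
    return sorted(set(expected) - downloaded_deps(script_output))
```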
[jira] [Commented] (ARROW-5130) [Python] Segfault when importing TensorFlow after Pyarrow
[ https://issues.apache.org/jira/browse/ARROW-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826997#comment-16826997 ] Wes McKinney commented on ARROW-5130: - See https://github.com/apache/arrow/tree/master/python/manylinux1 > [Python] Segfault when importing TensorFlow after Pyarrow > - > > Key: ARROW-5130 > URL: https://issues.apache.org/jira/browse/ARROW-5130 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.13.0 >Reporter: Travis Addair >Priority: Major > > This issue is similar to https://jira.apache.org/jira/browse/ARROW-2657 which > was fixed in v0.10.0. > When we import TensorFlow after Pyarrow in Linux Debian Jessie, we get a > segfault. To reproduce: > {code:java} > import pyarrow > import tensorflow{code} > Here's the backtrace from gdb: > {code:java} > Program terminated with signal SIGSEGV, Segmentation fault. > #0 0x in ?? () > (gdb) bt > #0 0x in ?? () > #1 0x7f529ee04410 in pthread_once () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_once.S:103 > #2 0x7f5229a74efa in void std::call_once(std::once_flag&, > void (&)()) () from > /usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so > #3 0x7f5229a74f3e in > tensorflow::port::TestCPUFeature(tensorflow::port::CPUFeature) () from > /usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so > #4 0x7f522978b561 in tensorflow::port::(anonymous > namespace)::CheckFeatureOrDie(tensorflow::port::CPUFeature, std::string > const&) () > from > /usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so > #5 0x7f522978b5b4 in _GLOBAL__sub_I_cpu_feature_guard.cc () from > /usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so > #6 0x7f529f224bea in call_init (l=, argc=argc@entry=9, > argv=argv@entry=0x7ffc6d8c1488, env=env@entry=0x294c0c0) at dl-init.c:78 > #7 0x7f529f224cd3 in call_init (env=0x294c0c0, argv=0x7ffc6d8c1488, 
> argc=9, l=) at dl-init.c:36 > #8 _dl_init (main_map=main_map@entry=0x2e4aff0, argc=9, argv=0x7ffc6d8c1488, > env=0x294c0c0) at dl-init.c:126 > #9 0x7f529f228e38 in dl_open_worker (a=a@entry=0x7ffc6d8bebb8) at > dl-open.c:577 > #10 0x7f529f224aa4 in _dl_catch_error > (objname=objname@entry=0x7ffc6d8beba8, > errstring=errstring@entry=0x7ffc6d8bebb0, > mallocedp=mallocedp@entry=0x7ffc6d8beba7, > operate=operate@entry=0x7f529f228b60 , > args=args@entry=0x7ffc6d8bebb8) at dl-error.c:187 > #11 0x7f529f22862b in _dl_open (file=0x7f5248178b54 > "/usr/local/lib/python2.7/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so", > mode=-2147483646, caller_dlopen=, > nsid=-2, argc=9, argv=0x7ffc6d8c1488, env=0x294c0c0) at dl-open.c:661 > #12 0x7f529ebf402b in dlopen_doit (a=a@entry=0x7ffc6d8bedd0) at > dlopen.c:66 > #13 0x7f529f224aa4 in _dl_catch_error (objname=0x2950fc0, > errstring=0x2950fc8, mallocedp=0x2950fb8, operate=0x7f529ebf3fd0 > , args=0x7ffc6d8bedd0) at dl-error.c:187 > #14 0x7f529ebf45dd in _dlerror_run (operate=operate@entry=0x7f529ebf3fd0 > , args=args@entry=0x7ffc6d8bedd0) at dlerror.c:163 > #15 0x7f529ebf40c1 in __dlopen (file=, mode= out>) at dlopen.c:87 > #16 0x00540859 in _PyImport_GetDynLoadFunc () > #17 0x0054024c in _PyImport_LoadDynamicModule () > #18 0x005f2bcb in ?? () > #19 0x004ca235 in PyEval_EvalFrameEx () > #20 0x004ca9c2 in PyEval_EvalFrameEx () > #21 0x004c8c39 in PyEval_EvalCodeEx () > #22 0x004c84e6 in PyEval_EvalCode () > #23 0x004c6e5c in PyImport_ExecCodeModuleEx () > #24 0x004c3272 in ?? () > #25 0x004b19e2 in ?? () > #26 0x004b13d7 in ?? () > #27 0x004b42f6 in ?? () > #28 0x004d1aab in PyEval_CallObjectWithKeywords () > #29 0x004ccdb3 in PyEval_EvalFrameEx () > #30 0x004c8c39 in PyEval_EvalCodeEx () > #31 0x004c84e6 in PyEval_EvalCode () > #32 0x004c6e5c in PyImport_ExecCodeModuleEx () > #33 0x004c3272 in ?? () > #34 0x004b1d3f in ?? () > #35 0x004b6b2b in ?? () > #36 0x004b0d82 in ?? () > #37 0x004b42f6 in ?? 
() > #38 0x004d1aab in PyEval_CallObjectWithKeywords () > #39 0x004ccdb3 in PyEval_EvalFrameEx (){code} > It looks like the code changes that fixed the previous issue were recently > removed in > [https://github.com/apache/arrow/commit/b766bff34b7d85034d26cebef5b3aeef1eb2fd82#diff-16806bcebc1df2fae432db426905b9f0]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5214) [C++] Offline dependency downloader misses some libraries
[ https://issues.apache.org/jira/browse/ARROW-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-5214: -- Labels: pull-request-available (was: ) > [C++] Offline dependency downloader misses some libraries > - > > Key: ARROW-5214 > URL: https://issues.apache.org/jira/browse/ARROW-5214 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Francois Saint-Jacques >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > > Not sure yet but maybe this was introduced by > https://github.com/apache/arrow/commit/f913d8f0adff71c288a10f6c1b0ad2d1ab3e9e32 > {code} > $ thirdparty/download_dependencies.sh /home/wesm/arrow-thirdparty > # Environment variables for offline Arrow build > export ARROW_BOOST_URL=/home/wesm/arrow-thirdparty/boost-1.67.0.tar.gz > export ARROW_BROTLI_URL=/home/wesm/arrow-thirdparty/brotli-v1.0.7.tar.gz > export ARROW_CARES_URL=/home/wesm/arrow-thirdparty/cares-1.15.0.tar.gz > export > ARROW_DOUBLE_CONVERSION_URL=/home/wesm/arrow-thirdparty/double-conversion-v3.1.4.tar.gz > export > ARROW_FLATBUFFERS_URL=/home/wesm/arrow-thirdparty/flatbuffers-v1.10.0.tar.gz > export > ARROW_GBENCHMARK_URL=/home/wesm/arrow-thirdparty/gbenchmark-v1.4.1.tar.gz > export ARROW_GFLAGS_URL=/home/wesm/arrow-thirdparty/gflags-v2.2.0.tar.gz > export ARROW_GLOG_URL=/home/wesm/arrow-thirdparty/glog-v0.3.5.tar.gz > export ARROW_GRPC_URL=/home/wesm/arrow-thirdparty/grpc-v1.20.0.tar.gz > export ARROW_GTEST_URL=/home/wesm/arrow-thirdparty/gtest-1.8.1.tar.gz > export ARROW_LZ4_URL=/home/wesm/arrow-thirdparty/lz4-v1.8.3.tar.gz > export ARROW_ORC_URL=/home/wesm/arrow-thirdparty/orc-1.5.5.tar.gz > export ARROW_PROTOBUF_URL=/home/wesm/arrow-thirdparty/protobuf-v3.7.1.tar.gz > export > ARROW_RAPIDJSON_URL=/home/wesm/arrow-thirdparty/rapidjson-2bbd33b33217ff4a73434ebf10cdac41e2ef5e34.tar.gz > export ARROW_RE2_URL=/home/wesm/arrow-thirdparty/re2-2019-04-01.tar.gz > {code} > The 5 
dependencies listed after RE2 are not downloaded -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (ARROW-5214) [C++] Offline dependency downloader misses some libraries
[ https://issues.apache.org/jira/browse/ARROW-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826979#comment-16826979 ] Francois Saint-Jacques edited comment on ARROW-5214 at 4/26/19 1:59 PM: The script is exiting silently, but with a non-zero error code. I'll fix this. The real issue is that the URL path for this snappy version does not exist anymore. was (Author: fsaintjacques): The script is exiting silently, but with a non-zero error code. I'll fix this. The real issue is that this snappy version does not exist anymore. > [C++] Offline dependency downloader misses some libraries > - > > Key: ARROW-5214 > URL: https://issues.apache.org/jira/browse/ARROW-5214 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Francois Saint-Jacques >Priority: Major > Fix For: 0.14.0 > > > Not sure yet but maybe this was introduced by > https://github.com/apache/arrow/commit/f913d8f0adff71c288a10f6c1b0ad2d1ab3e9e32 > {code} > $ thirdparty/download_dependencies.sh /home/wesm/arrow-thirdparty > # Environment variables for offline Arrow build > export ARROW_BOOST_URL=/home/wesm/arrow-thirdparty/boost-1.67.0.tar.gz > export ARROW_BROTLI_URL=/home/wesm/arrow-thirdparty/brotli-v1.0.7.tar.gz > export ARROW_CARES_URL=/home/wesm/arrow-thirdparty/cares-1.15.0.tar.gz > export > ARROW_DOUBLE_CONVERSION_URL=/home/wesm/arrow-thirdparty/double-conversion-v3.1.4.tar.gz > export > ARROW_FLATBUFFERS_URL=/home/wesm/arrow-thirdparty/flatbuffers-v1.10.0.tar.gz > export > ARROW_GBENCHMARK_URL=/home/wesm/arrow-thirdparty/gbenchmark-v1.4.1.tar.gz > export ARROW_GFLAGS_URL=/home/wesm/arrow-thirdparty/gflags-v2.2.0.tar.gz > export ARROW_GLOG_URL=/home/wesm/arrow-thirdparty/glog-v0.3.5.tar.gz > export ARROW_GRPC_URL=/home/wesm/arrow-thirdparty/grpc-v1.20.0.tar.gz > export ARROW_GTEST_URL=/home/wesm/arrow-thirdparty/gtest-1.8.1.tar.gz > export ARROW_LZ4_URL=/home/wesm/arrow-thirdparty/lz4-v1.8.3.tar.gz > export
ARROW_ORC_URL=/home/wesm/arrow-thirdparty/orc-1.5.5.tar.gz > export ARROW_PROTOBUF_URL=/home/wesm/arrow-thirdparty/protobuf-v3.7.1.tar.gz > export > ARROW_RAPIDJSON_URL=/home/wesm/arrow-thirdparty/rapidjson-2bbd33b33217ff4a73434ebf10cdac41e2ef5e34.tar.gz > export ARROW_RE2_URL=/home/wesm/arrow-thirdparty/re2-2019-04-01.tar.gz > {code} > The 5 dependencies listed after RE2 are not downloaded -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5214) [C++] Offline dependency downloader misses some libraries
[ https://issues.apache.org/jira/browse/ARROW-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826979#comment-16826979 ] Francois Saint-Jacques commented on ARROW-5214: --- The script is exiting silently, but with a non-zero error code. I'll fix this. The real issue is that this snappy version does not exist anymore. > [C++] Offline dependency downloader misses some libraries > - > > Key: ARROW-5214 > URL: https://issues.apache.org/jira/browse/ARROW-5214 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Francois Saint-Jacques >Priority: Major > Fix For: 0.14.0 > > > Not sure yet but maybe this was introduced by > https://github.com/apache/arrow/commit/f913d8f0adff71c288a10f6c1b0ad2d1ab3e9e32 > {code} > $ thirdparty/download_dependencies.sh /home/wesm/arrow-thirdparty > # Environment variables for offline Arrow build > export ARROW_BOOST_URL=/home/wesm/arrow-thirdparty/boost-1.67.0.tar.gz > export ARROW_BROTLI_URL=/home/wesm/arrow-thirdparty/brotli-v1.0.7.tar.gz > export ARROW_CARES_URL=/home/wesm/arrow-thirdparty/cares-1.15.0.tar.gz > export > ARROW_DOUBLE_CONVERSION_URL=/home/wesm/arrow-thirdparty/double-conversion-v3.1.4.tar.gz > export > ARROW_FLATBUFFERS_URL=/home/wesm/arrow-thirdparty/flatbuffers-v1.10.0.tar.gz > export > ARROW_GBENCHMARK_URL=/home/wesm/arrow-thirdparty/gbenchmark-v1.4.1.tar.gz > export ARROW_GFLAGS_URL=/home/wesm/arrow-thirdparty/gflags-v2.2.0.tar.gz > export ARROW_GLOG_URL=/home/wesm/arrow-thirdparty/glog-v0.3.5.tar.gz > export ARROW_GRPC_URL=/home/wesm/arrow-thirdparty/grpc-v1.20.0.tar.gz > export ARROW_GTEST_URL=/home/wesm/arrow-thirdparty/gtest-1.8.1.tar.gz > export ARROW_LZ4_URL=/home/wesm/arrow-thirdparty/lz4-v1.8.3.tar.gz > export ARROW_ORC_URL=/home/wesm/arrow-thirdparty/orc-1.5.5.tar.gz > export ARROW_PROTOBUF_URL=/home/wesm/arrow-thirdparty/protobuf-v3.7.1.tar.gz > export >
ARROW_RAPIDJSON_URL=/home/wesm/arrow-thirdparty/rapidjson-2bbd33b33217ff4a73434ebf10cdac41e2ef5e34.tar.gz > export ARROW_RE2_URL=/home/wesm/arrow-thirdparty/re2-2019-04-01.tar.gz > {code} > The 5 dependencies listed after RE2 are not downloaded -- This message was sent by Atlassian JIRA (v7.6.3#76005)
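The silent-failure mode described in the comment above (the script exits non-zero but prints no error) suggests wrapping such invocations in a helper that always checks the return code. A minimal sketch, not part of the Arrow tooling:

```python
# Sketch: run a command and raise on a non-zero exit code even when it
# printed nothing, so a silently failing download script cannot produce
# a partial dependency listing unnoticed.
import subprocess

def run_checked(cmd):
    """Run cmd, return its stdout, and raise if the exit code is non-zero."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            "command {!r} exited with {}".format(cmd, result.returncode)
        )
    return result.stdout
```

Within the shell script itself, `set -e` (or checking `$?` after each download) achieves the same effect.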
[jira] [Commented] (ARROW-5130) [Python] Segfault when importing TensorFlow after Pyarrow
[ https://issues.apache.org/jira/browse/ARROW-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826968#comment-16826968 ] Francois Saint-Jacques commented on ARROW-5130: --- You'll have to replicate https://github.com/apache/arrow/blob/master/dev/tasks/python-wheels/travis.linux.yml > [Python] Segfault when importing TensorFlow after Pyarrow > - > > Key: ARROW-5130 > URL: https://issues.apache.org/jira/browse/ARROW-5130 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.13.0 >Reporter: Travis Addair >Priority: Major > > This issue is similar to https://jira.apache.org/jira/browse/ARROW-2657 which > was fixed in v0.10.0. > When we import TensorFlow after Pyarrow in Linux Debian Jessie, we get a > segfault. To reproduce: > {code:java} > import pyarrow > import tensorflow{code} > Here's the backtrace from gdb: > {code:java} > Program terminated with signal SIGSEGV, Segmentation fault. > #0 0x in ?? () > (gdb) bt > #0 0x in ?? 
() > #1 0x7f529ee04410 in pthread_once () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_once.S:103 > #2 0x7f5229a74efa in void std::call_once(std::once_flag&, > void (&)()) () from > /usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so > #3 0x7f5229a74f3e in > tensorflow::port::TestCPUFeature(tensorflow::port::CPUFeature) () from > /usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so > #4 0x7f522978b561 in tensorflow::port::(anonymous > namespace)::CheckFeatureOrDie(tensorflow::port::CPUFeature, std::string > const&) () > from > /usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so > #5 0x7f522978b5b4 in _GLOBAL__sub_I_cpu_feature_guard.cc () from > /usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so > #6 0x7f529f224bea in call_init (l=, argc=argc@entry=9, > argv=argv@entry=0x7ffc6d8c1488, env=env@entry=0x294c0c0) at dl-init.c:78 > #7 0x7f529f224cd3 in call_init (env=0x294c0c0, argv=0x7ffc6d8c1488, > argc=9, l=) at dl-init.c:36 > #8 _dl_init (main_map=main_map@entry=0x2e4aff0, argc=9, argv=0x7ffc6d8c1488, > env=0x294c0c0) at dl-init.c:126 > #9 0x7f529f228e38 in dl_open_worker (a=a@entry=0x7ffc6d8bebb8) at > dl-open.c:577 > #10 0x7f529f224aa4 in _dl_catch_error > (objname=objname@entry=0x7ffc6d8beba8, > errstring=errstring@entry=0x7ffc6d8bebb0, > mallocedp=mallocedp@entry=0x7ffc6d8beba7, > operate=operate@entry=0x7f529f228b60 , > args=args@entry=0x7ffc6d8bebb8) at dl-error.c:187 > #11 0x7f529f22862b in _dl_open (file=0x7f5248178b54 > "/usr/local/lib/python2.7/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so", > mode=-2147483646, caller_dlopen=, > nsid=-2, argc=9, argv=0x7ffc6d8c1488, env=0x294c0c0) at dl-open.c:661 > #12 0x7f529ebf402b in dlopen_doit (a=a@entry=0x7ffc6d8bedd0) at > dlopen.c:66 > #13 0x7f529f224aa4 in _dl_catch_error (objname=0x2950fc0, > errstring=0x2950fc8, mallocedp=0x2950fb8, 
operate=0x7f529ebf3fd0 > , args=0x7ffc6d8bedd0) at dl-error.c:187 > #14 0x7f529ebf45dd in _dlerror_run (operate=operate@entry=0x7f529ebf3fd0 > , args=args@entry=0x7ffc6d8bedd0) at dlerror.c:163 > #15 0x7f529ebf40c1 in __dlopen (file=, mode= out>) at dlopen.c:87 > #16 0x00540859 in _PyImport_GetDynLoadFunc () > #17 0x0054024c in _PyImport_LoadDynamicModule () > #18 0x005f2bcb in ?? () > #19 0x004ca235 in PyEval_EvalFrameEx () > #20 0x004ca9c2 in PyEval_EvalFrameEx () > #21 0x004c8c39 in PyEval_EvalCodeEx () > #22 0x004c84e6 in PyEval_EvalCode () > #23 0x004c6e5c in PyImport_ExecCodeModuleEx () > #24 0x004c3272 in ?? () > #25 0x004b19e2 in ?? () > #26 0x004b13d7 in ?? () > #27 0x004b42f6 in ?? () > #28 0x004d1aab in PyEval_CallObjectWithKeywords () > #29 0x004ccdb3 in PyEval_EvalFrameEx () > #30 0x004c8c39 in PyEval_EvalCodeEx () > #31 0x004c84e6 in PyEval_EvalCode () > #32 0x004c6e5c in PyImport_ExecCodeModuleEx () > #33 0x004c3272 in ?? () > #34 0x004b1d3f in ?? () > #35 0x004b6b2b in ?? () > #36 0x004b0d82 in ?? () > #37 0x004b42f6 in ?? () > #38 0x004d1aab in PyEval_CallObjectWithKeywords () > #39 0x004ccdb3 in PyEval_EvalFrameEx (){code} > It looks like the code changes that fixed the previous issue were recently > removed in > [https://github.com/apache/arrow/commit/b766bff34b7d85034d26cebef5b3aeef1eb2fd82#diff-16806bcebc1df2fae432db426905b9f0]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5130) [Python] Segfault when importing TensorFlow after Pyarrow
[ https://issues.apache.org/jira/browse/ARROW-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826967#comment-16826967 ] Francois Saint-Jacques commented on ARROW-5130: --- It's a component called crossbow; the gist of what you need is [here|https://github.com/apache/arrow/tree/master/dev/tasks/python-wheels]
[jira] [Commented] (ARROW-5208) [Python] Inconsistent resulting type during casting in pa.array() when mask is present
[ https://issues.apache.org/jira/browse/ARROW-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826904#comment-16826904 ] Joris Van den Bossche commented on ARROW-5208: -- To get started, I think the developer docs are the place to look. Specifically the python docs have a good section on how to setup and build arrow and pyarrow: https://arrow.apache.org/docs/developers/python.html#building-on-linux-and-macos > [Python] Inconsistent resulting type during casting in pa.array() when mask > is present > -- > > Key: ARROW-5208 > URL: https://issues.apache.org/jira/browse/ARROW-5208 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.13.0 >Reporter: Artem KOZHEVNIKOV >Priority: Major > Fix For: 0.14.0 > > > I would expect Int64Array type in all cases below : > {code:java} > >>> pa.array([4, None, 4, None], mask=np.array([False, True, False, True])) > >>> > >>> > [4, null, 4, null ] > >>> pa.array([4, None, 4, 'rer'], mask=np.array([False, True, False, True])) > >>> > >>> > [4, null, 4, null ] > >>> pa.array([4, None, 4, 3.], mask=np.array([False, True, False, True])) > >>> > >>> [ 4, null, > >>> 4, null ]{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
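The masked-inference behavior the reporter expects can be sketched in plain Python. This is illustrative only; `infer_masked_type` is not a pyarrow function, just a sketch of inference that ignores masked-out entries:

```python
def infer_masked_type(values, mask):
    """Sketch of the expected behavior: type inference only looks at
    entries whose mask flag is False (valid), so masked-out entries
    like 'rer' or 3.0 cannot widen the resulting type."""
    visible = [v for v, m in zip(values, mask) if not m]
    if all(isinstance(v, int) and not isinstance(v, bool) for v in visible):
        return "int64"
    if all(isinstance(v, (int, float)) for v in visible):
        return "double"
    return "object"

# Each of the reporter's three cases should come out as int64,
# because the masked slots are never inspected.
mask = [False, True, False, True]
for vals in ([4, None, 4, None], [4, None, 4, 'rer'], [4, None, 4, 3.0]):
    print(infer_masked_type(vals, mask))  # prints "int64" each time
```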
[jira] [Commented] (ARROW-5208) [Python] Inconsistent resulting type during casting in pa.array() when mask is present
[ https://issues.apache.org/jira/browse/ARROW-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826896#comment-16826896 ] Artem KOZHEVNIKOV commented on ARROW-5208: -- yes, absolutely, it would be nice to get involved! Any doc that could be useful to start with? CI best practices?
[jira] [Resolved] (ARROW-5117) [Go] Panic when appending zero slices after initializing a builder
[ https://issues.apache.org/jira/browse/ARROW-5117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastien Binet resolved ARROW-5117. Resolution: Fixed Fix Version/s: 0.14.0 Issue resolved by pull request 4131 [https://github.com/apache/arrow/pull/4131] > [Go] Panic when appending zero slices after initializing a builder > -- > > Key: ARROW-5117 > URL: https://issues.apache.org/jira/browse/ARROW-5117 > Project: Apache Arrow > Issue Type: Bug > Components: Go >Reporter: Alfonso Subiotto >Assignee: Sebastien Binet >Priority: Critical > Labels: easyfix, newbie, pull-request-available > Fix For: 0.14.0 > > Original Estimate: 1h > Time Spent: 2h 40m > Remaining Estimate: 0h > > {code:java} > array.NewInt8Builder(memory.DefaultAllocator).AppendValues([]int8{}, > []bool{}){code} > results in a panic > {code:java} > === RUN TestArrowPanic > --- FAIL: TestArrowPanic (0.00s) > panic: runtime error: invalid memory address or nil pointer dereference > [recovered] > panic: runtime error: invalid memory address or nil pointer dereference > [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 > pc=0x414f6fd]goroutine 5 [running]: > testing.tRunner.func1(0xc000492a00) > /usr/local/Cellar/go/1.11.5/libexec/src/testing/testing.go:792 +0x387 > panic(0x4cd1fe0, 0x5bb3fb0) > /usr/local/Cellar/go/1.11.5/libexec/src/runtime/panic.go:513 +0x1b9 > github.com/cockroachdb/cockroach/vendor/github.com/apache/arrow/go/arrow/memory.(*Buffer).Bytes(...) 
> > /Users/asubiotto/go/src/github.com/cockroachdb/cockroach/vendor/github.com/apache/arrow/go/arrow/memory/buffer.go:67 > github.com/cockroachdb/cockroach/vendor/github.com/apache/arrow/go/arrow/array.(*builder).unsafeSetValid(0xc000382a80, > 0x0) > > /Users/asubiotto/go/src/github.com/cockroachdb/cockroach/vendor/github.com/apache/arrow/go/arrow/array/builder.go:184 > +0x6d > github.com/cockroachdb/cockroach/vendor/github.com/apache/arrow/go/arrow/array.(*builder).unsafeAppendBoolsToBitmap(0xc000382a80, > 0xc00040df88, 0x0, 0x0, 0x0) > > /Users/asubiotto/go/src/github.com/cockroachdb/cockroach/vendor/github.com/apache/arrow/go/arrow/array/builder.go:146 > +0x17a > github.com/cockroachdb/cockroach/vendor/github.com/apache/arrow/go/arrow/array.(*Int8Builder).AppendValues(0xc000382a80, > 0xc00040df88, 0x0, 0x0, 0xc00040df88, 0x0, 0x0) > > /Users/asubiotto/go/src/github.com/cockroachdb/cockroach/vendor/github.com/apache/arrow/go/arrow/array/numericbuilder.gen.go:1168 > +0xcb > github.com/cockroachdb/cockroach/pkg/util/arrow_test.TestArrowPanic(0xc000492a00) > > /Users/asubiotto/go/src/github.com/cockroachdb/cockroach/pkg/util/arrow/record_batch_test.go:273 > +0x9a > testing.tRunner(0xc000492a00, 0x4ec5370) > /usr/local/Cellar/go/1.11.5/libexec/src/testing/testing.go:827 +0xbf > created by testing.(*T).Run > /usr/local/Cellar/go/1.11.5/libexec/src/testing/testing.go:878 +0x35cProcess > finished with exit code 1{code} > due to the underlying null bitmap never being initialized. I believe the > expectation is for `Resize` to initialize this bitmap. This never happens > because a length of 0 (elements in this block) fails this check: > {code:java} > func (b *builder) reserve(elements int, resize func(int)) { > if b.length+elements > b.capacity { > newCap := bitutil.NextPowerOf2(b.length + elements) > resize(newCap) > } > }{code} > As far as I can tell the arguments to AppendValues are valid. 
I'd be happy to > submit a patch, but I can see several ways of fixing this, so I would prefer > someone familiar with the code to take a look and define expectations in this > case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
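The zero-length failure mode described above can be reproduced with a small pure-Python sketch of the quoted `reserve` logic. Names such as `BuilderSketch` and `next_power_of_2` are illustrative stand-ins, not the Go implementation:

```python
def next_power_of_2(n):
    # Assumed behavior of bitutil.NextPowerOf2 for this sketch.
    p = 1
    while p < n:
        p *= 2
    return p

class BuilderSketch:
    """Pure-Python sketch of the reserve logic quoted above, showing
    why appending zero values never triggers resize: 0 + 0 > 0 is
    false, so the null bitmap is never allocated before
    unsafeSetValid dereferences it (the panic in the traceback)."""
    def __init__(self):
        self.length = 0
        self.capacity = 0
        self.bitmap = None  # stands in for the nil null-bitmap buffer

    def resize(self, new_cap):
        self.capacity = new_cap
        self.bitmap = bytearray((new_cap + 7) // 8)

    def reserve(self, elements):
        if self.length + elements > self.capacity:
            self.resize(next_power_of_2(self.length + elements))

b = BuilderSketch()
b.reserve(0)             # mirrors AppendValues([]int8{}, []bool{})
print(b.bitmap is None)  # prints True: the buffer was never initialized
```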
[jira] [Closed] (ARROW-5221) Improve the performance of class SegmentsUtil
[ https://issues.apache.org/jira/browse/ARROW-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liya Fan closed ARROW-5221. --- Resolution: Invalid
[jira] [Created] (ARROW-5221) Improve the performance of class SegmentsUtil
Liya Fan created ARROW-5221: --- Summary: Improve the performance of class SegmentsUtil Key: ARROW-5221 URL: https://issues.apache.org/jira/browse/ARROW-5221 Project: Apache Arrow Issue Type: Improvement Reporter: Liya Fan Assignee: Liya Fan

Improve the performance of class SegmentsUtil in two ways:

1. In the method allocateReuseBytes, the generated byte array should be cached for reuse if its size does not exceed MAX_BYTES_LENGTH. However, the array is not cached when bytes.length < length, which leads to performance overhead:

{code:java}
if (bytes == null) {
  if (length <= MAX_BYTES_LENGTH) {
    bytes = new byte[MAX_BYTES_LENGTH];
    BYTES_LOCAL.set(bytes);
  } else {
    bytes = new byte[length];
  }
} else if (bytes.length < length) {
  bytes = new byte[length];
}
{code}

2. To compute the offset, an integer is bitwise-ANDed with a mask to clear the low bits and then shifted right. The bitwise AND is redundant:

{code:java}
((index & BIT_BYTE_POSITION_MASK) >>> 3)
{code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
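The second claim is easy to check numerically, assuming BIT_BYTE_POSITION_MASK clears the low three bits (the constant's actual value lives in SegmentsUtil; the value below is an assumption for this sketch):

```python
# If the mask clears the low three bits, masking before a shift by 3
# is redundant: the shift discards exactly those bits anyway.
BIT_BYTE_POSITION_MASK = ~0b111  # assumed value for this sketch

for index in range(1 << 16):
    assert (index & BIT_BYTE_POSITION_MASK) >> 3 == index >> 3
print("mask is redundant for all tested non-negative indices")
```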
[jira] [Updated] (ARROW-5200) [Java] Provide light-weight arrow APIs
[ https://issues.apache.org/jira/browse/ARROW-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-5200: -- Labels: pull-request-available (was: ) > [Java] Provide light-weight arrow APIs > -- > > Key: ARROW-5200 > URL: https://issues.apache.org/jira/browse/ARROW-5200 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Labels: pull-request-available > Attachments: image-2019-04-23-15-19-34-187.png > > > We are trying to incorporate Apache Arrow into the Apache Flink runtime. We find > Arrow an amazing library, which greatly simplifies the support of columnar > data formats. > However, for many scenarios, we find the performance unacceptable. Our > investigation shows that the reason is that there are too many redundant checks > and computations in the Arrow APIs. > For example, the following figure shows that a single call to the > Float8Vector.get(int) method (one of the most frequently used APIs in > Flink computation) involves 20+ method invocations. > !image-2019-04-23-15-19-34-187.png! > > There are many other APIs with similar problems. We believe that these checks > ensure the integrity of the program. However, they also impact > performance severely. In our evaluation, performance may degrade by two > or three orders of magnitude compared to accessing data on heap memory. > We think that, at least for some scenarios, we can give the responsibility for > integrity checks to application owners. If they can be sure all the checks > have been passed, we can provide them with light-weight APIs and the inherent high > performance. > In the light-weight APIs, we only perform minimal checks, or avoid checks > entirely. The application owner can still develop and debug their code using the > original heavy-weight APIs. 
Once all bugs have been fixed, they can switch to > light-weight APIs in their products and enjoy the consequent high performance. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
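The checked/unchecked split proposed above can be sketched as follows. This is a Python illustration of the idea only; `Float8VectorSketch` and `get_unsafe` are hypothetical names, not the Arrow Java API:

```python
class Float8VectorSketch:
    """Sketch of the proposal: pair every checked accessor with a
    light-weight variant that skips bounds and null checks, for
    callers that have already validated their access pattern."""
    def __init__(self, values, validity):
        self._values = values      # backing value buffer
        self._validity = validity  # null bitmap as a list of bools

    def get(self, i):
        # Heavy-weight path: every call re-validates the index and
        # null state, analogous to the nested checks in the figure.
        if not 0 <= i < len(self._values):
            raise IndexError(i)
        if not self._validity[i]:
            raise ValueError(f"null at index {i}")
        return self._values[i]

    def get_unsafe(self, i):
        # Light-weight path: the caller guarantees validity up front.
        return self._values[i]

v = Float8VectorSketch([1.5, 2.5, 3.5], [True, True, True])
print(v.get(1), v.get_unsafe(1))  # prints "2.5 2.5"
```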
[jira] [Commented] (ARROW-3861) [Python] ParquetDataset().read columns argument always returns partition column
[ https://issues.apache.org/jira/browse/ARROW-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826772#comment-16826772 ] Joris Van den Bossche commented on ARROW-3861: -- [~cthi] note that the way you create and pass the schema (with "new" columns and the index column specified) now raises an error. I opened ARROW-5220 for that. What was your intent to add "new_column" to the schema? That it would be created in the actual table? > [Python] ParquetDataset().read columns argument always returns partition > column > --- > > Key: ARROW-3861 > URL: https://issues.apache.org/jira/browse/ARROW-3861 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Christian Thiel >Priority: Major > Labels: parquet, python > Fix For: 0.14.0 > > > I just noticed that no matter which columns are specified on load of a > dataset, the partition column is always returned. This might lead to strange > behaviour, as the resulting dataframe has more than the expected columns: > {code} > import dask as da > import pyarrow as pa > import pyarrow.parquet as pq > import pandas as pd > import os > import numpy as np > import shutil > PATH_PYARROW_MANUAL = '/tmp/pyarrow_manual.pa/' > if os.path.exists(PATH_PYARROW_MANUAL): > shutil.rmtree(PATH_PYARROW_MANUAL) > os.mkdir(PATH_PYARROW_MANUAL) > arrays = np.array([np.array([0, 1, 2]), np.array([3, 4]), np.nan, np.nan]) > strings = np.array([np.nan, np.nan, 'a', 'b']) > df = pd.DataFrame([0, 0, 1, 1], columns=['partition_column']) > df.index.name='DPRD_ID' > df['arrays'] = pd.Series(arrays) > df['strings'] = pd.Series(strings) > my_schema = pa.schema([('DPRD_ID', pa.int64()), >('partition_column', pa.int32()), >('arrays', pa.list_(pa.int32())), >('strings', pa.string()), >('new_column', pa.string())]) > table = pa.Table.from_pandas(df, schema=my_schema) > pq.write_to_dataset(table, root_path=PATH_PYARROW_MANUAL, > partition_cols=['partition_column']) > df_pq = 
pq.ParquetDataset(PATH_PYARROW_MANUAL).read(columns=['DPRD_ID', > 'strings']).to_pandas() > # pd.read_parquet(PATH_PYARROW_MANUAL, columns=['DPRD_ID', 'strings'], > engine='pyarrow') > df_pq > {code} > df_pq has column `partition_column` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
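Until the `columns` argument is honored exactly, a caller can drop whatever the reader appended after the fact. A minimal pure-Python sketch of that post-filter (the function name is illustrative, not a pyarrow API):

```python
def enforce_column_selection(table_columns, requested):
    """Split the columns a read actually returned into the ones that
    were requested and any extras (such as the partition column)
    that the reader appended on its own."""
    kept = [c for c in table_columns if c in requested]
    extras = [c for c in table_columns if c not in requested]
    return kept, extras

kept, extras = enforce_column_selection(
    ['DPRD_ID', 'strings', 'partition_column'],  # what read() returned
    ['DPRD_ID', 'strings'])                      # what was asked for
print(kept)    # prints "['DPRD_ID', 'strings']"
print(extras)  # prints "['partition_column']"
```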
[jira] [Updated] (ARROW-5220) [Python] index / unknown columns in specified schema in Table.from_pandas
[ https://issues.apache.org/jira/browse/ARROW-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5220: - Description: The {{Table.from_pandas}} method allows to specify a schema ("This can be used to indicate the type of columns if we cannot infer it automatically."). But, if you also want to specify the type of the index, you get an error: {code:python} df = pd.DataFrame({'a': [1, 2, 3], 'b': [0.1, 0.2, 0.3]}) df.index = pd.Index(['a', 'b', 'c'], name='index') my_schema = pa.schema([('index', pa.string()), ('a', pa.int64()), ('b', pa.float64()), ]) table = pa.Table.from_pandas(df, schema=my_schema) {code} gives {{KeyError: 'index'}} (because it tries to look up the "column names" from the schema in the dataframe, and thus does not find column 'index'). This also has the consequence that re-using the schema does not work: {{table1 = pa.Table.from_pandas(df1); table2 = pa.Table.from_pandas(df2, schema=table1.schema)}} Extra note: also unknown columns in general give this error (column specified in the schema that are not in the dataframe). At least in pyarrow 0.11, this did not give an error (eg noticed this from the code in example in ARROW-3861). So before, unknown columns in the specified schema were ignored, while now they raise an error. Was this a conscious change? So before also specifying the index in the schema "worked" in the sense that it didn't raise an error, but it was also ignored, so didn't actually do what you would expect) Questions: - I think that we should support specifying the index in the passed {{schema}} ? So that the example above works (although this might be complicated with RangeIndex that is not serialized any more) - But what to do in general with additional columns in the schema that are not in the DataFrame? Are we fine with keep raising an error as it is now (the error message could be improved then)? Or do we again want to ignore them? 
(or, it could actually also add them as all nulls to the table)
[jira] [Created] (ARROW-5220) [Python] index / unknown columns in specified schema in Table.from_pandas
Joris Van den Bossche created ARROW-5220: Summary: [Python] index / unknown columns in specified schema in Table.from_pandas Key: ARROW-5220 URL: https://issues.apache.org/jira/browse/ARROW-5220 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Joris Van den Bossche The {{Table.from_pandas}} method allows you to specify a schema ("This can be used to indicate the type of columns if we cannot infer it automatically."). But, if you also want to specify the type of the index, you get an error: {code:python} df = pd.DataFrame({'a': [1, 2, 3], 'b': [0.1, 0.2, 0.3]}) df.index = pd.Index(['a', 'b', 'c'], name='index') my_schema = pa.schema([('index', pa.string()), ('a', pa.int64()), ('b', pa.float64()), ]) table = pa.Table.from_pandas(df, schema=my_schema) {code} gives {{KeyError: 'index'}} (because it tries to look up the "column names" from the schema in the dataframe, and thus does not find column 'index'). This also has the consequence that re-using the schema does not work: {{table1 = pa.Table.from_pandas(df1); table2 = pa.Table.from_pandas(df2, schema=table1.schema)}} Extra note: unknown columns in general also give this error (columns specified in the schema that are not in the dataframe). At least in pyarrow 0.11, this did not give an error (e.g. noticed this from the example code in ARROW-3861). So before, unknown columns in the specified schema were ignored, while now they raise an error. Was this a conscious change? (So before, specifying the index in the schema also "worked" in the sense that it didn't raise an error, but it was ignored, so it didn't actually do what you would expect.) Questions: - I think that we should support specifying the index in the passed {{schema}}, so that the example above works (although this might be complicated with a RangeIndex, which is no longer serialized) - But what to do in general with additional columns in the schema that are not in the DataFrame? 
Are we fine with continuing to raise an error as it does now (though the error message could be improved)? Or do we again want to ignore them? (or, it could actually also add them as all nulls to the table) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
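The lookup failure described in the report can be sketched in a few lines. This is illustrative only, not the pyarrow source; it mimics resolving each schema field name against the dataframe's columns, where the index name is never found:

```python
def lookup_schema_fields(schema_names, df_columns):
    """Resolve each schema field name against the dataframe's column
    names; an index name such as 'index' is not among them, so the
    lookup raises KeyError('index') as in the report."""
    resolved = []
    for name in schema_names:
        if name not in df_columns:
            raise KeyError(name)
        resolved.append(name)
    return resolved

try:
    lookup_schema_fields(['index', 'a', 'b'], ['a', 'b'])
except KeyError as exc:
    print(f"KeyError: {exc}")  # mirrors the reported failure
```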
[jira] [Updated] (ARROW-3861) [Python] ParquetDataset().read columns argument always returns partition column
[ https://issues.apache.org/jira/browse/ARROW-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-3861: - Labels: parquet python (was: parquet pyarrow python)
[jira] [Updated] (ARROW-3861) [Python] ParquetDataset().read columns argument always returns partition column
[ https://issues.apache.org/jira/browse/ARROW-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-3861: - Labels: parquet pyarrow python (was: pyarrow python)
[jira] [Commented] (ARROW-5130) [Python] Segfault when importing TensorFlow after Pyarrow
[ https://issues.apache.org/jira/browse/ARROW-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826698#comment-16826698 ] Alexander Sergeev commented on ARROW-5130: -- [~fsaintjacques] is the build process for the wheels that end up on PyPI documented somewhere, so that I could reproduce the issue locally with containers and spread [https://github.com/apache/arrow/pull/2096] around?