[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999688#comment-15999688 ]

Devang Shah commented on ARROW-955:
---

That worked as expected. Thanks a lot for the excellent response, Wes! Truly appreciated.

Here's another quick question: the documentation at: https://media.readthedocs.org/pdf/pyarrow/latest/pyarrow.pdf is quite sparse in the read/write section. There's nothing about ParquetFile or ParquetReader, especially, APIs like read_row_group().

Specifically about read_row_group(): are there interfaces to then extract a row at a time from the result of read_row_group()? A row group by default is 128MB, but may have been written out as larger (I think the recommendation is 1GB?) - so read_row_group() would return a row_group which would occupy 128MB or 1GB of RAM (depending on the parquet.block.size when the row group was written out). Am I right? If so, are there interfaces to then read a row at a time from the row group (returned via read_row_group())?

> [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
> --
>
> Key: ARROW-955
> URL: https://issues.apache.org/jira/browse/ARROW-955
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Environment: Ubuntu - 3.19.0-80-generic #88~14.04.1-Ubuntu
> Python 2.7.6
> Reporter: Devang Shah
>
> I built pyarrow, arrow, and parquet-cpp from source - so that I could use the
> new read_row_group() interface and in general, have access to the latest
> versions. I ran into many issues during the build but was ultimately
> successful (notes below). However, I am not able to import pyarrow.parquet
> due to the following issue:
> >>import pyarrow.parquet
> Traceback (most recent call last):
> File "", line 1, in
> File "pyarrow/__init__.py", line 28, in
> import pyarrow._config
> ImportError: No module named _config
> This is similar to an issue reported in github/conda-forge/pyarrow-feedstock,
> where also I posted this...but I think this forum is more direct and
> appropriate - so re-posting here.
> I used instructions at https://arrow.apache.org/docs/python/install.html to
> build arrow/cpp, parquet-cpp, and then pyarrow, with the following deviations
> (I view them as possibly bugs in the instructions):
> arrow/cpp build:
> export ARROW_HOME=$HOME/local
> I had to specify -DARROW_PYTHON=on and -DPARQUET_ARROW=ON to the cmake
> command (besides the -DCMAKE_INSTALL_PREFIX=$ARROW_HOME)
> parquet-cpp build:
> export ARROW_HOME=$HOME/local
> cmake -DARROW_HOME=$HOME/local -DPARQUET_ARROW_LINKAGE=static
> -DPARQUET_ARROW=ON .
> make
> sudo make install
> this installs parquet libs in the std systems
> location (/usr/local/lib) so that the pyarrow build (see below) can find the
> parquet libs
> pyarrow build:
> export ARROW_HOME=$HOME/local (not a deviation; just repeating here)
> export LD_LIBRARY_PATH=$HOME/local/lib:$HOME/parquet4/parquet-cpp/build/latest
> sudo python setup.py build_ext --with-parquet --with-jemalloc
> --build-type=release install
> sudo python setup.py install
> (sudo is needed to install in /usr/local/lib/python2.7/dist-packages )
> These are the steps and modifications to the instructions needed for me to
> build the pyarrow.parquet package. However, when I now try to import the
> package I get the error specified above.
> Maybe I did something wrong in my steps which I kind of put together by
> searching for these issues...but really can't tell what. It took me almost a
> whole day to get to the point where I can build pyarrow and parquet, and now
> I can't use what I built.
> Any comments, help appreciated! Thanks in advance.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
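Editorial sketch for the read_row_group() question in the comment above, which is not answered in this part of the thread. It assumes a recent pyarrow release (not the 0.3/0.4-era build discussed here) in which ParquetFile exposes num_row_groups and read_row_group(), Table exposes to_batches(max_chunksize=...), and RecordBatch exposes to_pylist(). The helper name iter_rows and its rows_per_batch parameter are hypothetical. Each row group is still decoded into memory as a whole, and the decoded table can be larger than the compressed, encoded row group on disk; the sketch only avoids holding more than one row group at a time.

```python
def iter_rows(parquet_file, rows_per_batch=1000):
    """Yield one row (a dict) at a time from a ParquetFile-like object.

    Assumption: parquet_file behaves like pyarrow.parquet.ParquetFile from a
    recent pyarrow release. Only one row group is materialized at a time.
    """
    for index in range(parquet_file.num_row_groups):
        # The whole row group is decoded into an in-memory Arrow table here.
        table = parquet_file.read_row_group(index)
        # Slice the table into smaller record batches before converting the
        # values to Python objects, so the Python-level copies stay small.
        for batch in table.to_batches(max_chunksize=rows_per_batch):
            for row in batch.to_pylist():
                yield row


# Usage (requires pyarrow; the file name is hypothetical):
#   import pyarrow.parquet as pq
#   for row in iter_rows(pq.ParquetFile("example.parquet")):
#       print(row)
```

Converting rows to Python dicts is convenient but slow for large files; column-wise processing on the table or batches is usually preferable when it fits the use case.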
[jira] [Created] (ARROW-962) [Python] Add schema attribute to FileReader
Wes McKinney created ARROW-962:
--

Summary: [Python] Add schema attribute to FileReader
Key: ARROW-962
URL: https://issues.apache.org/jira/browse/ARROW-962
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Wes McKinney
Fix For: 0.4.0

This will help with API conformity between the Stream and File classes
[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999655#comment-15999655 ]

Wes McKinney commented on ARROW-955:

From above, the command to build and install is:

{code}
python setup.py build_ext --build-type=release --with-parquet --with-jemalloc install
{code}
[jira] [Comment Edited] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999651#comment-15999651 ]

Devang Shah edited comment on ARROW-955 at 5/7/17 2:00 AM:
---

How do I install it now, that it's been built --inplace ? Should I just re-run the command as:

{code}
python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet --with-jemalloc --install
{code}

Or is there a different command, which just does install? Like:

{code}
python setup.py --install
{code}

was (Author: derringdo):
How do I install it now, that it's been built --inplace ? Should I just re-run the command as:

{code}
c build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet --with-jemalloc --install
{code}

Or is there a different command, which just does install? Like:

{code}
python setup.py --install
{code}
[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999651#comment-15999651 ]

Devang Shah commented on ARROW-955:
---

How do I install it now, that it's been built --inplace ? Should I just re-run the command as:

{code}
c build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet --with-jemalloc --install
{code}

Or is there a different command, which just does install? Like:

{code}
python setup.py --install
{code}
[jira] [Created] (ARROW-961) [Python] Rename InMemoryOutputStream to BufferOutputStream
Wes McKinney created ARROW-961:
--

Summary: [Python] Rename InMemoryOutputStream to BufferOutputStream
Key: ARROW-961
URL: https://issues.apache.org/jira/browse/ARROW-961
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Wes McKinney
Fix For: 0.4.0

Having this name difference does not seem especially helpful. We can maintain the existing name as an alias for the duration of 1 release.
[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999646#comment-15999646 ]

Wes McKinney commented on ARROW-955:

Your command is missing {{install}} at the end, which copies the built package into the environment's {{site-packages}} directory.
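Editorial sketch related to the comment above: one way to check whether a package was actually copied into the active environment is to ask the interpreter where it would import the package from, and whether that path sits under the environment's prefix. This assumes Python 3.4 or later (the thread itself uses Python 2.7, where importlib.util.find_spec is not available). The helper names locate_module and is_inside_prefix are hypothetical.

```python
import importlib.util
import os
import sys


def locate_module(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return spec.origin


def is_inside_prefix(path, prefix=sys.prefix):
    """Return True if path lies under prefix (default: the active environment)."""
    path = os.path.realpath(path)
    prefix = os.path.realpath(prefix)
    try:
        return os.path.commonpath([path, prefix]) == prefix
    except ValueError:
        # Raised, for example, when the paths are on different Windows drives.
        return False


# Usage: run from a directory other than the source checkout, so the
# checkout itself is not picked up from the current working directory.
#   origin = locate_module("pyarrow")
#   print(origin, origin is not None and is_inside_prefix(origin))
```

If the reported path points into the source tree rather than the environment's site-packages, the interpreter is importing the checkout and not an installed copy.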
[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999644#comment-15999644 ]

Devang Shah commented on ARROW-955:
---

Here's the output of the successful "python setup.py ..." call for the pyarrow build (which used to fail until your fix):

{code}
(pyarrow-dev) derdo@prompt:~/repos/arrow/python$ python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet --with-jemalloc --inplace
running build_ext
cmake -DPYTHON_EXECUTABLE=/home/derdo/miniconda2/envs/pyarrow-dev/bin/python -DPYARROW_BUILD_PARQUET=on -DPYARROW_BUILD_JEMALLOC=on -DCMAKE_BUILD_TYPE=release /home/derdo/repos/arrow/python
Configured for RELEASE build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...})
-- Build Type: RELEASE
INFOCompiler version: Using built-in specs.
COLLECT_GCC=/usr/bin/c++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.8.4-2ubuntu1~14.04.3' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libmudflap --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)
INFOCompiler id: GNU
Selected compiler gcc 4.8.4
Using static linking for RELEASE builds
collect2 version 4.8.4
/usr/bin/ld --sysroot=/ --build-id --eh-frame-hdr -m elf_x86_64 --hash-style=gnu --as-needed -dynamic-linker /lib64/ld-linux-x86-64.so.2 -z relro /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crt1.o /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/4.8/crtbegin.o -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/4.8/../../.. --version -lstdc++ -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /usr/lib/gcc/x86_64-linux-gnu/4.8/crtend.o /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crtn.o
Using ld linker
-- Build output directory: /home/derdo/repos/arrow/python/build/temp.linux-x86_64-2.7/release/
-- Searching for Python libs in /home/derdo/miniconda2/envs/pyarrow-dev/lib64;/home/derdo/miniconda2/envs/pyarrow-dev/lib;/home/derdo/miniconda2/envs/pyarrow-dev/lib/python2.7/config
-- Looking for python2.7
-- Found Python lib /home/derdo/miniconda2/envs/pyarrow-dev/lib/libpython2.7.so
-- Searching for Python libs in /home/derdo/miniconda2/envs/pyarrow-dev/lib64;/home/derdo/miniconda2/envs/pyarrow-dev/lib;/home/derdo/miniconda2/envs/pyarrow-dev/lib/python2.7/config
-- Looking for python2.7
-- Found Python lib /home/derdo/miniconda2/envs/pyarrow-dev/lib/libpython2.7.so
-- Found the Parquet library: /usr/local/lib/libparquet.so
-- Found the Parquet Arrow library: /usr/local/lib
-- Found the Arrow core library: /home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow.so
-- Found the Arrow Python library: /home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_python.so
-- Found the Arrow jemalloc library: /home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_jemalloc.so
Added shared library dependency arrow: /home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow.so
Added shared library dependency arrow_python: /home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_python.so
Added shared library dependency parquet_arrow: /usr/local/lib/libparquet_arrow.so
Added shared library dependency arrow_jemalloc: /home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_jemalloc.so
-- Configuring done
-- Generating done
-- Build files have been written to: /home/derdo/repos/arrow/python/build/temp.linux-x86_64-2.7
make
Scanning dependencies of target _parquet_pyx
[ 4%] Compiling Cython CXX source for _parquet...
[ 4%] Built target _parquet_pyx
Scanning dependencies of target _parquet
[ 8%] Building CXX object
[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999642#comment-15999642 ]

Wes McKinney commented on ARROW-955:

Can you show the console output of

{code}
python setup.py \
    build_ext --build-type=release --with-parquet --with-jemalloc \
    install
{code}

with that conda environment activated?

For your other questions, you're now in the domain of general Python package management and devops, which is not something we can help too much with. I recommend either building conda packages or binary wheels (e.g. using our manylinux1 toolchain), if using our released binary artifacts doesn't work for your use case.
[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999635#comment-15999635 ]

Devang Shah commented on ARROW-955:
---

Also, once I've experimented with a conda package (in this case, the pyarrow package), within a conda environment; how do I install it for the system (i.e. outside the environment)? So that anyone in the system can then use the package (without having to first activate a specific conda environment).
[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999631#comment-15999631 ]

Devang Shah commented on ARROW-955:
---

I followed the build-from-source instructions at: https://github.com/apache/arrow/blob/master/python/doc/source/development.rst

So, I did a "source activate pyarrow-dev" before cloning the repos and building parquet-cpp, arrow and pyarrow exactly as instructed (with your fix, which was needed to make the pyarrow build / install work). So do these steps not install into the pyarrow-dev environment?

If they do, then maybe when I activate this environment from a different terminal or ssh session, I need to set some env-vars to make this package usable/importable from this environment?
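Editorial sketch for the question above about a second terminal or ssh session: before guessing at missing environment variables, it can help to print which interpreter and environment the new session is actually using. The function name describe_environment is hypothetical. CONDA_DEFAULT_ENV is the variable conda sets on activation, and LD_LIBRARY_PATH is the variable the issue description sets for the shared libraries. Whether any variable needs to be set by hand is not settled in this thread; the sketch only reports the current state.

```python
import os
import sys


def describe_environment(environ=os.environ):
    """Summarize the interpreter and environment of the current session."""
    library_path = environ.get("LD_LIBRARY_PATH", "")
    return {
        # Path of the running Python binary.
        "executable": sys.executable,
        # Root of the active environment; site-packages lives under it.
        "prefix": sys.prefix,
        # Name of the activated conda environment, or None if not activated.
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),
        # Extra directories the dynamic loader searches for shared libraries.
        "library_path": [p for p in library_path.split(os.pathsep) if p],
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(key, "=", value)
```

Comparing this output between the session where the build was done and the new session shows whether the same interpreter and environment are in use.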
[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999624#comment-15999624 ]

Wes McKinney commented on ARROW-955:
------------------------------------

In the {{pyarrow-dev}} case, it looks like the package was not installed in that environment.
[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999620#comment-15999620 ]

Devang Shah commented on ARROW-955:
-----------------------------------

The same thing in the conda env which has the binary download works as expected, and shows that the package is coming from the activated environment:

{code}
derdo@prompt:~$ source activate wfparq
(wfparq) derdo@prompt:~$ python
Python 2.7.13 | packaged by conda-forge | (default, May 2 2017, 12:48:11)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow.parquet
>>> pyarrow.parquet
{code}
[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999618#comment-15999618 ]

Devang Shah commented on ARROW-955:
-----------------------------------

Thanks, but I am running into a problem: when I activate the source-build conda environment from a different terminal, I can't import pyarrow.parquet at all:

{code}
source activate pyarrow-dev
(pyarrow-dev) derdo@prompt:~$ python
Python 2.7.13 | packaged by conda-forge | (default, May 2 2017, 12:48:11)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow.parquet
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named pyarrow.parquet
>>>
{code}
[jira] [Commented] (ARROW-300) [Format] Add buffer compression option to IPC file format
[ https://issues.apache.org/jira/browse/ARROW-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999611#comment-15999611 ]

Wes McKinney commented on ARROW-300:
------------------------------------

I'm sorry for the delay. With the 0.3 Arrow release done, it would be good to make a push on compression and encoding. How about we start a Google Document that supports public comments, and you can give edit access to whomever you like? Once we agree on the design, one of us can make a pull request containing the Flatbuffer metadata for the compression / encoding details. Does that sound good?

> [Format] Add buffer compression option to IPC file format
> ---------------------------------------------------------
>
>                 Key: ARROW-300
>                 URL: https://issues.apache.org/jira/browse/ARROW-300
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Format
>            Reporter: Wes McKinney
>
> It may be useful if data is to be sent over the wire to compress the data
> buffers themselves as they're being written in the file layout.
> I would propose that we keep this extremely simple with a global buffer
> compression setting in the file Footer. Probably only two compressors worth
> supporting out of the box would be zlib (higher compression ratios) and lz4
> (better performance).
> What does everyone think?
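To make the "global buffer compression setting in the file Footer" idea concrete, here is a toy illustration. It is only a sketch under stated assumptions: the dict-based footer and the names compress_buffers / decompress_buffers are hypothetical and are not the Arrow IPC metadata, which had not been designed at this point in the thread. It uses zlib only, since that ships with Python; lz4 would need a third-party package.

```python
import zlib


def compress_buffers(buffers, level=6):
    """Compress each buffer independently; one codec setting covers them all."""
    # Hypothetical footer: a single global codec plus the original lengths,
    # so a reader can validate (or preallocate) on decompression.
    footer = {"codec": "zlib", "uncompressed_lengths": [len(b) for b in buffers]}
    body = [zlib.compress(b, level) for b in buffers]
    return footer, body


def decompress_buffers(footer, body):
    """Reverse compress_buffers, checking lengths against the footer."""
    if footer["codec"] != "zlib":
        raise ValueError("unsupported codec: %r" % footer["codec"])
    buffers = [zlib.decompress(chunk) for chunk in body]
    for buf, expected in zip(buffers, footer["uncompressed_lengths"]):
        if len(buf) != expected:
            raise ValueError("length mismatch after decompression")
    return buffers
```

Compressing each buffer on its own, rather than the whole file, keeps individual buffers independently readable; whether that trade-off is the right one is the kind of question the proposed design document would settle.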
[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999609#comment-15999609 ]

Wes McKinney commented on ARROW-955:
------------------------------------

If you're going to work with both development builds and released binary artifacts, it's good practice to work in conda environments, so you would do development in a different environment from the one where you installed the pyarrow package from conda-forge. You can see what is imported in the Python shell:

{code}
In [1]: import pyarrow
In [2]: pyarrow
Out[2]:
{code}
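The check described above (looking at what the Python shell actually imported) can also be scripted. This is a minimal sketch, assuming it is enough to look at the module's `__file__` and compare it with `sys.prefix`; the helper name module_origin is hypothetical.

```python
import importlib
import os
import sys


def module_origin(name):
    """Import `name` and report where it was loaded from.

    Returns None if the import fails, otherwise a dict with the module file
    and whether that file lives under the active interpreter's prefix.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    path = getattr(module, "__file__", None)
    prefix = os.path.realpath(sys.prefix)
    in_env = path is not None and os.path.realpath(path).startswith(prefix + os.sep)
    return {"file": path, "under_sys_prefix": in_env}


if __name__ == "__main__":
    print(module_origin("pyarrow"))
```

Treat under_sys_prefix as a hint only: a build that is imported straight from a source checkout will report a path outside the environment even when the right environment is active.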
[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999607#comment-15999607 ]

Devang Shah commented on ARROW-955:
-----------------------------------

BTW, on the same machine, in a different conda env, I installed version 0.3.0 through conda-forge (the binary download). So if I am in the conda env where I did the build from source, then invoking python in that environment and importing pyarrow.parquet should give me the pyarrow from the source build, correct? How do I double-check that I am getting the right pyarrow when I import it in python in the new conda env?

Thanks a lot for your prompt help.
[jira] [Commented] (ARROW-957) [Doc] Add HDFS and Windows documents to doxygen output
[ https://issues.apache.org/jira/browse/ARROW-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999605#comment-15999605 ]

Wes McKinney commented on ARROW-957:
------------------------------------

Sounds good to me.

> [Doc] Add HDFS and Windows documents to doxygen output
> ------------------------------------------------------
>
>                 Key: ARROW-957
>                 URL: https://issues.apache.org/jira/browse/ARROW-957
>             Project: Apache Arrow
>          Issue Type: Task
>          Components: C++
>            Reporter: Uwe L. Korn
>            Assignee: Uwe L. Korn
>
> Currently these documents are not rendered on the website. I would move them
> to the {{apidoc/}} folder and link to them in the main doxygen page. Probably
> this is the point where we also would move {{apidoc/}} back to {{docs/}}.
[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999604#comment-15999604 ]

Wes McKinney commented on ARROW-955:
------------------------------------

Those tests are expected failures, so all is good.
[jira] [Comment Edited] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999598#comment-15999598 ]

Devang Shah edited comment on ARROW-955 at 5/6/17 9:59 PM:
-----------------------------------------------------------

Yes! Thanks a million. However, a couple of tests fail:

{code}
-- Found the Parquet library: /usr/local/lib/libparquet.so
-- Found the Parquet Arrow library: /usr/local/lib
-- Found the Arrow core library: /home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow.so
-- Found the Arrow Python library: /home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_python.so
-- Found the Arrow jemalloc library: /home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_jemalloc.so
Added shared library dependency arrow: /home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow.so
Added shared library dependency arrow_python: /home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_python.so
Added shared library dependency parquet_arrow: /usr/local/lib/libparquet_arrow.so
Added shared library dependency arrow_jemalloc: /home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_jemalloc.so
-- Configuring done
-- Generating done
-- Build files have been written to: /home/derdo/repos/arrow/python/build/temp.linux-x86_64-2.7
make
Scanning dependencies of target _parquet_pyx
[ 4%] Compiling Cython CXX source for _parquet...
[ 4%] Built target _parquet_pyx
Scanning dependencies of target _parquet
[ 8%] Building CXX object CMakeFiles/_parquet.dir/_parquet.cxx.o
[ 12%] Linking CXX shared module release/_parquet.so
[ 12%] Built target _parquet
Scanning dependencies of target _error_pyx
[ 16%] Compiling Cython CXX source for _error...
[ 16%] Built target _error_pyx
Scanning dependencies of target _error
[ 20%] Building CXX object CMakeFiles/_error.dir/_error.cxx.o
[ 25%] Linking CXX shared module release/_error.so
[ 25%] Built target _error
Scanning dependencies of target _jemalloc_pyx
[ 29%] Compiling Cython CXX source for _jemalloc...
[ 29%] Built target _jemalloc_pyx
Scanning dependencies of target _jemalloc
[ 33%] Building CXX object CMakeFiles/_jemalloc.dir/_jemalloc.cxx.o
[ 37%] Linking CXX shared module release/_jemalloc.so
[ 37%] Built target _jemalloc
Scanning dependencies of target _table_pyx
[ 41%] Compiling Cython CXX source for _table...
[ 41%] Built target _table_pyx
Scanning dependencies of target _table
[ 45%] Building CXX object CMakeFiles/_table.dir/_table.cxx.o
[ 50%] Linking CXX shared module release/_table.so
[ 50%] Built target _table
Scanning dependencies of target _config_pyx
[ 54%] Compiling Cython CXX source for _config...
[ 54%] Built target _config_pyx
Scanning dependencies of target _config
[ 58%] Building CXX object CMakeFiles/_config.dir/_config.cxx.o
[ 62%] Linking CXX shared module release/_config.so
[ 62%] Built target _config
Scanning dependencies of target _memory_pyx
[ 66%] Compiling Cython CXX source for _memory...
[ 66%] Built target _memory_pyx
Scanning dependencies of target _memory
[ 70%] Building CXX object CMakeFiles/_memory.dir/_memory.cxx.o
[ 75%] Linking CXX shared module release/_memory.so
[ 75%] Built target _memory
Scanning dependencies of target _array_pyx
[ 79%] Compiling Cython CXX source for _array...
[ 79%] Built target _array_pyx
Scanning dependencies of target _array
[ 83%] Building CXX object CMakeFiles/_array.dir/_array.cxx.o
[ 87%] Linking CXX shared module release/_array.so
[ 87%] Built target _array
Scanning dependencies of target _io_pyx
[ 91%] Compiling Cython CXX source for _io...
[ 91%] Built target _io_pyx
Scanning dependencies of target _io
[ 95%] Building CXX object CMakeFiles/_io.dir/_io.cxx.o
[100%] Linking CXX shared module release/_io.so
[100%] Built target _io
('Moving built C-extension', 'release/_array.so', 'to build path', '/home/derdo/repos/arrow/python/pyarrow/_array.so')
('Moving built C-extension', 'release/_config.so', 'to build path', '/home/derdo/repos/arrow/python/pyarrow/_config.so')
('Moving built C-extension', 'release/_error.so', 'to build path', '/home/derdo/repos/arrow/python/pyarrow/_error.so')
('Moving built C-extension', 'release/_io.so', 'to build path', '/home/derdo/repos/arrow/python/pyarrow/_io.so')
('Moving built C-extension', 'release/_jemalloc.so', 'to build path', '/home/derdo/repos/arrow/python/pyarrow/_jemalloc.so')
('Moving built C-extension', 'release/_memory.so', 'to build path', '/home/derdo/repos/arrow/python/pyarrow/_memory.so')
('Moving built C-extension', 'release/_parquet.so', 'to build path', '/home/derdo/repos/arrow/python/pyarrow/_parquet.so')
('Moving built C-extension', 'release/_table.so', 'to build path', '/home/derdo/repos/arrow/python/pyarrow/_table.so')
(pyarrow-dev) derdo@prompt:~/repos/arrow/python$ py.test pyarrow
=== test session starts ===
platform linux2 -- Python 2.7.13, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
rootdir: /home/derdo/repos/arrow/python, inifile:
collected 210 items
{code}
[jira] [Updated] (ARROW-909) libjemalloc.so.2: cannot open shared object file:
[ https://issues.apache.org/jira/browse/ARROW-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-909:
-------------------------------
    Fix Version/s: 0.4.0

> libjemalloc.so.2: cannot open shared object file:
> --------------------------------------------------
>
>                 Key: ARROW-909
>                 URL: https://issues.apache.org/jira/browse/ARROW-909
>             Project: Apache Arrow
>          Issue Type: Bug
>         Environment: linux centos
>            Reporter: Abdul Rahman
>            Assignee: Uwe L. Korn
>              Labels: pyarrow
>             Fix For: 0.4.0
>
> >>> import pyarrow
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/default/src/venv/lib/python2.7/site-packages/pyarrow-0.2.1.dev244+g14bec24-py2.7-linux-x86_64.egg/pyarrow/__init__.py", line 28, in <module>
>     import pyarrow._config
> ImportError: libjemalloc.so.2: cannot open shared object file: No such file or directory
> $LD_LIBRARY_PATH has libarrow_jemalloc.a along with other libraries including
> libarrow.so, libparquet.so, libparquet_arrow.so. Pyarrow was built using
> with-jemalloc and parquet-cpp was cmake-d with
> -DPARQUET_ARROW=ON
> Also, I noticed that the arrow/python documentation has been cleaned up, with the
> installation instructions covering the conda approach only. Is this the only
> supported way going forward?
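The report above notes that $LD_LIBRARY_PATH contains libarrow_jemalloc.a while the import asks for libjemalloc.so.2. A quick way to confirm which directories on a search path actually hold a given file is sketched below. This is a hedged illustration, not Arrow tooling: find_shared_lib is a hypothetical helper, and it only inspects the colon-separated path it is given, whereas the dynamic loader also consults its cache and default directories.

```python
import os


def find_shared_lib(filename, search_path):
    """Return the directories in a colon-separated path that contain `filename`."""
    hits = []
    for directory in search_path.split(os.pathsep):
        # Skip empty entries produced by leading, trailing or doubled separators.
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    print(find_shared_lib("libjemalloc.so.2", path))
```

An empty result for libjemalloc.so.2 alongside a hit for libarrow_jemalloc.a would be consistent with the symptom described in the issue.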
[jira] [Resolved] (ARROW-909) libjemalloc.so.2: cannot open shared object file:
[ https://issues.apache.org/jira/browse/ARROW-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-909.
--------------------------------
    Resolution: Fixed

Issue resolved by pull request 651
[https://github.com/apache/arrow/pull/651]
[jira] [Updated] (ARROW-947) [Python] Improve execution time of manylinux1 build
[ https://issues.apache.org/jira/browse/ARROW-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-947:
-------------------------------
    Fix Version/s: 0.4.0

> [Python] Improve execution time of manylinux1 build
> ---------------------------------------------------
>
>                 Key: ARROW-947
>                 URL: https://issues.apache.org/jira/browse/ARROW-947
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>    Affects Versions: 0.3.0
>            Reporter: Wes McKinney
>            Assignee: Uwe L. Korn
>             Fix For: 0.4.0
>
> Perhaps we could have the same testing benefits by limiting the matrix of
> builds? Pulling the Docker image takes about 90 seconds, but the build itself
> takes 25 minutes or more.
[jira] [Updated] (ARROW-856) CmakeError by Unknown compiler.
[ https://issues.apache.org/jira/browse/ARROW-856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-856: --- Fix Version/s: 0.4.0 > CmakeError by Unknown compiler. > > > Key: ARROW-856 > URL: https://issues.apache.org/jira/browse/ARROW-856 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Reporter: YJ >Assignee: Uwe L. Korn > Fix For: 0.4.0 > > > From :https://github.com/ray-project/ray/issues/468 > [root@SZV1000268092 python]# LANG=C gcc -v > Using built-in specs. > COLLECT_GCC=gcc > COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper > Target: x86_64-redhat-linux > Configured with: ../configure --prefix=/usr --mandir=/usr/share/man > --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla > --enable-bootstrap --enable-shared --enable-threads=posix > --enable-checking=release --with-system-zlib --enable-__cxa_atexit > --disable-libunwind-exceptions --enable-gnu-unique-object > --enable-linker-build-id --with-linker-hash-style=gnu > --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin > --enable-initfini-array --disable-libgcj > --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install > > --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install > --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 > --build=x86_64-redhat-linux > Thread model: posix > gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) > Result: > INFO GNU > CMake Error at cmake_modules/CompilerInfo.cmake:62 (message): > Unknown compiler. Version info is just the above. 
> Error > /usr/bin/c++-g -O3 -march=native -mtune=native -DCXX_SUPPORTS_ALTIVEC > -maltivec -o CMakeFiles/cmTryCompileExec1115247767.dir/src.cxx.o -c > /home/dl/yangjie/ray/src/numbuf/thirdparty/arrow/cpp/build/CMakeFiles/CMakeTmp/src.cxx > c++: error: unrecognized command line option '-maltivec' > make[1]: *** [CMakeFiles/cmTryCompileExec1115247767.dir/src.cxx.o] Error 1 > make[1]: Leaving directory > `/home/dl/yangjie/ray/src/numbuf/thirdparty/arrow/cpp/build/CMakeFiles/CMakeTmp' > make: *** [cmTryCompileExec1115247767/fast] Error 2 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
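The "Unknown compiler" message suggests that the build scripts could not classify the compiler from its version banner. A minimal sketch of that kind of classification, assuming it is done by pattern-matching the text printed by `gcc -v`; the function name `classify_compiler` and the two patterns are illustrative and are not taken from CompilerInfo.cmake:

```python
import re


def classify_compiler(version_output):
    """Guess (family, version) from a compiler's version banner.

    Hypothetical helper: the patterns below are assumptions chosen for
    illustration, not the rules used by Arrow's CMake modules.
    """
    patterns = [
        ("clang", r"clang version (\d+(?:\.\d+)+)"),
        ("gcc", r"gcc version (\d+(?:\.\d+)+)"),
    ]
    for family, pattern in patterns:
        match = re.search(pattern, version_output)
        if match:
            return family, match.group(1)
    # No pattern matched: this is the case a build script would report
    # as an unknown compiler.
    return "unknown", None


# Last line of the banner quoted in the issue.
banner = "gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC)"
print(classify_compiler(banner))
```

A banner that none of the patterns recognise falls through to "unknown", so a matcher like this is only as robust as the exact wording of the compiler output it is fed.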
[jira] [Resolved] (ARROW-899) [Docs] Add CHANGELOG for 0.3.0
[ https://issues.apache.org/jira/browse/ARROW-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-899. Resolution: Fixed Issue resolved by pull request 652 [https://github.com/apache/arrow/pull/652] > [Docs] Add CHANGELOG for 0.3.0 > -- > > Key: ARROW-899 > URL: https://issues.apache.org/jira/browse/ARROW-899 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation >Reporter: Wes McKinney >Assignee: Wes McKinney > Fix For: 0.4.0 > > > See e.g. https://github.com/apache/aurora/blob/master/CHANGELOG -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (ARROW-856) CmakeError by Unknown compiler.
[ https://issues.apache.org/jira/browse/ARROW-856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-856. Resolution: Fixed Issue resolved by pull request 650 [https://github.com/apache/arrow/pull/650] > CmakeError by Unknown compiler. > > > Key: ARROW-856 > URL: https://issues.apache.org/jira/browse/ARROW-856 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Reporter: YJ >Assignee: Uwe L. Korn > > From :https://github.com/ray-project/ray/issues/468 > [root@SZV1000268092 python]# LANG=C gcc -v > Using built-in specs. > COLLECT_GCC=gcc > COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper > Target: x86_64-redhat-linux > Configured with: ../configure --prefix=/usr --mandir=/usr/share/man > --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla > --enable-bootstrap --enable-shared --enable-threads=posix > --enable-checking=release --with-system-zlib --enable-__cxa_atexit > --disable-libunwind-exceptions --enable-gnu-unique-object > --enable-linker-build-id --with-linker-hash-style=gnu > --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin > --enable-initfini-array --disable-libgcj > --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install > > --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install > --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 > --build=x86_64-redhat-linux > Thread model: posix > gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) > Result: > INFO GNU > CMake Error at cmake_modules/CompilerInfo.cmake:62 (message): > Unknown compiler. Version info is just the above. 
> Error > /usr/bin/c++-g -O3 -march=native -mtune=native -DCXX_SUPPORTS_ALTIVEC > -maltivec -o CMakeFiles/cmTryCompileExec1115247767.dir/src.cxx.o -c > /home/dl/yangjie/ray/src/numbuf/thirdparty/arrow/cpp/build/CMakeFiles/CMakeTmp/src.cxx > c++: error: unrecognized command line option '-maltivec' > make[1]: *** [CMakeFiles/cmTryCompileExec1115247767.dir/src.cxx.o] Error 1 > make[1]: Leaving directory > `/home/dl/yangjie/ray/src/numbuf/thirdparty/arrow/cpp/build/CMakeFiles/CMakeTmp' > make: *** [cmTryCompileExec1115247767/fast] Error 2 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999598#comment-15999598 ] Devang Shah commented on ARROW-955: --- Yes! Thanks a million. However, a couple of tests fail: (code) -- Found the Parquet library: /usr/local/lib/libparquet.so -- Found the Parquet Arrow library: /usr/local/lib -- Found the Arrow core library: /home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow.so -- Found the Arrow Python library: /home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_python.so -- Found the Arrow jemalloc library: /home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_jemalloc.so Added shared library dependency arrow: /home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow.so Added shared library dependency arrow_python: /home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_python.so Added shared library dependency parquet_arrow: /usr/local/lib/libparquet_arrow.so Added shared library dependency arrow_jemalloc: /home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_jemalloc.so -- Configuring done -- Generating done -- Build files have been written to: /home/derdo/repos/arrow/python/build/temp.linux-x86_64-2.7 make Scanning dependencies of target _parquet_pyx [ 4%] Compiling Cython CXX source for _parquet... [ 4%] Built target _parquet_pyx Scanning dependencies of target _parquet [ 8%] Building CXX object CMakeFiles/_parquet.dir/_parquet.cxx.o [ 12%] Linking CXX shared module release/_parquet.so [ 12%] Built target _parquet Scanning dependencies of target _error_pyx [ 16%] Compiling Cython CXX source for _error... [ 16%] Built target _error_pyx Scanning dependencies of target _error [ 20%] Building CXX object CMakeFiles/_error.dir/_error.cxx.o [ 25%] Linking CXX shared module release/_error.so [ 25%] Built target _error Scanning dependencies of target _jemalloc_pyx [ 29%] Compiling Cython CXX source for _jemalloc... 
[ 29%] Built target _jemalloc_pyx Scanning dependencies of target _jemalloc [ 33%] Building CXX object CMakeFiles/_jemalloc.dir/_jemalloc.cxx.o [ 37%] Linking CXX shared module release/_jemalloc.so [ 37%] Built target _jemalloc Scanning dependencies of target _table_pyx [ 41%] Compiling Cython CXX source for _table... [ 41%] Built target _table_pyx Scanning dependencies of target _table [ 45%] Building CXX object CMakeFiles/_table.dir/_table.cxx.o [ 50%] Linking CXX shared module release/_table.so [ 50%] Built target _table Scanning dependencies of target _config_pyx [ 54%] Compiling Cython CXX source for _config... [ 54%] Built target _config_pyx Scanning dependencies of target _config [ 58%] Building CXX object CMakeFiles/_config.dir/_config.cxx.o [ 62%] Linking CXX shared module release/_config.so [ 62%] Built target _config Scanning dependencies of target _memory_pyx [ 66%] Compiling Cython CXX source for _memory... [ 66%] Built target _memory_pyx Scanning dependencies of target _memory [ 70%] Building CXX object CMakeFiles/_memory.dir/_memory.cxx.o [ 75%] Linking CXX shared module release/_memory.so [ 75%] Built target _memory Scanning dependencies of target _array_pyx [ 79%] Compiling Cython CXX source for _array... [ 79%] Built target _array_pyx Scanning dependencies of target _array [ 83%] Building CXX object CMakeFiles/_array.dir/_array.cxx.o [ 87%] Linking CXX shared module release/_array.so [ 87%] Built target _array Scanning dependencies of target _io_pyx [ 91%] Compiling Cython CXX source for _io... 
[ 91%] Built target _io_pyx Scanning dependencies of target _io [ 95%] Building CXX object CMakeFiles/_io.dir/_io.cxx.o [100%] Linking CXX shared module release/_io.so [100%] Built target _io ('Moving built C-extension', 'release/_array.so', 'to build path', '/home/derdo/repos/arrow/python/pyarrow/_array.so') ('Moving built C-extension', 'release/_config.so', 'to build path', '/home/derdo/repos/arrow/python/pyarrow/_config.so') ('Moving built C-extension', 'release/_error.so', 'to build path', '/home/derdo/repos/arrow/python/pyarrow/_error.so') ('Moving built C-extension', 'release/_io.so', 'to build path', '/home/derdo/repos/arrow/python/pyarrow/_io.so') ('Moving built C-extension', 'release/_jemalloc.so', 'to build path', '/home/derdo/repos/arrow/python/pyarrow/_jemalloc.so') ('Moving built C-extension', 'release/_memory.so', 'to build path', '/home/derdo/repos/arrow/python/pyarrow/_memory.so') ('Moving built C-extension', 'release/_parquet.so', 'to build path', '/home/derdo/repos/arrow/python/pyarrow/_parquet.so') ('Moving built C-extension', 'release/_table.so', 'to build path', '/home/derdo/repos/arrow/python/pyarrow/_table.so') (pyarrow-dev) derdo@prompt:~/repos/arrow/python$ py.test pyarrow === test session starts === platform linux2 -- Python 2.7.13, pytest-3.0.7, py-1.4.33, pluggy-0.4.0 rootdir: /home/derdo/repos/arrow/python, inifile: collected 210 items pyarrow/tests/test_array.py ...
[jira] [Created] (ARROW-960) [Python] Add source build guide for macOS + Homebrew
Wes McKinney created ARROW-960: -- Summary: [Python] Add source build guide for macOS + Homebrew Key: ARROW-960 URL: https://issues.apache.org/jira/browse/ARROW-960 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Wes McKinney This should cover Homebrew-installed Python and installing pyarrow in a virtualenv, as an alternative to the current conda-based instructions. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (ARROW-959) [Python] Add source build guide for CentOS 6 (with devtoolset) and CentOS 7
Wes McKinney created ARROW-959: -- Summary: [Python] Add source build guide for CentOS 6 (with devtoolset) and CentOS 7 Key: ARROW-959 URL: https://issues.apache.org/jira/browse/ARROW-959 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Wes McKinney -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999595#comment-15999595 ] Wes McKinney commented on ARROW-955: See https://github.com/apache/arrow/pull/653. I will leave this JIRA open to add a build guide for Ubuntu 14.04 and/or 16.04. > [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda > -- > > Key: ARROW-955 > URL: https://issues.apache.org/jira/browse/ARROW-955 > Project: Apache Arrow > Issue Type: Improvement > Components: Python > Environment: Ubuntu - 3.19.0-80-generic #88~14.04.1-Ubuntu > Python 2.7.6 >Reporter: Devang Shah > > I built pyarrow, arrow, and parquet-cpp from source - so that I could use the > new read_row_group() interface and in general, have access to the latest > versions. I ran into many issues during the build but was ultimately > successful (notes below). However, I am not able to import pyarrow.parquet > due to the following issue: > >>> import pyarrow.parquet > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File "pyarrow/__init__.py", line 28, in <module> > import pyarrow._config > ImportError: No module named _config > This is similar to an issue reported in github/conda-forge/pyarrow-feedstock, > where also I posted this...but I think this forum is more direct and > appropriate - so re-posting here. > I used instructions at https://arrow.apache.org/docs/python/install.html to > build arrow/cpp, parquet-cpp, and then pyarrow, with the following deviations > (I view them as possibly bugs in the instructions): > arrow/cpp build: > export ARROW_HOME=$HOME/local > I had to specify -DARROW_PYTHON=on and -DPARQUET_ARROW=ON to the cmake > command (besides the -DCMAKE_INSTALL_PREFIX=$ARROW_HOME) > parquet-cpp build: > export ARROW_HOME=$HOME/local > cmake -DARROW_HOME=$HOME/local -DPARQUET_ARROW_LINKAGE=static > -DPARQUET_ARROW=ON .
> make > sudo make install > this installs parquet libs in the std systems > location (/usr/local/lib) so that the pyarrow build (see below) can find the > parquet libs > pyarrow build: > export ARROW_HOME=$HOME/local (not a deviation; just repeating here) > export LD_LIBRARY_PATH=$HOME/local/lib:$HOME/parquet4/parquet-cpp/build/latest > sudo python setup.py build_ext --with-parquet --with-jemalloc > --build-type=release install > sudo python setup.py install > (sudo is needed to install in /usr/local/lib/python2.7/dist-packages ) > These are the steps and modifications to the instructions needed for me to > build the pyarrow.parquet package. However, when I now try to import the > package I get the error specified above. > Maybe I did something wrong in my steps which I kind of put together by > searching for these issues...but really can't tell what. It took me almost a > whole day to get to the point where I can build pyarrow and parquet, and now > I can't use what I built. > Any comments, help appreciated! Thanks in advance. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999593#comment-15999593 ] Wes McKinney commented on ARROW-955: I see the problem: the TOOLCHAIN variables we added do not handle the library search paths for the Python build. Can you confirm that setting {code} export ARROW_HOME=$CONDA_PREFIX export PARQUET_HOME=$CONDA_PREFIX {code} fixes the problem? I will write a patch to fix the documentation. > [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda > -- > > Key: ARROW-955 > URL: https://issues.apache.org/jira/browse/ARROW-955 > Project: Apache Arrow > Issue Type: Improvement > Components: Python > Environment: Ubuntu - 3.19.0-80-generic #88~14.04.1-Ubuntu > Python 2.7.6 >Reporter: Devang Shah > > I built pyarrow, arrow, and parquet-cpp from source - so that I could use the > new read_row_group() interface and in general, have access to the latest > versions. I ran into many issues during the build but was ultimately > successful (notes below). However, I am not able to import pyarrow.parquet > due to the following issue: > >>> import pyarrow.parquet > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File "pyarrow/__init__.py", line 28, in <module> > import pyarrow._config > ImportError: No module named _config > This is similar to an issue reported in github/conda-forge/pyarrow-feedstock, > where also I posted this...but I think this forum is more direct and > appropriate - so re-posting here.
> I used instructions at https://arrow.apache.org/docs/python/install.html to > build arrow/cpp, parquet-cpp, and then pyarrow, with the following deviations > (I view them as possibly bugs in the instructions): > arrow/cpp build: > export ARROW_HOME=$HOME/local > I had to specify -DARROW_PYTHON=on and -DPARQUET_ARROW=ON to the cmake > command (besides the -DCMAKE_INSTALL_PREFIX=$ARROW_HOME) > parquet-cpp build: > export ARROW_HOME=$HOME/local > cmake -DARROW_HOME=$HOME/local -DPARQUET_ARROW_LINKAGE=static > -DPARQUET_ARROW=ON . > make > sudo make install > this installs parquet libs in the std systems > location (/usr/local/lib) so that the pyarrow build (see below) can find the > parquet libs > pyarrow build: > export ARROW_HOME=$HOME/local (not a deviation; just repeating here) > export LD_LIBRARY_PATH=$HOME/local/lib:$HOME/parquet4/parquet-cpp/build/latest > sudo python setup.py build_ext --with-parquet --with-jemalloc > --build-type=release install > sudo python setup.py install > (sudo is needed to install in /usr/local/lib/python2.7/dist-packages ) > These are the steps and modifications to the instructions needed for me to > build the pyarrow.parquet package. However, when I now try to import the > package I get the error specified above. > Maybe I did something wrong in my steps which I kind of put together by > searching for these issues...but really can't tell what. It took me almost a > whole day to get to the point where I can build pyarrow and parquet, and now > I can't use what I built. > Any comments, help appreciated! Thanks in advance. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
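The suggestion in this comment is to point ARROW_HOME and PARQUET_HOME at the install prefix before building pyarrow. A minimal sketch of a pre-build sanity check for those two variables; the helper name `check_build_env` and the assumption that libraries live in a `lib/` directory under each prefix are illustrative and not part of the Arrow build scripts:

```python
import os

# Variables named in the comment above.
REQUIRED_VARS = ("ARROW_HOME", "PARQUET_HOME")


def check_build_env(env, lib_subdir="lib"):
    """Return a list of human-readable problems with the build environment.

    Hypothetical helper: `lib_subdir` is an assumed layout, adjust it if
    your prefix installs libraries elsewhere (for example lib64).
    """
    problems = []
    for name in REQUIRED_VARS:
        prefix = env.get(name)
        if not prefix:
            problems.append("%s is not set" % name)
        elif not os.path.isdir(os.path.join(prefix, lib_subdir)):
            problems.append(
                "%s=%s has no %s/ directory" % (name, prefix, lib_subdir)
            )
    return problems


# Report anything suspicious in the current shell environment.
for problem in check_build_env(os.environ):
    print(problem)
```

Running this in the shell that will invoke `python setup.py build_ext` prints nothing when both prefixes are set and contain a library directory.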
[jira] [Comment Edited] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999575#comment-15999575 ] Wes McKinney edited comment on ARROW-955 at 5/6/17 9:38 PM: I was able to download the binary package 0.3.0 from conda, and check that read_row_group is available in the ParquetReader class: >>> print dir(pa.ParquetReader) ['__class__', '__delattr__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'column_name_idx', 'metadata', 'num_row_groups', 'open', 'read_all', 'read_column', 'read_row_group', 'set_num_threads'] >>> However, I do need the build from source to work.. So again I tried the instructions from: https://github.com/apache/arrow/blob/master/python/doc/source/development.rst But this failed in the penultimate step: {code} python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet --with-jemalloc --inplace running build_ext creating build creating build/temp.linux-x86_64-2.7 ... ... -- Searching for Python libs in /home/derdo/miniconda2/envs/pyarrow-dev/lib64;/home/derdo/miniconda2/envs/pyarrow-dev/lib;/home/derdo/miniconda2/envs/pyarrow-dev/lib/python2.7/config -- Looking for python2.7 -- Found Python lib /home/derdo/miniconda2/envs/pyarrow-dev/lib/libpython2.7.so -- Found the Parquet library: /usr/local/lib/libparquet.so -- Found the Parquet Arrow library: /usr/local/lib -- Found PkgConfig: /usr/bin/pkg-config (found version "0.26") -- Checking for module 'arrow' -- No package 'arrow' found CMake Error at cmake_modules/FindArrow.cmake:106 (message): Could not find the Arrow library. Looked for headers in , and for libs in Call Stack (most recent call first): CMakeLists.txt:234 (find_package) -- Configuring incomplete, errors occurred! See also "/home/derdo/repos/arrow/python/build/temp.linux-x86_64-2.7/CMakeFiles/CMakeOutput.log". 
See also "/home/derdo/repos/arrow/python/build/temp.linux-x86_64-2.7/CMakeFiles/CMakeError.log". error: command 'cmake' failed with exit status 1 -- The full output of the entire sequence of steps is below - (pyarrow-dev) [prompt]:~$ mkdir repos (pyarrow-dev) [prompt]:~$ date Sat May 6 12:56:26 PDT 2017 (pyarrow-dev) [prompt]:~$ cd repos (pyarrow-dev) [prompt]:~/repos$ git clone https://github.com/apache/arrow.git Cloning into 'arrow'... remote: Counting objects: 10468, done. remote: Compressing objects: 100% (21/21), done. remote: Total 10468 (delta 5), reused 1 (delta 1), pack-reused 10446 Receiving objects: 100% (10468/10468), 4.52 MiB | 0 bytes/s, done. Resolving deltas: 100% (6827/6827), done. Checking connectivity... done. (pyarrow-dev) [prompt]:~/repos$ git clone https://github.com/apache/parquet-cpp.git Cloning into 'parquet-cpp'... remote: Counting objects: 4022, done. remote: Compressing objects: 100% (11/11), done. remote: Total 4022 (delta 3), reused 0 (delta 0), pack-reused 4010 Receiving objects: 100% (4022/4022), 1.70 MiB | 0 bytes/s, done. Resolving deltas: 100% (2916/2916), done. Checking connectivity... done. 
(pyarrow-dev) [prompt]:~/repos$ ls -l total 8 drwxrwxr-x 13 derdo derdo 4096 May 6 12:56 arrow drwxrwxr-x 13 derdo derdo 4096 May 6 12:57 parquet-cpp (pyarrow-dev) [prompt]:~/repos$ export ARROW_BUILD_TYPE=release (pyarrow-dev) [prompt]:~/repos$ export ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX (pyarrow-dev) [prompt]:~/repos$ echo $CONDA_PREFIX /home/derdo/miniconda2/envs/pyarrow-dev (pyarrow-dev) [prompt]:~/repos$ export PARQUET_BUILD_TOOLCHAIN=$CONDA_PREFIX (pyarrow-dev) [prompt]:~/repos$ pwd /home/derdo/repos (pyarrow-dev) [prompt]:~/repos$ ls arrow parquet-cpp (pyarrow-dev) [prompt]:~/repos$ ls arrow appveyor.yml ci format java NOTICE.txt site c_glibcpp header js python CHANGELOG.md dev integration LICENSE.txt README.md (pyarrow-dev) [prompt]:~/repos$ ls arrow/cpp apidoc CMakeLists.txt docsrc build-support cmake_modules README.md thirdparty (pyarrow-dev) [prompt]:~/repos$ mkdir !$/build mkdir arrow/cpp/build (pyarrow-dev) [prompt]:~/repos$ pushd !$ pushd arrow/cpp/build ~/repos/arrow/cpp/build ~/repos (pyarrow-dev) [prompt]:~/repos/arrow/cpp/build$ cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX -DARROW_PYTHON=on -DARROW_BUILD_TESTS=OFF .. -- The C compiler identification is GNU 4.8.4 -- The CXX compiler identification is GNU 4.8.4 -- Check for working C compiler: /usr/bin/cc -- Check for working C compiler: /usr/bin/cc -- works -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Detecting C compile features -- Detecting C compile features - done -- Check for working CXX compiler: /usr/bin/c++ -- Check for working CXX compiler: /usr/bin/c++ -- works -- Detecting CXX compiler ABI
[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999575#comment-15999575 ] Devang Shah commented on ARROW-955: --- I was able to download the binary package 0.3.0 from conda, and check that read_row_group is available in the ParquetReader class: >>> print dir(pa.ParquetReader) ['__class__', '__delattr__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'column_name_idx', 'metadata', 'num_row_groups', 'open', 'read_all', 'read_column', 'read_row_group', 'set_num_threads'] >>> However, I do need the build from source to work.. So again I tried the instructions from: https://github.com/apache/arrow/blob/master/python/doc/source/development.rst But this failed in the penultimate step: python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet --with-jemalloc --inplace running build_ext creating build creating build/temp.linux-x86_64-2.7 ... ... -- Searching for Python libs in /home/derdo/miniconda2/envs/pyarrow-dev/lib64;/home/derdo/miniconda2/envs/pyarrow-dev/lib;/home/derdo/miniconda2/envs/pyarrow-dev/lib/python2.7/config -- Looking for python2.7 -- Found Python lib /home/derdo/miniconda2/envs/pyarrow-dev/lib/libpython2.7.so -- Found the Parquet library: /usr/local/lib/libparquet.so -- Found the Parquet Arrow library: /usr/local/lib -- Found PkgConfig: /usr/bin/pkg-config (found version "0.26") -- Checking for module 'arrow' -- No package 'arrow' found CMake Error at cmake_modules/FindArrow.cmake:106 (message): Could not find the Arrow library. Looked for headers in , and for libs in Call Stack (most recent call first): CMakeLists.txt:234 (find_package) -- Configuring incomplete, errors occurred! See also "/home/derdo/repos/arrow/python/build/temp.linux-x86_64-2.7/CMakeFiles/CMakeOutput.log". 
See also "/home/derdo/repos/arrow/python/build/temp.linux-x86_64-2.7/CMakeFiles/CMakeError.log". error: command 'cmake' failed with exit status 1 -- The full output of the entire sequence of steps is below - (pyarrow-dev) [prompt]:~$ mkdir repos (pyarrow-dev) [prompt]:~$ date Sat May 6 12:56:26 PDT 2017 (pyarrow-dev) [prompt]:~$ cd repos (pyarrow-dev) [prompt]:~/repos$ git clone https://github.com/apache/arrow.git Cloning into 'arrow'... remote: Counting objects: 10468, done. remote: Compressing objects: 100% (21/21), done. remote: Total 10468 (delta 5), reused 1 (delta 1), pack-reused 10446 Receiving objects: 100% (10468/10468), 4.52 MiB | 0 bytes/s, done. Resolving deltas: 100% (6827/6827), done. Checking connectivity... done. (pyarrow-dev) [prompt]:~/repos$ git clone https://github.com/apache/parquet-cpp.git Cloning into 'parquet-cpp'... remote: Counting objects: 4022, done. remote: Compressing objects: 100% (11/11), done. remote: Total 4022 (delta 3), reused 0 (delta 0), pack-reused 4010 Receiving objects: 100% (4022/4022), 1.70 MiB | 0 bytes/s, done. Resolving deltas: 100% (2916/2916), done. Checking connectivity... done. 
(pyarrow-dev) [prompt]:~/repos$ ls -l total 8 drwxrwxr-x 13 derdo derdo 4096 May 6 12:56 arrow drwxrwxr-x 13 derdo derdo 4096 May 6 12:57 parquet-cpp (pyarrow-dev) [prompt]:~/repos$ export ARROW_BUILD_TYPE=release (pyarrow-dev) [prompt]:~/repos$ export ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX (pyarrow-dev) [prompt]:~/repos$ echo $CONDA_PREFIX /home/derdo/miniconda2/envs/pyarrow-dev (pyarrow-dev) [prompt]:~/repos$ export PARQUET_BUILD_TOOLCHAIN=$CONDA_PREFIX (pyarrow-dev) [prompt]:~/repos$ pwd /home/derdo/repos (pyarrow-dev) [prompt]:~/repos$ ls arrow parquet-cpp (pyarrow-dev) [prompt]:~/repos$ ls arrow appveyor.yml ci format java NOTICE.txt site c_glibcpp header js python CHANGELOG.md dev integration LICENSE.txt README.md (pyarrow-dev) [prompt]:~/repos$ ls arrow/cpp apidoc CMakeLists.txt docsrc build-support cmake_modules README.md thirdparty (pyarrow-dev) [prompt]:~/repos$ mkdir !$/build mkdir arrow/cpp/build (pyarrow-dev) [prompt]:~/repos$ pushd !$ pushd arrow/cpp/build ~/repos/arrow/cpp/build ~/repos (pyarrow-dev) [prompt]:~/repos/arrow/cpp/build$ cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX -DARROW_PYTHON=on -DARROW_BUILD_TESTS=OFF .. -- The C compiler identification is GNU 4.8.4 -- The CXX compiler identification is GNU 4.8.4 -- Check for working C compiler: /usr/bin/cc -- Check for working C compiler: /usr/bin/cc -- works -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Detecting C compile features -- Detecting C compile features - done -- Check for working CXX compiler: /usr/bin/c++ -- Check for working CXX compiler: /usr/bin/c++ -- works -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done --
[jira] [Commented] (ARROW-813) [Python] setup.py sdist must also bundle dependent cmake modules
[ https://issues.apache.org/jira/browse/ARROW-813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999568#comment-15999568 ] Uwe L. Korn commented on ARROW-813: --- The simplest fix here could be to symlink the necessary modules into {{python/cmake_module}}. The {{sdist}} command should then include a copy of the modules (and not a symlink to nowhere). > [Python] setup.py sdist must also bundle dependent cmake modules > > > Key: ARROW-813 > URL: https://issues.apache.org/jira/browse/ARROW-813 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.2.0 >Reporter: Wes McKinney > > The pyarrow tarball from sdist cannot be built currently because it depends > on files from the C++ directory -- This message was sent by Atlassian JIRA (v6.3.15#6346)
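The distinction in this comment, between bundling a copy of a module and bundling a symlink that dangles once the tarball is unpacked elsewhere, can be illustrated with the standard library alone. This is a sketch on a POSIX system with made-up directory and file names; it does not show what `sdist` itself does with symlinks:

```python
import os
import shutil
import tempfile


def materialize_symlinks(src_dir, dst_dir):
    """Copy src_dir to dst_dir, replacing symlinks with copies of their targets."""
    # symlinks=False (the default) makes copytree follow links and copy
    # the file contents instead of recreating the links.
    shutil.copytree(src_dir, dst_dir, symlinks=False)


# Hypothetical layout: a shared module directory and a Python source tree
# that only holds a symlink to the shared module.
workdir = tempfile.mkdtemp()
cpp_modules = os.path.join(workdir, "cpp_cmake_modules")
py_src = os.path.join(workdir, "python_src")
os.makedirs(cpp_modules)
os.makedirs(py_src)

with open(os.path.join(cpp_modules, "FindArrow.cmake"), "w") as f:
    f.write("# placeholder module\n")
os.symlink(
    os.path.join(cpp_modules, "FindArrow.cmake"),
    os.path.join(py_src, "FindArrow.cmake"),
)

bundle = os.path.join(workdir, "bundle")
materialize_symlinks(py_src, bundle)
copied = os.path.join(bundle, "FindArrow.cmake")
print(os.path.islink(copied), os.path.isfile(copied))
```

The bundled file is a regular file with the module's contents, so it stays usable even if the original C++ directory is absent.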
[jira] [Commented] (ARROW-909) libjemalloc.so.2: cannot open shared object file:
[ https://issues.apache.org/jira/browse/ARROW-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999551#comment-15999551 ] Uwe L. Korn commented on ARROW-909: --- [~abdulrahman004] Did you have any special compilation/linking options set? Normally {{pyarrow/_config.so}} should not link to {{libjemalloc.so.2}}. If you are able to run {{lddtree pyarrow/_config.so}}, it would really help me to understand where the linkage is coming from. I made PR https://github.com/apache/arrow/pull/651 to cover for the initial problem that when building jemalloc as an external project it should be statically linked as the shared library is not installed on {{make install}}. > libjemalloc.so.2: cannot open shared object file: > -- > > Key: ARROW-909 > URL: https://issues.apache.org/jira/browse/ARROW-909 > Project: Apache Arrow > Issue Type: Bug > Environment: linux centos >Reporter: Abdul Rahman > Labels: pyarrow > > >>> import pyarrow > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File > "/home/default/src/venv/lib/python2.7/site-packages/pyarrow-0.2.1.dev244+g14bec24-py2.7-linux-x86_64.egg/pyarrow/__init__.py", > line 28, in <module> > import pyarrow._config > ImportError: libjemalloc.so.2: cannot open shared object file: No such file > or directory > $LD_LIBRARY_PATH has libarrow_jemalloc.a along with other libraries including > libarrow.so, libparquet.so, libparquet_arrow.so. Pyarrow was built using > --with-jemalloc and parquet-cpp was cmake-d with > -DPARQUET_ARROW=ON > Also, noticed that arrow/python documentation has been cleaned up with the > installation instructions having the conda approach only. Is this the only > supported way going forward? -- This message was sent by Atlassian JIRA (v6.3.15#6346)
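The request here is to inspect which shared libraries the extension module links against. A minimal sketch that scans `ldd`-style text for unresolved libraries; it assumes the plain `name => not found` line format printed by `ldd` (not the tree output of `lddtree`), and the sample text below is made up for illustration rather than taken from the reporter's machine:

```python
def missing_libraries(ldd_output):
    """Return names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        parts = line.split("=>")
        if len(parts) == 2 and "not found" in parts[1]:
            missing.append(parts[0].strip())
    return missing


# Hypothetical ldd output; on a real system it could be captured with
# subprocess.check_output(["ldd", "pyarrow/_config.so"]).
sample = """\
    linux-vdso.so.1 (0x00007ffd0000)
    libarrow.so.0 => /home/user/local/lib/libarrow.so.0 (0x00007f000000)
    libjemalloc.so.2 => not found
"""

print(missing_libraries(sample))
```

Any name reported this way is a library the dynamic loader cannot locate through its default paths or LD_LIBRARY_PATH, which matches the shape of the ImportError quoted in the issue.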
[jira] [Assigned] (ARROW-899) [Docs] Add CHANGELOG for 0.3.0
[ https://issues.apache.org/jira/browse/ARROW-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-899: -- Assignee: Wes McKinney > [Docs] Add CHANGELOG for 0.3.0 > -- > > Key: ARROW-899 > URL: https://issues.apache.org/jira/browse/ARROW-899 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation >Reporter: Wes McKinney >Assignee: Wes McKinney > Fix For: 0.4.0 > > > See e.g. https://github.com/apache/aurora/blob/master/CHANGELOG -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (ARROW-899) [Docs] Add CHANGELOG for 0.3.0
[ https://issues.apache.org/jira/browse/ARROW-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-899: --- Fix Version/s: (was: 0.3.0) 0.4.0 > [Docs] Add CHANGELOG for 0.3.0 > -- > > Key: ARROW-899 > URL: https://issues.apache.org/jira/browse/ARROW-899 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation >Reporter: Wes McKinney > Fix For: 0.4.0 > > > See e.g. https://github.com/apache/aurora/blob/master/CHANGELOG -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (ARROW-899) [Docs] Add CHANGELOG for 0.3.0
[ https://issues.apache.org/jira/browse/ARROW-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-899: --- Summary: [Docs] Add CHANGELOG for 0.3.0 (was: [Docs] Add CHANGELOG) > [Docs] Add CHANGELOG for 0.3.0 > -- > > Key: ARROW-899 > URL: https://issues.apache.org/jira/browse/ARROW-899 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation >Reporter: Wes McKinney > Fix For: 0.4.0 > > > See e.g. https://github.com/apache/aurora/blob/master/CHANGELOG -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (ARROW-670) Arrow 0.3 release
[ https://issues.apache.org/jira/browse/ARROW-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-670. Resolution: Fixed > Arrow 0.3 release > - > > Key: ARROW-670 > URL: https://issues.apache.org/jira/browse/ARROW-670 > Project: Apache Arrow > Issue Type: Task >Reporter: Wes McKinney >Assignee: Wes McKinney > Fix For: 0.3.0 > > > As we near the next development milestone, please link issues that block the > release so we can keep track of what needs to be done -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (ARROW-532) [Python] Expand pyarrow.parquet documentation for 0.3 release
[ https://issues.apache.org/jira/browse/ARROW-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-532: --- Fix Version/s: (was: 0.3.0) 0.4.0 > [Python] Expand pyarrow.parquet documentation for 0.3 release > - > > Key: ARROW-532 > URL: https://issues.apache.org/jira/browse/ARROW-532 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney > Fix For: 0.4.0 > > > Follow up to ARROW-531 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (ARROW-446) [Python] Document NativeFile interfaces, HDFS client in Sphinx
[ https://issues.apache.org/jira/browse/ARROW-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-446: --- Fix Version/s: (was: 0.3.0) 0.4.0 > [Python] Document NativeFile interfaces, HDFS client in Sphinx > -- > > Key: ARROW-446 > URL: https://issues.apache.org/jira/browse/ARROW-446 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Wes McKinney >Assignee: Wes McKinney > Fix For: 0.4.0 > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (ARROW-944) Python: Compat broken for pandas==0.18.1
[ https://issues.apache.org/jira/browse/ARROW-944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-944:
-------------------------------
    Fix Version/s: 0.4.0

> Python: Compat broken for pandas==0.18.1
> ----------------------------------------
>
> Key: ARROW-944
> URL: https://issues.apache.org/jira/browse/ARROW-944
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Uwe L. Korn
> Assignee: Jeff Reback
> Fix For: 0.4.0
>
> The following failed for me with {{pandas==0.18.1}}:
> {code}
> In [1]: from pandas.core.dtypes import DatetimeTZDtype
> ---------------------------------------------------------------------------
> ImportError                               Traceback (most recent call last)
>  in ()
> ----> 1 from pandas.core.dtypes import DatetimeTZDtype
> ImportError: No module named dtypes
> {code}
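For readers who hit the same ImportError: one common way to tolerate a symbol that moved between releases is to try several import locations in turn. The sketch below is not the fix that went into Arrow; `import_first`, `get_datetimetz_dtype` and the list of candidate module paths are hypothetical names and assumptions made for illustration, and the paths should be checked against the pandas versions actually in use.

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first module in `module_paths` that has it."""
    errors = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
            return getattr(module, name)
        except (ImportError, AttributeError) as exc:
            # Remember why this location failed, then try the next one.
            errors.append("{}: {}".format(path, exc))
    raise ImportError(
        "could not import {} from any of: {}".format(name, "; ".join(errors))
    )


# Assumed candidate locations, newest first. These are NOT taken from the
# issue; verify them against the pandas releases you need to support.
CANDIDATE_MODULES = [
    "pandas.api.types",
    "pandas.core.dtypes.dtypes",
    "pandas.types.dtypes",
]


def get_datetimetz_dtype():
    """Look up DatetimeTZDtype lazily so importing this file never needs pandas."""
    return import_first("DatetimeTZDtype", CANDIDATE_MODULES)
```

Calling `get_datetimetz_dtype()` only at the point of use keeps the failure, if any, close to the code that needs the type, and the collected error messages show every location that was tried.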
[jira] [Resolved] (ARROW-956) remove pandas pre-0.20.0 compat
[ https://issues.apache.org/jira/browse/ARROW-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-956.
    Resolution: Fixed

Issue resolved by pull request 649
[https://github.com/apache/arrow/pull/649]

> remove pandas pre-0.20.0 compat
> -------------------------------
>
> Key: ARROW-956
> URL: https://issues.apache.org/jira/browse/ARROW-956
> Project: Apache Arrow
> Issue Type: Task
> Components: Python
> Reporter: Jeff Reback
> Assignee: Jeff Reback
> Priority: Trivial
>
> xref to ARROW-879
[jira] [Commented] (ARROW-856) CmakeError by Unknown compiler.
[ https://issues.apache.org/jira/browse/ARROW-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999535#comment-15999535 ]

Uwe L. Korn commented on ARROW-856:
-----------------------------------

PR: https://github.com/apache/arrow/pull/650

> CmakeError by Unknown compiler.
> -------------------------------
>
> Key: ARROW-856
> URL: https://issues.apache.org/jira/browse/ARROW-856
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Reporter: YJ
> Assignee: Uwe L. Korn
>
> From :https://github.com/ray-project/ray/issues/468
> [root@SZV1000268092 python]# LANG=C gcc -v
> Using built-in specs.
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
> Target: x86_64-redhat-linux
> Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
> --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla
> --enable-bootstrap --enable-shared --enable-threads=posix
> --enable-checking=release --with-system-zlib --enable-__cxa_atexit
> --disable-libunwind-exceptions --enable-gnu-unique-object
> --enable-linker-build-id --with-linker-hash-style=gnu
> --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin
> --enable-initfini-array --disable-libgcj
> --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install
> --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install
> --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64
> --build=x86_64-redhat-linux
> Thread model: posix
> gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC)
> Result:
> INFO GNU
> CMake Error at cmake_modules/CompilerInfo.cmake:62 (message):
> Unknown compiler. Version info is just the above.
> Error
> /usr/bin/c++-g -O3 -march=native -mtune=native -DCXX_SUPPORTS_ALTIVEC
> -maltivec -o CMakeFiles/cmTryCompileExec1115247767.dir/src.cxx.o -c
> /home/dl/yangjie/ray/src/numbuf/thirdparty/arrow/cpp/build/CMakeFiles/CMakeTmp/src.cxx
> c++: error: unrecognized command line option '-maltivec'
> make[1]: *** [CMakeFiles/cmTryCompileExec1115247767.dir/src.cxx.o] Error 1
> make[1]: Leaving directory
> `/home/dl/yangjie/ray/src/numbuf/thirdparty/arrow/cpp/build/CMakeFiles/CMakeTmp'
> make: *** [cmTryCompileExec1115247767/fast] Error 2
[jira] [Commented] (ARROW-947) [Python] Improve execution time of manylinux1 build
[ https://issues.apache.org/jira/browse/ARROW-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999474#comment-15999474 ]

Uwe L. Korn commented on ARROW-947:
-----------------------------------

PR: https://github.com/apache/arrow/pull/648 (down to 14min)

> [Python] Improve execution time of manylinux1 build
> ---------------------------------------------------
>
> Key: ARROW-947
> URL: https://issues.apache.org/jira/browse/ARROW-947
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Affects Versions: 0.3.0
> Reporter: Wes McKinney
> Assignee: Uwe L. Korn
>
> Perhaps we could have the same testing benefits by limiting the matrix of
> builds? Pulling the Docker image takes about 90 seconds, but the build itself
> takes 25 minutes or more.
[jira] [Comment Edited] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999469#comment-15999469 ]

Wes McKinney edited comment on ARROW-955 at 5/6/17 3:51 PM:

Yes, 0.3.0 includes that function. Since I don't have access to your
environment, and you aren't pasting a reproducible set of steps or console
output, it's very difficult for me to debug. I strongly recommend starting
from a basic Miniconda installation (see https://conda.io/miniconda.html, do
not use Ubuntu's system Python) and installing from conda-forge. If you can't
get that working please report back and provide complete details (console
output).

was (Author: wesmckinn):
Yes, 0.3.0 includes that function. Since I don't have access to your
environment, and you aren't pasting a reproducible set of steps or console
output, it's very difficult for me to debug. I strongly recommend starting
from a basic Miniconda installation (not Ubuntu's system Python) and
installing from conda-forge. If you can't get that working please report back
and provide complete details (console output).

> [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
> ------------------------------------------------------------------------------
>
> Key: ARROW-955
> URL: https://issues.apache.org/jira/browse/ARROW-955
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Environment: Ubuntu - 3.19.0-80-generic #88~14.04.1-Ubuntu
> Python 2.7.6
> Reporter: Devang Shah
>
> I built pyarrow, arrow, and parquet-cpp from source - so that I could use the
> new read_row_group() interface and in general, have access to the latest
> versions. I ran into many issues during the build but was ultimately
> successful (notes below). However, I am not able to import pyarrow.parquet
> due to the following issue:
> >>import pyarrow.parquet
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow/__init__.py", line 28, in <module>
>     import pyarrow._config
> ImportError: No module named _config
> This is similar to an issue reported in github/conda-forge/pyarrow-feedstock,
> where also I posted this...but I think this forum is more direct and
> appropriate - so re-posting here.
> I used instructions at https://arrow.apache.org/docs/python/install.html to
> build arrow/cpp, parquet-cpp, and then pyarrow, with the following deviations
> (I view them as possibly bugs in the instructions):
> arrow/cpp build:
> export ARROW_HOME=$HOME/local
> I had to specify -DARROW_PYTHON=on and -DPARQUET_ARROW=ON to the cmake
> command (besides the -DCMAKE_INSTALL_PREFIX=$ARROW_HOME)
> parquet-cpp build:
> export ARROW_HOME=$HOME/local
> cmake -DARROW_HOME=$HOME/local -DPARQUET_ARROW_LINKAGE=static
> -DPARQUET_ARROW=ON .
> make
> sudo make install > this installs parquet libs in the std systems
> location (/usr/local/lib) so that the pyarrow build (see below) can find the
> parquet libs
> pyarrow build:
> export ARROW_HOME=$HOME/local (not a deviation; just repeating here)
> export LD_LIBRARY_PATH=$HOME/local/lib:$HOME/parquet4/parquet-cpp/build/latest
> sudo python setup.py build_ext --with-parquet --with-jemalloc
> --build-type=release install
> sudo python setup.py install
> (sudo is needed to install in /usr/local/lib/python2.7/dist-packages )
> These are the steps and modifications to the instructions needed for me to
> build the pyarrow.parquet package. However, when I now try to import the
> package I get the error specified above.
> Maybe I did something wrong in my steps which I kind of put together by
> searching for these issues...but really can't tell what. It took me almost a
> whole day to get to the point where I can build pyarrow and parquet, and now
> I can't use what I built.
> Any comments, help appreciated! Thanks in advance.
[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999469#comment-15999469 ]

Wes McKinney commented on ARROW-955:

Yes, 0.3.0 includes that function. Since I don't have access to your
environment, and you aren't pasting a reproducible set of steps or console
output, it's very difficult for me to debug. I strongly recommend starting
from a basic Miniconda installation (not Ubuntu's system Python) and
installing from conda-forge. If you can't get that working please report back
and provide complete details (console output).

> [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
> ------------------------------------------------------------------------------
>
> Key: ARROW-955
> URL: https://issues.apache.org/jira/browse/ARROW-955
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Environment: Ubuntu - 3.19.0-80-generic #88~14.04.1-Ubuntu
> Python 2.7.6
> Reporter: Devang Shah
>
> I built pyarrow, arrow, and parquet-cpp from source - so that I could use the
> new read_row_group() interface and in general, have access to the latest
> versions. I ran into many issues during the build but was ultimately
> successful (notes below). However, I am not able to import pyarrow.parquet
> due to the following issue:
> >>import pyarrow.parquet
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow/__init__.py", line 28, in <module>
>     import pyarrow._config
> ImportError: No module named _config
> This is similar to an issue reported in github/conda-forge/pyarrow-feedstock,
> where also I posted this...but I think this forum is more direct and
> appropriate - so re-posting here.
> I used instructions at https://arrow.apache.org/docs/python/install.html to
> build arrow/cpp, parquet-cpp, and then pyarrow, with the following deviations
> (I view them as possibly bugs in the instructions):
> arrow/cpp build:
> export ARROW_HOME=$HOME/local
> I had to specify -DARROW_PYTHON=on and -DPARQUET_ARROW=ON to the cmake
> command (besides the -DCMAKE_INSTALL_PREFIX=$ARROW_HOME)
> parquet-cpp build:
> export ARROW_HOME=$HOME/local
> cmake -DARROW_HOME=$HOME/local -DPARQUET_ARROW_LINKAGE=static
> -DPARQUET_ARROW=ON .
> make
> sudo make install > this installs parquet libs in the std systems
> location (/usr/local/lib) so that the pyarrow build (see below) can find the
> parquet libs
> pyarrow build:
> export ARROW_HOME=$HOME/local (not a deviation; just repeating here)
> export LD_LIBRARY_PATH=$HOME/local/lib:$HOME/parquet4/parquet-cpp/build/latest
> sudo python setup.py build_ext --with-parquet --with-jemalloc
> --build-type=release install
> sudo python setup.py install
> (sudo is needed to install in /usr/local/lib/python2.7/dist-packages )
> These are the steps and modifications to the instructions needed for me to
> build the pyarrow.parquet package. However, when I now try to import the
> package I get the error specified above.
> Maybe I did something wrong in my steps which I kind of put together by
> searching for these issues...but really can't tell what. It took me almost a
> whole day to get to the point where I can build pyarrow and parquet, and now
> I can't use what I built.
> Any comments, help appreciated! Thanks in advance.
[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999466#comment-15999466 ]

Devang Shah commented on ARROW-955:
-----------------------------------

Does the 0.3.0 release export "read_row_group()" ? That's what I am
interested in.

Also, I tried conda instructions from:
https://github.com/apache/arrow/blob/master/python/doc/source/development.rst
but I couldn't get this to even build (the setup.py last step wasn't working)
- so I switched to what seemed to be simpler instructions at:
https://arrow.apache.org/docs/python/install.html
which I was able to build as stated above, but then failed to run.

This is the first time I am using conda, and so very unfamiliar with it. I'd
really appreciate any help to get me to the next step in running
pyarrow.parquet on my current non-conda build from source. It may be a simple
matter of configuration which is eluding me...

Thanks a lot for your response!

> [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
> ------------------------------------------------------------------------------
>
> Key: ARROW-955
> URL: https://issues.apache.org/jira/browse/ARROW-955
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Environment: Ubuntu - 3.19.0-80-generic #88~14.04.1-Ubuntu
> Python 2.7.6
> Reporter: Devang Shah
>
> I built pyarrow, arrow, and parquet-cpp from source - so that I could use the
> new read_row_group() interface and in general, have access to the latest
> versions. I ran into many issues during the build but was ultimately
> successful (notes below). However, I am not able to import pyarrow.parquet
> due to the following issue:
> >>import pyarrow.parquet
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow/__init__.py", line 28, in <module>
>     import pyarrow._config
> ImportError: No module named _config
> This is similar to an issue reported in github/conda-forge/pyarrow-feedstock,
> where also I posted this...but I think this forum is more direct and
> appropriate - so re-posting here.
> I used instructions at https://arrow.apache.org/docs/python/install.html to
> build arrow/cpp, parquet-cpp, and then pyarrow, with the following deviations
> (I view them as possibly bugs in the instructions):
> arrow/cpp build:
> export ARROW_HOME=$HOME/local
> I had to specify -DARROW_PYTHON=on and -DPARQUET_ARROW=ON to the cmake
> command (besides the -DCMAKE_INSTALL_PREFIX=$ARROW_HOME)
> parquet-cpp build:
> export ARROW_HOME=$HOME/local
> cmake -DARROW_HOME=$HOME/local -DPARQUET_ARROW_LINKAGE=static
> -DPARQUET_ARROW=ON .
> make
> sudo make install > this installs parquet libs in the std systems
> location (/usr/local/lib) so that the pyarrow build (see below) can find the
> parquet libs
> pyarrow build:
> export ARROW_HOME=$HOME/local (not a deviation; just repeating here)
> export LD_LIBRARY_PATH=$HOME/local/lib:$HOME/parquet4/parquet-cpp/build/latest
> sudo python setup.py build_ext --with-parquet --with-jemalloc
> --build-type=release install
> sudo python setup.py install
> (sudo is needed to install in /usr/local/lib/python2.7/dist-packages )
> These are the steps and modifications to the instructions needed for me to
> build the pyarrow.parquet package. However, when I now try to import the
> package I get the error specified above.
> Maybe I did something wrong in my steps which I kind of put together by
> searching for these issues...but really can't tell what. It took me almost a
> whole day to get to the point where I can build pyarrow and parquet, and now
> I can't use what I built.
> Any comments, help appreciated! Thanks in advance.
[jira] [Created] (ARROW-957) [Doc] Add HDFS and Windows documents to doxygen output
Uwe L. Korn created ARROW-957:
---------------------------------

    Summary: [Doc] Add HDFS and Windows documents to doxygen output
    Key: ARROW-957
    URL: https://issues.apache.org/jira/browse/ARROW-957
    Project: Apache Arrow
    Issue Type: Task
    Components: C++
    Reporter: Uwe L. Korn
    Assignee: Uwe L. Korn

Currently these documents are not rendered on the website. I would move them
to the {{apidoc/}} folder and link to them in the main doxygen page. Probably
this is the point where we also would move {{apidoc/}} back to {{docs/}}.
[jira] [Resolved] (ARROW-939) Fix division by zero for zero-dimensional Tensors
[ https://issues.apache.org/jira/browse/ARROW-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-939.
    Resolution: Fixed

Issue resolved by pull request 634
[https://github.com/apache/arrow/pull/634]

> Fix division by zero for zero-dimensional Tensors
> -------------------------------------------------
>
> Key: ARROW-939
> URL: https://issues.apache.org/jira/browse/ARROW-939
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 0.3.0
> Reporter: Philipp Moritz
> Priority: Minor
>
> see https://github.com/ray-project/ray/issues/500
> The division "remaining /= dimsize" in cpp/src/arrow/tensor.cc:45 raises a
> division by zero exception if dimsize = 0.
> This was found by https://github.com/stephanie-wang.
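To illustrate the failure mode described in ARROW-939, here is a small Python sketch of a row-major stride computation that divides a running byte count by each dimension, plus a guard for shapes containing a zero. It is an illustration only and does not reproduce the code in cpp/src/arrow/tensor.cc or the change in pull request 634; `row_major_strides` is a hypothetical name, and the placeholder strides returned for empty tensors are an arbitrary choice made for this sketch.

```python
def row_major_strides(shape, itemsize):
    """Byte strides for a C-contiguous (row-major) tensor of the given shape."""
    total = itemsize
    for dimsize in shape:
        total *= dimsize

    if total == 0:
        # At least one dimension is 0, so the tensor holds no elements.
        # Dividing by that dimension below would raise ZeroDivisionError,
        # so return one placeholder stride per axis instead.
        return [itemsize] * len(shape)

    strides = []
    remaining = total
    for dimsize in shape:
        # The analogue of "remaining /= dimsize" from the issue text.
        remaining //= dimsize
        strides.append(remaining)
    return strides


print(row_major_strides([2, 3, 4], 8))  # [96, 32, 8]
print(row_major_strides([2, 0, 4], 8))  # [8, 8, 8]
```

Without the `total == 0` check, the second call would reach `remaining //= 0` and fail, which mirrors the division reported in the issue.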
[jira] [Updated] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-955:
-------------------------------
    Summary: [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda (was: ImportError: No module named _config)

> [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
> ------------------------------------------------------------------------------
>
> Key: ARROW-955
> URL: https://issues.apache.org/jira/browse/ARROW-955
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Environment: Ubuntu - 3.19.0-80-generic #88~14.04.1-Ubuntu
> Python 2.7.6
> Reporter: Devang Shah
>
> I built pyarrow, arrow, and parquet-cpp from source - so that I could use the
> new read_row_group() interface and in general, have access to the latest
> versions. I ran into many issues during the build but was ultimately
> successful (notes below). However, I am not able to import pyarrow.parquet
> due to the following issue:
> >>import pyarrow.parquet
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow/__init__.py", line 28, in <module>
>     import pyarrow._config
> ImportError: No module named _config
> This is similar to an issue reported in github/conda-forge/pyarrow-feedstock,
> where also I posted this...but I think this forum is more direct and
> appropriate - so re-posting here.
> I used instructions at https://arrow.apache.org/docs/python/install.html to
> build arrow/cpp, parquet-cpp, and then pyarrow, with the following deviations
> (I view them as possibly bugs in the instructions):
> arrow/cpp build:
> export ARROW_HOME=$HOME/local
> I had to specify -DARROW_PYTHON=on and -DPARQUET_ARROW=ON to the cmake
> command (besides the -DCMAKE_INSTALL_PREFIX=$ARROW_HOME)
> parquet-cpp build:
> export ARROW_HOME=$HOME/local
> cmake -DARROW_HOME=$HOME/local -DPARQUET_ARROW_LINKAGE=static
> -DPARQUET_ARROW=ON .
> make
> sudo make install > this installs parquet libs in the std systems
> location (/usr/local/lib) so that the pyarrow build (see below) can find the
> parquet libs
> pyarrow build:
> export ARROW_HOME=$HOME/local (not a deviation; just repeating here)
> export LD_LIBRARY_PATH=$HOME/local/lib:$HOME/parquet4/parquet-cpp/build/latest
> sudo python setup.py build_ext --with-parquet --with-jemalloc
> --build-type=release install
> sudo python setup.py install
> (sudo is needed to install in /usr/local/lib/python2.7/dist-packages )
> These are the steps and modifications to the instructions needed for me to
> build the pyarrow.parquet package. However, when I now try to import the
> package I get the error specified above.
> Maybe I did something wrong in my steps which I kind of put together by
> searching for these issues...but really can't tell what. It took me almost a
> whole day to get to the point where I can build pyarrow and parquet, and now
> I can't use what I built.
> Any comments, help appreciated! Thanks in advance.
[jira] [Created] (ARROW-956) remove pandas pre-0.20.0 compat
Jeff Reback created ARROW-956:
---------------------------------

    Summary: remove pandas pre-0.20.0 compat
    Key: ARROW-956
    URL: https://issues.apache.org/jira/browse/ARROW-956
    Project: Apache Arrow
    Issue Type: Task
    Components: Python
    Reporter: Jeff Reback
    Assignee: Jeff Reback
    Priority: Trivial

xref to ARROW-879
[jira] [Updated] (ARROW-955) ImportError: No module named _config
[ https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-955:
-------------------------------
    Issue Type: Improvement (was: Bug)

> ImportError: No module named _config
> ------------------------------------
>
> Key: ARROW-955
> URL: https://issues.apache.org/jira/browse/ARROW-955
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Environment: Ubuntu - 3.19.0-80-generic #88~14.04.1-Ubuntu
> Python 2.7.6
> Reporter: Devang Shah
>
> I built pyarrow, arrow, and parquet-cpp from source - so that I could use the
> new read_row_group() interface and in general, have access to the latest
> versions. I ran into many issues during the build but was ultimately
> successful (notes below). However, I am not able to import pyarrow.parquet
> due to the following issue:
> >>import pyarrow.parquet
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow/__init__.py", line 28, in <module>
>     import pyarrow._config
> ImportError: No module named _config
> This is similar to an issue reported in github/conda-forge/pyarrow-feedstock,
> where also I posted this...but I think this forum is more direct and
> appropriate - so re-posting here.
> I used instructions at https://arrow.apache.org/docs/python/install.html to
> build arrow/cpp, parquet-cpp, and then pyarrow, with the following deviations
> (I view them as possibly bugs in the instructions):
> arrow/cpp build:
> export ARROW_HOME=$HOME/local
> I had to specify -DARROW_PYTHON=on and -DPARQUET_ARROW=ON to the cmake
> command (besides the -DCMAKE_INSTALL_PREFIX=$ARROW_HOME)
> parquet-cpp build:
> export ARROW_HOME=$HOME/local
> cmake -DARROW_HOME=$HOME/local -DPARQUET_ARROW_LINKAGE=static
> -DPARQUET_ARROW=ON .
> make
> sudo make install > this installs parquet libs in the std systems
> location (/usr/local/lib) so that the pyarrow build (see below) can find the
> parquet libs
> pyarrow build:
> export ARROW_HOME=$HOME/local (not a deviation; just repeating here)
> export LD_LIBRARY_PATH=$HOME/local/lib:$HOME/parquet4/parquet-cpp/build/latest
> sudo python setup.py build_ext --with-parquet --with-jemalloc
> --build-type=release install
> sudo python setup.py install
> (sudo is needed to install in /usr/local/lib/python2.7/dist-packages )
> These are the steps and modifications to the instructions needed for me to
> build the pyarrow.parquet package. However, when I now try to import the
> package I get the error specified above.
> Maybe I did something wrong in my steps which I kind of put together by
> searching for these issues...but really can't tell what. It took me almost a
> whole day to get to the point where I can build pyarrow and parquet, and now
> I can't use what I built.
> Any comments, help appreciated! Thanks in advance.
[jira] [Created] (ARROW-955) ImportError: No module named _config
Devang Shah created ARROW-955:
---------------------------------

    Summary: ImportError: No module named _config
    Key: ARROW-955
    URL: https://issues.apache.org/jira/browse/ARROW-955
    Project: Apache Arrow
    Issue Type: Bug
    Components: Python
    Environment: Ubuntu - 3.19.0-80-generic #88~14.04.1-Ubuntu
                 Python 2.7.6
    Reporter: Devang Shah
    Priority: Blocker

I built pyarrow, arrow, and parquet-cpp from source - so that I could use the
new read_row_group() interface and in general, have access to the latest
versions. I ran into many issues during the build but was ultimately
successful (notes below). However, I am not able to import pyarrow.parquet
due to the following issue:

>>import pyarrow.parquet
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/__init__.py", line 28, in <module>
    import pyarrow._config
ImportError: No module named _config

This is similar to an issue reported in github/conda-forge/pyarrow-feedstock,
where also I posted this...but I think this forum is more direct and
appropriate - so re-posting here.

I used instructions at https://arrow.apache.org/docs/python/install.html to
build arrow/cpp, parquet-cpp, and then pyarrow, with the following deviations
(I view them as possibly bugs in the instructions):

arrow/cpp build:
export ARROW_HOME=$HOME/local
I had to specify -DARROW_PYTHON=on and -DPARQUET_ARROW=ON to the cmake
command (besides the -DCMAKE_INSTALL_PREFIX=$ARROW_HOME)

parquet-cpp build:
export ARROW_HOME=$HOME/local
cmake -DARROW_HOME=$HOME/local -DPARQUET_ARROW_LINKAGE=static
-DPARQUET_ARROW=ON .
make
sudo make install > this installs parquet libs in the std systems
location (/usr/local/lib) so that the pyarrow build (see below) can find the
parquet libs

pyarrow build:
export ARROW_HOME=$HOME/local (not a deviation; just repeating here)
export LD_LIBRARY_PATH=$HOME/local/lib:$HOME/parquet4/parquet-cpp/build/latest
sudo python setup.py build_ext --with-parquet --with-jemalloc
--build-type=release install
sudo python setup.py install
(sudo is needed to install in /usr/local/lib/python2.7/dist-packages )

These are the steps and modifications to the instructions needed for me to
build the pyarrow.parquet package. However, when I now try to import the
package I get the error specified above.

Maybe I did something wrong in my steps which I kind of put together by
searching for these issues...but really can't tell what. It took me almost a
whole day to get to the point where I can build pyarrow and parquet, and now
I can't use what I built.

Any comments, help appreciated! Thanks in advance.
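A note for readers debugging a similar ImportError: the thread does not establish the cause, but one thing worth ruling out is that Python resolved `pyarrow` to a directory under the current working directory (for example a source checkout without compiled extension modules) rather than to the installed package; the relative path in the traceback is at least consistent with that. The sketch below is a generic diagnostic under that assumption. `describe_package` is a hypothetical helper, and it relies on `importlib.util.find_spec`, which needs Python 3, whereas the reporter was on Python 2.7.6.

```python
import importlib.util
import os


def describe_package(name):
    """Report where top-level package `name` would be loaded from, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"found": False, "origin": None, "from_cwd": False}

    origin = spec.origin
    from_cwd = False
    if origin is not None and os.path.exists(origin):
        # True when the package file lives under the current working
        # directory, e.g. because Python was started inside a source checkout.
        cwd_prefix = os.getcwd() + os.sep
        from_cwd = os.path.abspath(origin).startswith(cwd_prefix)
    return {"found": True, "origin": origin, "from_cwd": from_cwd}


if __name__ == "__main__":
    # Prints the resolved location for pyarrow, or found=False if it is absent.
    print(describe_package("pyarrow"))
```

If `from_cwd` comes back true, starting the interpreter from a different directory is a cheap way to test whether the installed build imports cleanly.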
[jira] [Resolved] (ARROW-929) Move KEYS file to SVN, remove from git
[ https://issues.apache.org/jira/browse/ARROW-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe L. Korn resolved ARROW-929.
------------------------------
    Resolution: Fixed

Issue resolved by pull request 646
[https://github.com/apache/arrow/pull/646]

> Move KEYS file to SVN, remove from git
> --------------------------------------
>
> Key: ARROW-929
> URL: https://issues.apache.org/jira/browse/ARROW-929
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Wes McKinney
> Assignee: Wes McKinney